Dimensionality Reduction: Principal Component Analysis vs. Autoencoders (Part 2)

[Figure: image reconstructed with PCA]

Compute the root-mean-square error (RMSE) of the reconstructed image:

import math

def my_rmse(np_arr1, np_arr2):
    # Root-mean-square error between two 2-D arrays of the same shape
    dim = np_arr1.shape
    tot_loss = 0
    for i in range(dim[0]):
        for j in range(dim[1]):
            tot_loss += math.pow((np_arr1[i, j] - np_arr2[i, j]), 2)
    return round(math.sqrt(tot_loss / (dim[0] * dim[1] * 1.0)), 2)

error_pca = my_rmse(image_matrix, reconstructed_matrix)

The RMSE comes out to 11.84.
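The double loop in my_rmse can also be written as a single NumPy expression. The sketch below is an illustrative alternative, not the author's code: the name rmse_vectorized and the tiny sample arrays are assumptions made for the example. Converting both inputs to float first guards against unsigned wraparound if both arrays happen to be uint8.

```python
import numpy as np

def rmse_vectorized(a, b):
    """RMSE between two equally shaped arrays, rounded to 2 decimals."""
    # Cast to float so that uint8 inputs cannot wrap around on subtraction
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("arrays must have the same shape")
    return round(float(np.sqrt(np.mean((a - b) ** 2))), 2)

# Tiny illustrative arrays (hypothetical, not the article's image)
original = np.array([[0, 10], [20, 30]], dtype=np.uint8)
restored = np.array([[3.0, 10.0], [20.0, 26.0]])

# Squared differences are 9, 0, 0, 16; their mean is 6.25; the root is 2.5
print(rmse_vectorized(original, restored))  # 2.5
```

On a full image this avoids a Python-level loop over every pixel, which is usually much faster, while returning the same rounded value as the loop version.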
Single-layer autoencoder with a linear activation function

# Imports assumed here (standalone Keras); use tensorflow.keras instead if that is what is installed
from sklearn.preprocessing import StandardScaler
from keras.layers import Input, Dense
from keras.models import Model

# Standardise the data
X_org = image_matrix.copy()
sc = StandardScaler()
X = sc.fit_transform(X_org)

# Size of the encoded representation
encoding_dim = reduced_pixel

# Input placeholder
input_img = Input(shape=(img.width,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='linear')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(img.width, activation=None)(encoded)

# This model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)

# Encoder: maps an input to its encoded representation
encoder = Model(input_img, encoded)

# Placeholder for an encoded (encoding_dim-dimensional) input
encoded_input = Input(shape=(encoding_dim,))
# Retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# Create the decoder model
decoder = Model(encoded_input, decoder_layer(encoded_input))

autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
autoencoder.fit(X, X, epochs=500, batch_size=16, shuffle=True)

encoded_imgs = encoder.predict(X)
decoded_imgs = decoder.predict(encoded_imgs)

[Figure: autoencoder architecture]

Check the correlation between the encoded dimensions:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# One column per encoded dimension
df_ae = pd.DataFrame(data=encoded_imgs, columns=list(range(encoded_imgs.shape[1])))
figure = plt.figure(figsize=(10, 6))
corrMatrix = df_ae.corr()
sns.heatmap(corrMatrix, annot=False)
plt.show()

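[Figure: correlation between dimensions after autoencoder dimensionality reduction]

The correlation matrix shows that the new transformed features are correlated to some degree: the Pearson correlation coefficients deviate substantially from 0. This differs from PCA, whose principal components are uncorrelated by construction.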
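To make the contrast concrete, the following sketch measures the largest off-diagonal Pearson correlation for two kinds of low-dimensional codes on synthetic data. Everything in it is an assumption for illustration: the data are random rather than the article's image, the helper max_abs_offdiag_corr is hypothetical, and simply keeping the first two raw features stands in for a learned encoder that has no decorrelation constraint.

```python
import numpy as np

def max_abs_offdiag_corr(codes):
    """Largest absolute Pearson correlation between distinct columns."""
    corr = np.corrcoef(codes, rowvar=False)
    off_diag = corr - np.diag(np.diag(corr))
    return float(np.max(np.abs(off_diag)))

rng = np.random.default_rng(0)
# Synthetic correlated features: independent normals passed through a fixed mixing matrix
mixing = np.array([[1.0, 0.8, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(1000, 3)) @ mixing
Xc = X - X.mean(axis=0)

# PCA scores: project the centered data onto the top-2 right singular vectors
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_codes = Xc @ Vt[:2].T

# Stand-in for an unconstrained linear encoder: keep the first two raw features
other_codes = Xc[:, :2]

print("PCA codes  :", max_abs_offdiag_corr(pca_codes))
print("other codes:", max_abs_offdiag_corr(other_codes))
```

The PCA value is zero up to floating-point error, while the other encoding keeps a clearly non-zero correlation. An autoencoder trained only on reconstruction loss has no term that pushes its codes toward the first case.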
Next, we reconstruct the original data from the reduced representation:

# Undo the standardisation to get back to pixel scale
X_decoded_ae = sc.inverse_transform(decoded_imgs)
reconstructed_image_ae = Image.fromarray(np.uint8(X_decoded_ae))
plt.figure(figsize=(8, 12))
plt.imshow(reconstructed_image_ae, cmap=plt.cm.gray)
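One detail worth noting about the cast above: a reconstruction is not guaranteed to stay inside the 0 to 255 range, and casting out-of-range floats straight to uint8 can wrap around instead of saturating. A minimal sketch of a safer conversion, assuming 8-bit grayscale pixels (the helper name to_uint8_image and the sample values are hypothetical, not part of the original code):

```python
import numpy as np

def to_uint8_image(arr):
    """Round to the nearest integer and clip into [0, 255] before casting."""
    arr = np.asarray(arr, dtype=np.float64)
    return np.clip(np.rint(arr), 0, 255).astype(np.uint8)

# Hypothetical decoded values, two of them outside the valid pixel range
decoded = np.array([[-5.0, 100.4, 300.0]])
safe = to_uint8_image(decoded)
print(safe.tolist())  # [[0, 100, 255]]
```

This only affects the displayed image. The RMSE in the article is computed on the unclipped float array, so the reported numbers would not change.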

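[Figure: image reconstructed by the autoencoder]

Compute the RMSE of the reconstructed image:

error_ae = my_rmse(image_matrix, X_decoded_ae)

The RMSE comes out to 12.15. The single-layer autoencoder with linear activation performs almost the same as PCA.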
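The near-tie is expected: among all linear maps through a k-dimensional bottleneck, the PCA reconstruction has the smallest squared error on centered data (the Eckart-Young theorem), so a linear autoencoder trained with a mean squared error loss can at best match it. The sketch below illustrates this on synthetic data. The data, the helper rmse, and the variable names are assumptions for the example, and the least-squares decoder stands in for a perfectly trained decoder layer.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
# Synthetic stand-in data: 200 samples with 10 correlated features
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))
Xc = X - X.mean(axis=0)
k = 3

# PCA: project onto the top-k right singular vectors and map back
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V_k = Vt[:k].T
err_pca = rmse(Xc, Xc @ V_k @ V_k.T)

# An arbitrary linear encoder, paired with its least-squares-optimal decoder
W_enc = rng.normal(size=(10, k))
codes = Xc @ W_enc
W_dec, *_ = np.linalg.lstsq(codes, Xc, rcond=None)
err_arbitrary = rmse(Xc, codes @ W_dec)

# An encoder spanning the PCA subspace in a different basis reaches the PCA error
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])  # invertible change of basis
codes_same_span = Xc @ V_k @ M
W_dec2, *_ = np.linalg.lstsq(codes_same_span, Xc, rcond=None)
err_same_span = rmse(Xc, codes_same_span @ W_dec2)

print(err_pca, err_arbitrary, err_same_span)
```

The third case also shows why a linear autoencoder can equal PCA in reconstruction error while producing correlated codes: it only needs to find the same subspace, not the same orthogonal axes. A small gap such as 12.15 versus 11.84 is consistent with gradient-based training that has not fully converged.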
Three-layer autoencoder with nonlinear activation functions

input_img = Input(shape=(img.width,))
encoded1 = Dense(128, activation='relu')(input_img)
encoded2 = Dense(reduced_pixel, activation='relu')(encoded1)
decoded1 = Dense(128, activation='relu')(encoded2)
decoded2 = Dense(img.width, activation=None)(decoded1)

autoencoder = Model(input_img, decoded2)
autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
autoencoder.fit(X, X, epochs=500, batch_size=16, shuffle=True)

# Encoder: input -> bottleneck representation
encoder = Model(input_img, encoded2)
# "Decoder": here this is the full input -> reconstruction path,
# i.e. the same graph as the autoencoder, so it is fed X rather than the codes
decoder = Model(input_img, decoded2)

encoded_imgs = encoder.predict(X)
decoded_imgs = decoder.predict(X)

[Figure: autoencoder model architecture]

Image reconstruction:

# Undo the standardisation to get back to pixel scale
X_decoded_deep_ae = sc.inverse_transform(decoded_imgs)
reconstructed_image_deep_ae = Image.fromarray(np.uint8(X_decoded_deep_ae))
plt.figure(figsize=(8, 12))
plt.imshow(reconstructed_image_deep_ae, cmap=plt.cm.gray)

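[Figure: image reconstructed by the three-layer autoencoder]

Compute the RMSE:

error_dae = my_rmse(image_matrix, X_decoded_deep_ae)

The multi-layer autoencoder reaches an RMSE of 8.57, which beats PCA (11.84) by about 28%.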
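The percentage follows directly from the three RMSE values reported in this article. A small worked check (the helper name relative_reduction is hypothetical):

```python
# RMSE values reported in the article
rmse_pca = 11.84
rmse_linear_ae = 12.15
rmse_deep_ae = 8.57

def relative_reduction(baseline, value):
    """Error reduction relative to the baseline, in percent (positive = lower error)."""
    return (baseline - value) / baseline * 100.0

# Three-layer autoencoder vs. PCA: (11.84 - 8.57) / 11.84
print(round(relative_reduction(rmse_pca, rmse_deep_ae), 1))    # 27.6
# Single-layer linear autoencoder vs. PCA: slightly worse, hence negative
print(round(relative_reduction(rmse_pca, rmse_linear_ae), 1))  # -2.6
```

So the quoted 28% is the rounded 27.6% reduction in RMSE relative to PCA, and the linear autoencoder is about 2.6% worse than PCA on the same measure.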
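With additional layers and nonlinear activations, the autoencoder captures nonlinear features in the image better. It handles complex patterns and abrupt changes in pixel values better than PCA, but it takes considerably more training time and compute.

Summary

This article introduced two dimensionality reduction methods, principal component analysis and autoencoders, compared their strengths and weaknesses, and walked through both on a concrete image example.

Full code on GitHub: samread81/PCA-versus-AE
Author: Abhishek Mungoli
Translated by the deephub translation team: Oliver Lee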