Deep Learning techniques for classification of data with missing values
Abstract: Two deep learning techniques for classifying corrupted data are investigated and their performance compared. Simple imputation before classification is compared with imputation using a Variational Autoencoder (VAE). Both single and multiple imputation with the VAE are considered, and classification performance is compared across different types and levels of corruption and, for multiple imputation, across different sample sizes. Two main corruption methods are implemented, designed to test the classifiers when data are missing at random and when data are missing not at random. The MNIST data set is used to evaluate the different techniques. A Multilayer Perceptron (MLP) trained on VAE imputations is shown to outperform an MLP using simple imputation at all tested levels of corruption. A Convolutional Neural Network (CNN) classifier trained with simple imputation outperforms both the MLP and the VAE classifier on MNIST. This is expected, since the CNN is designed to perform well on data with high local correlation, such as images, whereas the MLP and VAE classifiers can generalize to other types of data. The reasons for the observed performance of the different techniques, and possible implications, are discussed.
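To make the two corruption settings and the simple-imputation baseline concrete, here is a minimal sketch in plain Python. It is not the essay's implementation: the abstract does not specify the corruption procedures or what the simple imputation fills in, so every function name, the brightness `threshold`, and the use of per-feature means are assumptions chosen for illustration only. Independent masking stands in for the missing-at-random case, and value-dependent masking for the missing-not-at-random case.

```python
import random

def corrupt_independent(x, p, rng):
    """Assumed stand-in for 'missing at random': drop each entry
    independently with probability p, regardless of its value."""
    return [None if rng.random() < p else v for v in x]

def corrupt_value_dependent(x, p, rng, threshold=0.5):
    """Assumed stand-in for 'missing not at random': only entries whose
    own value exceeds `threshold` can be dropped, each with probability p."""
    return [None if (v > threshold and rng.random() < p) else v for v in x]

def column_means(rows):
    """Per-feature mean over the observed entries (0.0 if never observed)."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def impute_mean(rows):
    """Hypothetical 'simple imputation': fill each missing entry with the
    mean of that feature over the rows where it was observed."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

if __name__ == "__main__":
    rng = random.Random(0)
    # Tiny made-up "images" with pixel intensities in [0, 1].
    data = [[0.0, 0.2, 0.9, 1.0], [0.1, 0.8, 0.7, 0.0]]
    corrupted = [corrupt_independent(row, 0.5, rng) for row in data]
    print(impute_mean(corrupted))
```

The imputed rows would then be passed to a classifier such as an MLP or CNN. A VAE-based approach would replace `impute_mean` with samples drawn from a trained generative model, which is beyond the scope of this sketch.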