Neural networks for imputation of missing genotype data : An alternative to the classical statistical methods in bioinformatics
Abstract: In this project, two different machine learning models were tested in an attempt at imputing missing genotype data from patients on two different panels. As the integrity of the patients had to be protected, initial training was done on data simulated from the 1000 Genomes Project. The first model consisted of two convolutional variational autoencoders and the latent representations of the networks were shuffled to force the networks to find the same patterns in the two datasets. This model was unfortunately unsuccessful at imputing the missing data. The second model was based on a UNet structure and was more successful at the task of imputation. This model had one encoder for each dataset, making each encoder specialized at finding patterns in its own data. Further improvements are required in order for the model to be fully capable at imputing the missing data.
AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)