Influence of Preprocessing Steps for Molecular Data on Deep Neural Network Performance

University essay from Högskolan i Skövde/Institutionen för biovetenskap

Author: Tajouj Malla; [2023]


Abstract: The massive accumulation of omics data requires effective computational tools to analyze and interpret it. Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI), has shed light on these challenges and achieved great success in bioinformatics. However, the influence of preprocessing steps on the performance of DL models remains a critical aspect that requires thorough investigation. This study investigates the effects of different combinations of preprocessing techniques and feature selection methods on the predictive performance of deep neural networks (DNNs) in supervised tasks. For this purpose, four normalization methods, one transformation method, and two feature scaling methods were applied, in addition to two feature selection methods. This comprehensive analysis resulted in a total of 28 unique combinations, each representing a distinct classifier. The experiments were conducted on gene expression profiles from multiple cancer datasets. The results highlight the importance of the preprocessing step for achieving optimal DNN performance, with notable variations across datasets and preprocessing techniques. We identify a specific preprocessing workflow that improves DNN performance, as well as certain preprocessing choices that may lead to suboptimal model performance. In addition, we identify potential pitfalls and challenges associated with data structure and class imbalance. This study contributes to the understanding of how preprocessing steps affect DNN models and provides insights into which preprocessing steps work best, thereby helping to improve overall DNN performance and to develop more robust and accurate models.
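
The abstract does not name the individual methods, so the following is only a minimal sketch of how a grid of preprocessing combinations can be enumerated, using hypothetical stand-ins: a log2(x + 1) transformation, a top-variance gene filter, and z-score or min-max feature scaling. It does not reproduce the essay's normalization methods or its 28 combinations. One design choice worth noting: every statistic (mean, standard deviation, minimum, range, variance) is estimated on the training split only, so no information from test samples leaks into preprocessing.

```python
import itertools
import math
from statistics import mean, pstdev, pvariance

# Rows are samples, columns are genes (features).
Matrix = list[list[float]]


def identity(x: Matrix) -> Matrix:
    """'No transformation' option, so untransformed data is part of the grid."""
    return [list(row) for row in x]


def log2_transform(x: Matrix) -> Matrix:
    """Element-wise log2(x + 1), a common transform for non-negative expression values."""
    return [[math.log2(v + 1.0) for v in row] for row in x]


def _columns(x: Matrix) -> list[list[float]]:
    return [list(col) for col in zip(*x)]


def _affine(x: Matrix, shifts: list[float], divisors: list[float]) -> Matrix:
    """Apply (value - shift) / divisor to each column."""
    return [[(v - a) / b for v, a, b in zip(row, shifts, divisors)] for row in x]


def zscore_scale(train: Matrix, test: Matrix) -> tuple[Matrix, Matrix]:
    """Standardize each gene with the mean and std of the training split only."""
    cols = _columns(train)
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant genes: avoid division by zero
    return _affine(train, mus, sds), _affine(test, mus, sds)


def minmax_scale(train: Matrix, test: Matrix) -> tuple[Matrix, Matrix]:
    """Rescale each gene to [0, 1] using the training minimum and range.

    Test values outside the training range can fall outside [0, 1].
    """
    cols = _columns(train)
    los = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return _affine(train, los, spans), _affine(test, los, spans)


def top_variance_genes(train: Matrix, test: Matrix, k: int) -> tuple[Matrix, Matrix]:
    """Hypothetical filter: keep the k genes with the highest training variance."""
    variances = [pvariance(c) for c in _columns(train)]
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])  # preserve the original gene order

    def take(x: Matrix) -> Matrix:
        return [[row[j] for j in keep] for row in x]

    return take(train), take(test)


TRANSFORMS = {"none": identity, "log2": log2_transform}
SELECTORS = {"all_genes": None, "top_variance": top_variance_genes}
SCALERS = {"zscore": zscore_scale, "minmax": minmax_scale}


def preprocessing_grid(train: Matrix, test: Matrix, k: int):
    """Yield (name, train_out, test_out) for every transform x selector x scaler.

    Selection runs before scaling: after z-scoring, every non-constant gene
    has unit variance, so a variance filter applied afterwards could not rank genes.
    """
    grid = itertools.product(TRANSFORMS.items(), SELECTORS.items(), SCALERS.items())
    for (t_name, transform), (f_name, select), (s_name, scale) in grid:
        tr, te = transform(train), transform(test)
        if select is not None:
            tr, te = select(tr, te, k)
        tr, te = scale(tr, te)
        yield f"{t_name}+{f_name}+{s_name}", tr, te
```

With two options per stage this sketch yields 2 × 2 × 2 = 8 combinations. In a study like the one described, a separate DNN would then be trained on each `train_out` and evaluated on the matching `test_out`; the classifier itself is omitted here.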
