Leveraging CNN for Automated Peak Picking in Untargeted Metabolomics without Parameter Dependencies

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: Metabolomics is the scientific discipline concerned with the comprehensive analysis of small molecules, known as metabolites, found within a biological system. Liquid chromatography-mass spectrometry (LC-MS) is a commonly used analytical technique in metabolomics because of its broad coverage of the measurable metabolome. It generates a large amount of raw data covering a broad spectrum of metabolites, and this raw data must be transformed into a structured, tabular format that can be readily used for further analysis. Although software tools offer automated peak detection, visual inspection and manual correction are frequently still necessary, a process that is both time-consuming and dependent on specialised domain knowledge. To make data processing more efficient and automated, this project establishes, optimises, and evaluates a deep learning approach that detects regions of interest (ROIs) using a Faster Region-based Convolutional Neural Network (Faster R-CNN). Integrating deep learning into LC-MS analysis has the potential to improve the overall efficiency and accuracy of metabolomic studies. Moreover, it can assist in constructing reliable predictive models for diverse LC-MS applications. ROI detection was performed on reversed-phase positive liquid chromatography data provided by the Chalmers Mass Spectrometry Infrastructure (CMSI). The model was trained on a dataset of 524 chromatograms and evaluated on a separate set of 151 chromatograms. The model takes segments of a chromatogram as input and outputs predicted coordinates indicating the locations of the ROIs. The results were evaluated both quantitatively and qualitatively, using precision, recall, F1-score, and Intersection over Union (IoU), as well as manual inspection.
The model achieved an average precision of 0.591, an average recall of 0.648, an F1-score of 0.617, and a mean IoU of 0.558. These findings are promising and indicate substantial potential for automating ROI detection in LC-MS data.
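
As an illustration of the quantitative metrics named above, the following is a minimal sketch of how precision, recall, F1-score, and mean IoU could be computed for predicted ROI boxes against annotated ones. It is not the evaluation code used in the essay. The box format (x_min, y_min, x_max, y_max), the greedy one-to-one matching, the IoU threshold of 0.5, and the function names `iou` and `evaluate` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in chromatogram coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(pred: List[Box], truth: List[Box],
             threshold: float = 0.5) -> Dict[str, float]:
    """Greedily match predictions to annotations and compute the metrics.

    A prediction counts as a true positive when its best remaining
    annotation overlaps it with IoU >= threshold (threshold is an assumption).
    """
    unmatched = list(truth)
    matched_ious: List[float] = []
    for p in pred:
        if not unmatched:
            break
        best = max(unmatched, key=lambda t: iou(p, t))
        score = iou(p, best)
        if score >= threshold:
            matched_ious.append(score)
            unmatched.remove(best)  # each annotation is matched at most once

    tp = len(matched_ious)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    mean_iou = sum(matched_ious) / tp if tp else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "mean_iou": mean_iou}


if __name__ == "__main__":
    # Hypothetical boxes, for illustration only.
    predicted = [(0.0, 0.0, 2.0, 2.0), (5.0, 5.0, 6.0, 6.0)]
    annotated = [(0.0, 0.0, 2.0, 1.0), (10.0, 10.0, 11.0, 11.0)]
    print(evaluate(predicted, annotated))
```

In this sketch the mean IoU is averaged over matched pairs only; the essay may define it differently, for example by averaging per chromatogram.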
