A Rank Score Model of Variants Prioritization for Rare Disease

University essay from Uppsala universitet/Institutionen för biologisk grundutbildning

Author: Nanxing Liu; [2023]

Keywords: machine learning; variants; rare disease

Abstract: Advances in sequencing technology have transformed the diagnosis of genetic diseases. Next-generation sequencing (NGS) has become standard practice in genetic diagnostics, enabling the identification of many kinds of genetic variation. However, distinguishing causative variants from the vast number of benign background variants remains a significant challenge. This study focuses on improving the rank score model used in genetic rare-disease diagnostics at a clinical genomics facility in Stockholm. The objective is to develop a more effective model by applying exploratory data analysis and machine learning methods, investigating the strengths and weaknesses of existing annotation scores in order to identify suitable features and enhance the model's classification performance. The methodology involved analyzing publicly available ClinVar data with principal component analysis (PCA), heatmaps, Welch's t-test, and the chi-square test to evaluate correlations, patterns, and classification ability for different variant types. In addition, the study employed a machine learning approach that combines allele frequency filtering with logistic regression, trained on both public and in-house datasets, to prioritize single nucleotide variants (SNVs) and insertions/deletions (InDels). The resulting model assigns a binary class label (benign or pathogenic) and provides a score for each variant. Promising performance was observed on both the ClinVar dataset and the unique patient datasets, demonstrating the model's potential for clinical application. The findings could enhance genetic rare-disease diagnostics and contribute to advances in rare disease research.
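The combination of allele frequency filtering and logistic regression described in the abstract can be illustrated with a minimal sketch. It is not the model from the essay: the cutoff `AF_CUTOFF`, the two generic feature columns, the toy training data, and the choice to give filtered variants a score of zero are all assumptions made for illustration, and the logistic regression is a plain gradient-descent fit written with the standard library only.

```python
import math

# Hypothetical cutoff (an assumption, not a value from the essay): variants
# more common than this in the population are treated as benign background.
AF_CUTOFF = 0.01


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def rank_variants(variants, weights, bias):
    """Return (id, score, label) tuples sorted by descending score."""
    ranked = []
    for v in variants:
        if v["af"] > AF_CUTOFF:
            score = 0.0  # assumption: common variants are filtered, not scored
        else:
            z = sum(w * xi for w, xi in zip(weights, v["features"])) + bias
            score = sigmoid(z)
        label = "pathogenic" if score >= 0.5 else "benign"
        ranked.append((v["id"], score, label))
    return sorted(ranked, key=lambda r: r[1], reverse=True)


# Toy training data: two made-up annotation scores in [0, 1] per variant,
# with label 1 = pathogenic and 0 = benign.
train_x = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95],
           [0.1, 0.2], [0.2, 0.1], [0.3, 0.15]]
train_y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(train_x, train_y)

# Hypothetical candidate variants from one patient.
candidates = [
    {"id": "var_rare_high", "af": 0.0001, "features": [0.85, 0.9]},
    {"id": "var_rare_low", "af": 0.0002, "features": [0.15, 0.1]},
    {"id": "var_common_high", "af": 0.2, "features": [0.9, 0.9]},
]
ranking = rank_variants(candidates, weights, bias)
for variant_id, score, label in ranking:
    print(f"{variant_id}: score={score:.3f} label={label}")
```

In this sketch the frequency filter runs before the classifier, so a common variant is ranked last even when its annotation scores look damaging. Whether the essay applies the filter as a hard cutoff or folds allele frequency into the score is not stated in the abstract.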
