Modelling of the DNA Helix’s Duration for Genome Sequencing

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Nanopore sequencing is a next-generation sequencing method that promises to deliver cheaper and more portable genome sequencing. A single DNA or RNA strand is passed through a nanopore nested in an artificial membrane with an electric potential applied across it. The nucleotide bases of the helix then interact with the ionic current in the nanopore, resulting in a unique signal that can be translated into the corresponding nucleotide sequence. This project investigated whether features of the raw signal data could be used as predictive indicators of the duration of each nucleotide base in the nanopore, in order to segment the signal before translation. The training data set came from the sequenced DNA molecules of an E. coli bacterium. Candidate distributions were fitted to a histogram of the duration data in the training set. Features of the current signal and the distribution parameters were correlated in order to investigate whether a linear predictive model could be created. The results indicate that the zero-crossings feature is not a good option for constructing a linear model, while the large-jumps and moving-variance features often produce linear patterns. The parameter of the Log-logistic distribution gave the best fit, with the lowest relative root mean square deviation (rRMSD), 2.7%.
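The abstract names three signal features, a fit of candidate distributions to a histogram of duration data, and an rRMSD figure of merit. The Python sketch below illustrates those pieces under stated assumptions; it is not the essay's code. The feature definitions (zero-crossings of the mean-centred signal, jumps counted against a `threshold`, population variance over a sliding `window`), the method-of-moments estimator for the Log-logistic parameters, the 20-bin histogram, and normalising the RMSD by the mean observed value are all hypothetical choices made for illustration. The essay may define them differently, for example by fitting with maximum likelihood.

```python
import math
import statistics


# Hypothetical feature definitions; the essay's exact definitions are not given.

def zero_crossings(signal):
    """Count sign changes of the mean-centred signal."""
    mean = statistics.fmean(signal)
    centred = [x - mean for x in signal]
    return sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)


def large_jumps(signal, threshold):
    """Count steps between consecutive samples larger than `threshold`."""
    return sum(1 for a, b in zip(signal, signal[1:]) if abs(b - a) > threshold)


def moving_variance(signal, window):
    """Population variance of every full sliding window of length `window`."""
    return [statistics.pvariance(signal[i:i + window])
            for i in range(len(signal) - window + 1)]


def fit_log_logistic(durations):
    """Method-of-moments estimate of (alpha, beta) for strictly positive durations.

    If X is log-logistic(alpha, beta), then ln X is logistic with mean
    ln(alpha) and variance pi^2 / (3 * beta^2).
    """
    logs = [math.log(d) for d in durations]
    alpha = math.exp(statistics.fmean(logs))
    beta = math.pi / math.sqrt(3 * statistics.pvariance(logs))
    return alpha, beta


def log_logistic_pdf(x, alpha, beta):
    """Log-logistic density, valid for x > 0."""
    r = x / alpha
    return (beta / alpha) * r ** (beta - 1) / (1 + r ** beta) ** 2


def histogram_density(durations, bins=20):
    """Bin centres and density-normalised heights over [min, max]."""
    lo, hi = min(durations), max(durations)
    width = (hi - lo) / bins
    counts = [0] * bins
    for d in durations:
        # The maximum value goes into the last bin instead of one past it.
        counts[min(int((d - lo) / width), bins - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    heights = [c / (len(durations) * width) for c in counts]
    return centres, heights


def rrmsd(observed, predicted):
    """RMSD divided by the mean observed value (assumed normalisation)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(statistics.fmean(squared)) / statistics.fmean(observed)
```

Because `rrmsd` takes plain lists of observed and predicted values, the same function can score either a histogram fit (heights against `log_logistic_pdf` at the bin centres) or a linear model predicting a distribution parameter from a signal feature. For the correlation step, `statistics.correlation` (Python 3.10+) returns the Pearson coefficient of two equal-length lists, such as per-read feature values and fitted parameters.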
