Algorithms for Aligning Genetic Sequences to Reference Genomes

University essay from Umeå universitet/Institutionen för datavetenskap

Author: Jens-ola Ekström; [2017]


Abstract: The technologies for sequencing genetic material have improved vastly during the last fifteen years. Sequences can now be determined at affordable cost, and more genetic sequences are therefore available than ever before. The bottleneck is no longer obtaining genetic sequences but analyzing all the sequence data. A primary step in sequence analysis is to determine a sequence fragment’s position in the genome, a process called alignment. From a computing science point of view, this is essentially text matching. It is, however, a much more complex task than searching for strings in ordinary text documents. First, there is a large amount of data. An ordinary sequencing experiment can generate more than 100 million sequences. Each sequence must be matched against a reference genome sequence, which can be several billion characters in size. Second, the obtained sequences may differ from the reference genome sequence. The algorithms therefore search not only for exact matches but for the best approximate matches. In this work I review the algorithms behind modern sequence alignment software. I also propose evaluating the fast Fourier transform for the task.
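
To make the notion of a "best approximate match" concrete, the following is a minimal sketch of a naive baseline. It is not one of the algorithms reviewed in the essay. It slides the read along the reference and keeps the position with the fewest mismatching characters. The function name `best_approximate_match` and the example sequences are hypothetical. The sketch assumes that only substitutions occur (no insertions or deletions), and its running time grows with the product of the read length and the reference length, which is what makes it impractical at genome scale.

```python
def best_approximate_match(read, reference):
    """Return (position, mismatches) for the placement of `read` on
    `reference` with the fewest mismatching characters.

    Illustrative baseline only: substitutions are counted, insertions and
    deletions are not handled. Ties go to the leftmost position.
    """
    if not read or len(read) > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")

    best_pos, best_mismatches = 0, len(read) + 1
    # Try every possible start position of the read on the reference.
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        # Hamming distance between the read and the current window.
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


if __name__ == "__main__":
    # Hypothetical toy data; real references are billions of characters long.
    print(best_approximate_match("TGGA", "ACGTTGCATGCA"))
```

With 100 million reads and a reference of billions of characters, this exhaustive scan would need far too many character comparisons, which is why faster approaches to alignment are needed.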
