Efficient Characterization of Short Anelloviruses Fragments Found in Metagenomic Samples

University essay from Bioinformatik; Tekniska högskolan

Abstract: Some viral metagenomic serum samples contain large amounts of Anellovirus, a genetically diverse family with only a few conserved regions, which makes it hard to characterize efficiently. A multiple sequence alignment (MSA) of the Anellovirus fragments found in a sample must be constructed to get a clear picture of Anellovirus diversity and to identify stable regions. Running available MSA software directly on these fragments produces an MSA of very poor quality because of the diversity of the fragments and the misaligned and low-quality regions present in the sequences. An efficient MSA must therefore be constructed in order to characterize the Anelloviruses present in the samples. Pairwise alignment is used to align one fragment at a time to the database sequences. The fragments are then aligned to the database sequences using the start and end positions from the pairwise alignment results. The algorithm also excludes non-aligned portions of the fragments, as these are very hard to handle properly and are often products of misassembly or chimeric sequenced fragments. Other tools were developed to aid further analysis, such as finding a non-overlapping window that contains the most fragments, finding the consensus of the alignment, and extracting arbitrary regions from the MSA. The resulting MSA had a high percentage of correctly aligned bases compared to an MSA constructed using existing MSA software. The minimum number of genomes present in the sampled sequences was determined, as well as the distribution of the fragments along the database sequence. Moreover, highly conserved regions and the window containing the most fragments were extracted from the MSA, and phylogenetic trees were constructed for these regions.
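The placement step described in the abstract can be illustrated with a minimal Python sketch. It is not the essay's implementation: the function names (`place_fragment`, `consensus`, `best_window`) are hypothetical. The sketch assumes that a pairwise aligner has already reported, for each fragment, a start position on a single database sequence and the aligned portion of the fragment projected onto that sequence's coordinates, with unaligned ends already discarded. The window search uses one possible reading of "the window that contains the most fragments": the fixed-width window that fully contains the largest number of aligned spans.

```python
from collections import Counter


def place_fragment(aligned_seq, ref_start, ref_len, gap="-"):
    """Pad the aligned part of a fragment with gaps to reference coordinates.

    Assumes aligned_seq is already projected onto the reference, so its
    length equals the length of the reference span it covers.
    """
    ref_end = ref_start + len(aligned_seq)
    if ref_start < 0 or ref_end > ref_len:
        raise ValueError("aligned span falls outside the reference")
    return gap * ref_start + aligned_seq + gap * (ref_len - ref_end)


def consensus(rows, gap="-"):
    """Most common non-gap base per column; a gap where nothing covers."""
    out = []
    for column in zip(*rows):
        bases = [b for b in column if b != gap]
        out.append(Counter(bases).most_common(1)[0][0] if bases else gap)
    return "".join(out)


def best_window(spans, ref_len, width):
    """Return (start, count) for the window fully containing the most spans.

    spans are half-open (start, end) intervals in reference coordinates.
    The leftmost window wins when several windows tie.
    """
    best_start, best_count = 0, -1
    for start in range(ref_len - width + 1):
        count = sum(1 for s, e in spans if s >= start and e <= start + width)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Toy input: three aligned fragment portions on a reference of length 10.
hits = [("ACGT", 0), ("GTTA", 2), ("TTAC", 3)]
rows = [place_fragment(seq, start, 10) for seq, start in hits]
spans = [(start, start + len(seq)) for seq, start in hits]
```

Because every row is padded to the length of the database sequence, the rows together form an alignment without running a general-purpose MSA program, and arbitrary regions can be extracted by slicing each row over the same column range.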
