Hierarchical clustering matrix (HCM) method applied to DNA barcode assembly for bacterial chromosomes
Abstract: DNA barcodes carry coarse-grained genetic information about DNA sequences taken from a genome. Potential applications include bacteriology, medical diagnosis and taxonomy. However, current state-of-the-art tools for extracting DNA molecules from cells yield only fragmented pieces of chromosomal DNA, so the resulting DNA barcodes are fragmented as well. This calls for complementary computational methods that piece the fragments together and so help restore the intact barcodes. Challenges for such methods include noise, the influence of DNA structural variation, and experimental errors. This thesis presents a new method for assembling large DNA fragments (mean length 300 kilobase pairs). We develop a matrix-based hierarchical clustering algorithm that pieces the DNA fragments together by assembling their overlapping regions. Two barcodes are compared by sliding one along the other to find the best alignment position. We then average the overlapping regions and stitch the two barcodes together into one assembled barcode. By repeating this process, we can obtain a nearly complete barcode of an intact chromosome. We demonstrate that our method works well for assembling fragments of theoretical barcodes with added noise. For experimental barcodes, we obtain only several large pieces rather than one intact barcode. In the last section we discuss possible improvements to our method and future applications of the assembly of large DNA barcodes.
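The slide, average and stitch procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not state the similarity measure, so Pearson correlation over the overlapping region is assumed here, and the names best_alignment, stitch and assemble, as well as the parameters min_overlap and threshold, are hypothetical. Barcodes are treated as plain lists of intensity values, and the pairwise scores are recomputed in every round rather than kept in a matrix.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_alignment(a, b, min_overlap):
    """Slide b along a; return (shift, score) with the highest overlap correlation.

    shift is the position of b[0] in a's coordinates and may be negative.
    Both barcodes are assumed to be at least min_overlap long.
    """
    best_shift, best_score = None, -2.0
    for shift in range(min_overlap - len(b), len(a) - min_overlap + 1):
        lo = max(0, shift)                # overlap start in a's coordinates
        hi = min(len(a), shift + len(b))  # overlap end (exclusive)
        score = pearson(a[lo:hi], b[lo - shift:hi - shift])
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


def stitch(a, b, shift):
    """Merge a and b at the given shift, averaging values where they overlap."""
    merged = []
    for pos in range(min(0, shift), max(len(a), shift + len(b))):
        vals = []
        if 0 <= pos < len(a):
            vals.append(a[pos])
        if 0 <= pos - shift < len(b):
            vals.append(b[pos - shift])
        merged.append(sum(vals) / len(vals))
    return merged


def assemble(fragments, min_overlap, threshold):
    """Greedy agglomerative assembly: repeatedly merge the best-scoring pair.

    Stops when one piece remains or no pair scores at least threshold
    (an assumed stopping rule), so several large pieces may be returned.
    """
    pieces = [list(f) for f in fragments]
    while len(pieces) > 1:
        best = None
        for i in range(len(pieces)):
            for j in range(i + 1, len(pieces)):
                if min(len(pieces[i]), len(pieces[j])) < min_overlap:
                    continue
                shift, score = best_alignment(pieces[i], pieces[j], min_overlap)
                if best is None or score > best[0]:
                    best = (score, i, j, shift)
        if best is None or best[0] < threshold:
            break
        _, i, j, shift = best
        merged = stitch(pieces[i], pieces[j], shift)
        pieces = [p for k, p in enumerate(pieces) if k not in (i, j)] + [merged]
    return pieces
```

Calling assemble(fragments, min_overlap=20, threshold=0.9) on overlapping slices of one barcode returns a list of assembled pieces. A list with a single element corresponds to the fully assembled case, while several elements correspond to the "several large pieces" outcome reported for the experimental barcodes. The values 20 and 0.9 are illustrative only.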