STAIRS: Data reduction strategy on genomics

University essay from Uppsala universitet/Institutionen för biologisk grundutbildning

Abstract: Background. Genomic data have accumulated at an enormous rate over the last ten years. This makes visualization and manual inspection key steps in trying to understand large datasets containing DNA sequences millions of letters long. The situation has created a gap between data complexity and the qualified personnel available, because analysis requires trading off visualization, reduction capacity and exploratory functions, a combination rarely achieved by existing tools such as the SRA toolkit. This project pursued a novel approach to genomic analysis and visualization by means of STrAtified Interspersed Reduction Structures (STAIRS). Results. Ten weeks of intense work resulted in novel algorithms that compress data, transform it into stairs vectors and align those vectors. The Smith–Waterman and Needleman–Wunsch algorithms were specially modified for this purpose, and the application produced statistical performance and behavioural charts.
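For readers unfamiliar with the two alignment algorithms named in the abstract, the following is a minimal sketch of their standard textbook forms, not the modified versions developed in the thesis. The function names `needleman_wunsch_score` and `smith_waterman_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration. The inputs are plain sequences, not stairs vectors, whose structure the abstract does not specify.

```python
from typing import Sequence


def needleman_wunsch_score(a: Sequence, b: Sequence,
                           match: int = 1, mismatch: int = -1,
                           gap: int = -1) -> int:
    """Global alignment score (textbook Needleman-Wunsch, linear gap cost)."""
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def smith_waterman_score(a: Sequence, b: Sequence,
                         match: int = 1, mismatch: int = -1,
                         gap: int = -1) -> int:
    """Best local alignment score (textbook Smith-Waterman, linear gap cost)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Both functions return only the optimal score and keep a single previous row, so memory grows with the length of the shorter dimension instead of the full table; recovering the alignment itself would require storing the whole table and tracing back through it.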

