Identification of de novo Transcription Factor Binding Motifs Created by Cancer-related Mutations

University essay from Uppsala universitet/Institutionen för biologisk grundutbildning

Abstract: In many countries, cancer is one of the greatest threats to public health, especially among older people. Genomic mutations play a crucial role in the development of cancer cells. In previous decades, cancer research focused mainly on mutations in coding regions, which can directly change the encoded protein sequences and thereby affect protein function. In recent years, as the function of non-coding regions has become better understood, a growing number of studies have examined the role of non-coding mutations in cancer. Transcription factors (TFs) are an important group of gene regulatory factors. They bind to specific genomic sequences called transcription factor binding motifs (TFBMs), and mutations in these motifs can disrupt TF binding and thus alter gene regulation. A framework called funMotifs was developed to predict and annotate functional TFBMs in the human genome. A subsequent study intersected mutation data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) with the motifs in funMotifs to give a general view of how cancer-related mutations affect functional TF motifs. However, that study considered only existing motifs, identified previously from the normal genome, and disregarded de novo motifs that mutations could potentially create. One such case has been found near the TERT promoter, where mutations create a de novo ETS binding site and up-regulate TERT expression. My study aims to extend the scope of funMotifs from existing motifs to de novo motifs created by cancer-related mutations. I extended the original motifs in the funMotifs database and merged overlapping motifs into longer regulatory elements. I then mutated these elements according to the mutation data from PCAWG and scanned the mutated elements to identify TF motifs.
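The motif-extension, merging, mutation and scanning steps can be illustrated with a minimal Python sketch. It is not the code used in this study or in funMotifs: the function names, the flank size, the log-odds scoring against a uniform background, the score threshold and the restriction to single-nucleotide variants (PCAWG also reports indels, which are ignored here) are all assumptions made for illustration. Coordinates are 0-based and half-open, as in BED files, and the toy GGAA count matrix only stands in for an ETS-like core.

```python
import math
from dataclasses import dataclass


@dataclass
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def extend_and_merge(motifs, flank):
    """Pad each motif by `flank` bp on both sides (hypothetical parameter), then
    merge overlapping or directly adjacent intervals on the same chromosome."""
    padded = sorted(
        (Interval(m.chrom, max(0, m.start - flank), m.end + flank) for m in motifs),
        key=lambda iv: (iv.chrom, iv.start),
    )
    merged = []
    for iv in padded:
        if merged and merged[-1].chrom == iv.chrom and iv.start <= merged[-1].end:
            merged[-1].end = max(merged[-1].end, iv.end)  # extend current element
        else:
            merged.append(Interval(iv.chrom, iv.start, iv.end))
    return merged


def apply_snvs(seq, region_start, snvs):
    """Return `seq` with single-nucleotide variants applied.
    `snvs` holds (pos, ref, alt) tuples with 0-based genomic positions."""
    bases = list(seq)
    for pos, ref, alt in snvs:
        i = pos - region_start
        if 0 <= i < len(bases):  # ignore variants outside this region
            if bases[i] != ref:
                raise ValueError(f"reference mismatch at {pos}: {bases[i]} != {ref}")
            bases[i] = alt
    return "".join(bases)


def log_odds_pwm(counts, pseudocount=0.25, background=0.25):
    """Turn a count matrix (one {base: count} dict per position) into
    log2-odds scores against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in "ACGT"})
    return pwm


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def scan(seq, pwm, threshold):
    """Return (offset, strand, score) for every window on either strand
    whose summed log-odds score is >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set("ACGT"):  # skip windows containing N or other symbols
            continue
        reverse = window.translate(_COMPLEMENT)[::-1]  # reverse complement
        for strand, site in (("+", window), ("-", reverse)):
            score = sum(pwm[j][b] for j, b in enumerate(site))
            if score >= threshold:
                hits.append((i, strand, score))
    return hits


# Toy usage: a count matrix preferring GGAA and a hypothetical 8-bp region
# starting at genomic position 1000.
ets_like = log_odds_pwm([{"G": 10}, {"G": 10}, {"A": 10}, {"A": 10}])
ref = "ACGGCATC"
alt = apply_snvs(ref, 1000, [(1004, "C", "A")])  # -> "ACGGAATC"
print(scan(ref, ets_like, threshold=6.0))  # -> []
print(scan(alt, ets_like, threshold=6.0))  # -> one '+' hit at offset 2
```

In the toy example the substitution turns GGCA into GGAA, so a site that scores below the threshold in the reference sequence passes it in the mutated one, which is the kind of de novo motif the study looks for.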
The motifs identified in the mutated elements were then intersected with the original motifs in the funMotifs database to remove redundant results. After intersection and filtering, 2,525,771 de novo motifs were retained. These motifs come mainly from C2H2 zinc finger factors, tryptophan cluster factors, STAT domain factors, fork head/winged helix factors, MADS box factors and homeo domain factors. Although the de novo motifs found in this study still need further verification and analysis (for example, of the change in information content at the mutated positions of the motifs), the results can serve as a useful data source for further research on the regulatory impact of cancer-related mutations.
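The intersection and filtering step can be sketched in the same way. The exact redundancy rule is not stated here, so the one below is an assumption: a candidate is kept only if it covers at least one mutated position and does not overlap an original motif of the same TF. The `Motif` type, the function names and the TF labels in the toy data are hypothetical.

```python
from typing import NamedTuple


class Motif(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    tf: str


def overlaps(a, b):
    """Half-open interval overlap test on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_de_novo(candidates, originals, mutation_sites):
    """Keep candidates that (1) cover at least one mutated position and
    (2) do not overlap an original motif of the same TF (assumed rule).
    `mutation_sites` holds (chrom, pos) tuples with 0-based positions."""
    originals_by_key = {}
    for m in originals:
        originals_by_key.setdefault((m.chrom, m.tf), []).append(m)
    kept = []
    for c in candidates:
        if not any(chrom == c.chrom and c.start <= pos < c.end
                   for chrom, pos in mutation_sites):
            continue  # not touched by any mutation, so not created by one
        same_tf = originals_by_key.get((c.chrom, c.tf), [])
        if any(overlaps(c, o) for o in same_tf):
            continue  # already annotated for this TF: redundant
        kept.append(c)
    return kept


# Toy usage with hypothetical coordinates and TF labels.
originals = [Motif("chr5", 100, 110, "ELK1"), Motif("chr5", 200, 210, "SP1")]
candidates = [Motif("chr5", 105, 115, "ELK1"),   # overlaps an ELK1 original
              Motif("chr5", 205, 215, "ELK1"),   # overlaps only an SP1 original
              Motif("chr5", 300, 310, "ELK1")]   # covers no mutation
sites = [("chr5", 107), ("chr5", 208)]
print(filter_de_novo(candidates, originals, sites))
```

The linear scans keep the sketch short; at genome scale one would sort the intervals, use an interval tree, or hand the overlap step to a tool such as BEDTools `intersect`.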
