Exploration of an all-atom thermodynamic model to predict site-specific evolutionary rates in proteins
Abstract: Understanding the patterns of evolutionary sequence divergence is fundamental for comparative analyses such as phylogenetics and genomics. The rate at which different sites of a protein sequence evolve is multifactorial, and the causes of rate variation among sites are highly intertwined. Inference methods have been developed to estimate site-specific evolutionary rates from sequence alignments. Moreover, several molecular traits have been found to correlate with site-specific rates, among them solvent accessibility, packing density and protein function. Correlations between rates and predictor variables make it possible to identify factors that influence rate variation, but they do not provide explicit mechanistic insight into why a given site is variable or conserved. Fortunately, the field of protein evolution is amenable to the development of fundamental theory, and mechanistic biophysical models have been proposed to explain the observed rates. These models are essentially based on protein stability, which is reasonable because stability is related to all the molecular features that correlate with evolutionary rates. Norn et al. developed an all-atom thermodynamic model to predict site-specific evolutionary rates in proteins. The model has been shown to closely recapitulate the average amino acid substitution rate behaviour, but it does not achieve the same level of accuracy when recapitulating site-specific rates. Several explanations have been put forward for this weak correlation, two of which are of particular interest: the propagation of stability prediction errors, and the fact that the model relies on a single protein structure to extrapolate site-rates that emerge across a whole protein phylogeny. The results obtained in this thesis support the hypothesis defended by Norn et al. that the propagation of stability prediction errors affects the correlation value but is not enough to explain why the correlation is weak on average. Additionally, it is shown that a weighted average of the site-rates of the proteins in a given phylogeny does a better job of recapitulating the empirically inferred site-rates of that phylogeny. This opens the door to further adapting Norn et al.'s model to phylogenetic analysis.
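The abstract does not state how the per-protein site-rates are weighted or which correlation coefficient is used. The following is only a minimal sketch under stated assumptions: each protein's predicted site-rates are already mapped onto a common alignment (column j is the same site in every row), the weights are arbitrary non-negative numbers supplied by the user, and agreement with empirically inferred rates is measured with Pearson's correlation coefficient as one common choice. The function names `weighted_site_rates` and `pearson` are hypothetical and are not taken from the thesis or from Norn et al.

```python
from math import sqrt
from typing import List, Sequence


def weighted_site_rates(rates_per_protein: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Hypothetical helper: combine per-protein site-rate predictions.

    Each row holds the predicted rates for one protein of the phylogeny;
    rows are assumed to be aligned, so column j is the same site everywhere.
    The weighting scheme is an assumption, not the one used in the thesis.
    """
    if len(rates_per_protein) != len(weights):
        raise ValueError("need exactly one weight per protein")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    n_sites = len(rates_per_protein[0])
    if any(len(row) != n_sites for row in rates_per_protein):
        raise ValueError("all proteins must cover the same aligned sites")
    total = sum(weights)
    # Weighted mean of each alignment column across proteins.
    return [
        sum(w * row[j] for w, row in zip(weights, rates_per_protein)) / total
        for j in range(n_sites)
    ]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and empirical site-rates."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Toy, made-up numbers for illustration only (two proteins, three sites).
predicted = [[1.0, 0.2, 0.5],
             [0.6, 0.4, 1.5]]
combined = weighted_site_rates(predicted, weights=[3.0, 1.0])
empirical = [1.1, 0.3, 0.8]  # hypothetical inferred rates, not real data
print(combined, pearson(combined, empirical))
```

Averaging over several proteins rather than using a single structure is the idea the abstract reports as improving agreement; in practice the weights might reflect, for example, how representative each sequence is of the phylogeny, but that choice is left open here.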