Modelling Large Protein Complexes

University essay from Uppsala universitet/Institutionen för biologisk grundutbildning

Abstract: AlphaFold [Jumper et al., 2021, Evans et al., 2022] is a deep learning-based method that can accurately predict the structure of single- and multiple-chain proteins. However, its accuracy decreases with an increasing number of chains, and GPU memory limits the size of protein complexes that can be predicted. Recently, Elofsson’s group introduced a Monte Carlo tree search method, MoLPC, that can predict the structure of large complexes from predictions of sub-components [Bryant et al., 2022b]. However, MoLPC cannot adjust for errors in the sub-component predictions and requires knowledge of the correct protein stoichiometry. Large protein complexes are responsible for many essential cellular processes, such as mRNA splicing [Will and Lührmann, 2011], protein degradation [Tanaka, 2009], and protein folding [Ditzel et al., 1998]. However, structural knowledge of many large protein complexes is still lacking. Only a fraction of the eukaryotic core complexes in CORUM [Giurgiu et al., 2019] have homologous structures covering all chains in the PDB, indicating a significant gap in our structural understanding of protein complexes. AlphaFold-Multimer [Evans et al., 2022] is the only deep learning method that can predict the structure of more than two protein chains. It was trained on proteins of up to 20 chains and can predict complexes of up to a few thousand residues, beyond which memory limitations come into play. Another approach is to predict the structures of sub-components of large complexes and assemble them. It has been shown that large complexes can be assembled from dimers manually [Burke et al., 2021] or with Monte Carlo tree search, as in MoLPC [Bryant et al., 2022b]. One limitation of the previous MoLPC approach is its inability to account for errors in sub-component prediction. Small errors in each sub-component can accumulate into a significant error when building the entire complex, causing MoLPC to fail. 
To overcome this challenge, the Monte Carlo tree search algorithm is enhanced in MoLPC2 to assemble protein complexes while simultaneously predicting their stoichiometry. Using MoLPC2, we accurately predicted the structures of 50 out of 175 non-redundant protein complexes (TM-score > 0.8), while MoLPC only predicted 30. It should be noted that improvements introduced in AlphaFold version 2.3 enable the prediction of larger complexes, and if the stoichiometry is known, AlphaFold can accurately predict the structures of 74 complexes. Our findings suggest that assembling symmetrical complexes from sub-components results in higher accuracy, while assembling asymmetrical complexes remains challenging.
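
The error-propagation limitation described in the abstract can be illustrated with a toy calculation. The sketch below is not MoLPC's procedure: it is a simplified 2D model, assumed here purely for illustration, in which each unit is placed by repeating one rigid step (rotate, then move forward) and every step carries the same small angular error. The function names `place_chain` and `deviations` and all parameter values are hypothetical.

```python
import math


def place_chain(n_units, step_angle, step_length=1.0):
    """Place n_units points by repeating one 2D rigid step: rotate, then move forward."""
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]
    for _ in range(n_units - 1):
        heading += step_angle
        x += step_length * math.cos(heading)
        y += step_length * math.sin(heading)
        positions.append((x, y))
    return positions


def deviations(n_units, step_angle, angle_error, step_length=1.0):
    """Distance between ideal and perturbed placement for every unit in the chain."""
    ideal = place_chain(n_units, step_angle, step_length)
    noisy = place_chain(n_units, step_angle + angle_error, step_length)
    return [math.dist(p, q) for p, q in zip(ideal, noisy)]


if __name__ == "__main__":
    # Toy 10-unit ring (36 degrees per step) with an assumed 2-degree error per interface.
    devs = deviations(10, math.radians(36), math.radians(2))
    for index, dev in enumerate(devs):
        print(f"unit {index}: deviation {dev:.3f} step lengths")
```

In this toy model the first placed unit is off by only a few percent of a step length, while the last unit of the ring is displaced many times further, which is the kind of accumulated error that a purely sequential assembly cannot correct.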
