Learning the shapes of protein pockets

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: The comparison of protein pockets plays an important role in drug discovery. Through the identification of binding sites with similar structures, we can assist in finding hits and characterizing the function of proteins. Traditionally, the geometry of cavities has been described with scalar features, which are not fully representative of the shape. In this work, we propose a method that creates geometrical descriptors of the pocket shape based on Euclidean neural networks, allowing us to encode their physical features. As a result, we can compare the cavities by computing the Euclidean distance between their respective embeddings. As a way of ensuring that the generated embeddings contain relevant geometrical information, our model was trained on a supervised classification task to predict whether given pockets are druggable. To do this, a new dataset was built from the existing sc-PDB database that served as a reference to set the druggable cavities. Then, the protein cavity detection algorithm Fpocket was applied to generate decoys. The supervised model is evaluated by predicting druggability on held-out data, while the utility of the learned embeddings is assessed by comparing how a pocket changes during a dynamic simulation. The findings obtained are encouraging and point to a possible paradigm shift in the way pocket shape can be learned. All code is available at https://github.com/acorrochanon/Pocket-shapes.
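The comparison step described in the abstract, Euclidean distance between pocket embeddings, can be illustrated with a minimal sketch. It assumes each pocket has already been reduced to a fixed-length vector of floats. The function names `pocket_distance` and `rank_by_similarity` are hypothetical and are not taken from the thesis code; the embedding model itself (the Euclidean neural network) is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def pocket_distance(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """Euclidean distance between two pocket embeddings of equal length."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimensionality")
    return math.dist(emb_a, emb_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, distance) pairs sorted from most to least similar."""
    distances = [(i, pocket_distance(query, c)) for i, c in enumerate(candidates)]
    return sorted(distances, key=lambda pair: pair[1])
```

Under this sketch, a smaller distance means two cavities have more similar learned shape descriptors. The same function could be applied to embeddings of one pocket taken at successive frames of a simulation to track how its shape drifts over time.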
