Machine learning for molecular property prediction and drug safety

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: Predicting acid dissociation constant (pKa) values of compounds with machine learning is of great practical value, since pKa is an important parameter that is frequently optimized in drug discovery. Accurate pKa prediction could also provide insights into other molecular properties and thereby support compound design. To extend the scope of pKa prediction, we built several machine learning models using internal AstraZeneca data. We explored both classical ML approaches with different molecular descriptors and deep learning methods. Graph neural network based models outperformed tree-based methods and yielded reasonable predictions for both acidic and basic pKa values. Using several data splitting strategies, we showed that the models have the potential to generalize well to novel compounds and to outperform state-of-the-art methods. In addition to evaluating the models on different splits of the internal data, we assessed their performance on public datasets. Accuracy on the public datasets was comparatively lower, which can be attributed to the collation of data from diverse sources and the high experimental variability of the publicly available data.
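
The abstract does not say which data splitting strategies were used. As an illustration only, the sketch below shows one common way to test generalization to novel compounds: a group-wise split in which all compounds sharing a group label (for example, a scaffold identifier) fall entirely in either the training set or the test set. The function name `group_split`, the `scaffolds` labels, and the parameter values are hypothetical and do not come from the essay.

```python
import random
from collections import defaultdict


def group_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group appears in both train and test.

    `groups` holds one label per sample (e.g. a scaffold ID, assumed here).
    Whole groups are moved into the test set until it reaches roughly
    `test_fraction` of the samples; the remaining groups form the training set.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Shuffle the group labels reproducibly, then assign whole groups.
    group_ids = sorted(by_group)
    random.Random(seed).shuffle(group_ids)

    target = test_fraction * len(groups)
    train, test = [], []
    for label in group_ids:
        if len(test) < target:
            test.extend(by_group[label])
        else:
            train.extend(by_group[label])
    return sorted(train), sorted(test)


# Hypothetical scaffold labels for ten compounds (illustrative data only).
scaffolds = ["A", "A", "B", "C", "C", "C", "D", "E", "E", "F"]
train_idx, test_idx = group_split(scaffolds, test_fraction=0.3, seed=0)

train_groups = {scaffolds[i] for i in train_idx}
test_groups = {scaffolds[i] for i in test_idx}
print("train:", train_idx, "test:", test_idx)
print("shared groups:", train_groups & test_groups)
```

Compared with a purely random split, a split of this kind keeps structurally related compounds from appearing on both sides, so the test error is a stricter estimate of performance on unseen chemistry.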
