Prediction of compound solubility in Dimethyl sulfoxide using machine learning methods including graph neural networks

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Viktor Norrsjö; [2020]


Abstract: In drug discovery, compounds that are insoluble in dimethyl sulfoxide (DMSO) are unwanted and can be disregarded. To avoid wasting time and resources, pharmaceutical companies try to predict compound solubility before selecting compounds for further research. Compound solubility is hard to predict with confidence, and this project focuses on prediction using machine learning methods. The dataset consists of almost 12 thousand compounds labeled soluble or insoluble, and its labels are heavily skewed towards soluble compounds. Different ways of representing compounds are tested with four machine learning methods: Support Vector Machine, Random Forest, Multilayer Perceptron, and a state-of-the-art graph convolutional neural network called Directed Message Passing Neural Network. After 5-fold cross-validation, it can be concluded that the Directed Message Passing Neural Network performs better than the other machine learning methods when they are trained with classical compound representations, and on par with them when they are trained with the latent space descriptors (Section 2.1.2). Finally, an external experiment shows that the best Directed Message Passing Neural Network significantly increases the proportion of insoluble compounds found compared to a random selection.
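
The comparison against random selection in the final sentence can be illustrated with an enrichment factor: the fraction of insoluble compounds among the top-ranked predictions, divided by the fraction in the whole set. The sketch below is only an illustration under stated assumptions, not the evaluation used in the essay. The function name `enrichment_factor`, the label encoding (1 = insoluble, 0 = soluble), and the toy scores are all hypothetical.

```python
def enrichment_factor(scores, labels, top_k):
    """Hit rate among the top_k highest-scored compounds divided by the
    hit rate of the whole set (the expected rate of a random selection).

    scores: predicted probability of being insoluble (higher = more likely)
    labels: 1 for insoluble, 0 for soluble (assumed encoding)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < top_k <= len(scores):
        raise ValueError("top_k must be between 1 and the number of compounds")

    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("the set contains no insoluble compounds")

    # Rank compounds from most to least likely insoluble.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:top_k])
    return (hits / top_k) / base_rate


# Toy data, made up for illustration: 2 insoluble compounds out of 10.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

ef = enrichment_factor(scores, labels, top_k=2)
print(f"Enrichment factor for the top 2 compounds: {ef:.1f}")
```

A value above 1 means the model's top-ranked compounds contain more insoluble compounds than a random draw of the same size would be expected to; a value of 1 means no improvement over random selection.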
