Improving ligand-based modelling by combining various features

University essay from Uppsala universitet/Institutionen för farmaceutisk biovetenskap

Abstract:

Background: In drug discovery, morphological profiles can be used to identify and establish a drug's biological activity or mechanism of action. Quantitative structure-activity relationship (QSAR) modelling is an approach that uses chemical structures to predict properties such as biological activity. Support vector machine (SVM) is a machine learning algorithm that can be used for classification. Confidence measures such as conformal prediction can be implemented on top of machine learning algorithms. Several methods can be applied to improve a model's predictive performance.

Aim: The aim of this project is to evaluate whether ligand-based modelling can be improved by combining features from chemical structures, target predictions and morphological profiles.

Method: The project was divided into three experiments. In experiment 1, five bioassay datasets were used. In experiments 2 and 3, a cell painting dataset was used that contained morphological profiles from three different classes of kinase inhibitors, and the classes were used as endpoints. Support vector machine (liblinear) models were built in all three experiments. A significance level of 0.2 was used to calculate the efficiency. The mean observed fuzziness and the efficiency were used to evaluate model performance.

Results: Similar trends were observed for all datasets in experiment 1. Signatures+CDK13+TP, the most complex model, obtained the lowest mean observed fuzziness in four out of five cases. At a confidence level of 0.8, TP+Signatures obtained the highest efficiency. Signatures+Morphological Profiles+TP obtained the lowest mean observed fuzziness in experiments 2 and 3. Signatures obtained the highest number of correct single-label predictions at a confidence level of 80%.

Discussion: Fewer correct single-label predictions were observed for the active class than for the inactive class. This could be because the active class was harder to predict. The morphological profiles did not improve the model's predictive performance compared to Signatures. This could be due to the lack of information obtained from the dataset.

Conclusion: Combining features from chemical structures and target predictions improved ligand-based modelling compared to models built on only one of these feature types. Combining features from chemical structures and morphological profiles did not improve the ligand-based models compared to the model built only on chemical structures. Adding features from target predictions to a model built with features from chemical structures and morphological profiles decreased the mean observed fuzziness.
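The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that the mean observed fuzziness is the average, over compounds, of the summed p-values of the incorrect labels (lower is better), and that efficiency is the fraction of prediction sets containing exactly one label at the chosen significance level. The abstract does not give the exact definitions used in the essay, so both are assumptions, and the p-values and function names below are hypothetical.

```python
from typing import List, Sequence


def prediction_set(p_values: Sequence[float], significance: float) -> List[int]:
    """Return the labels whose p-value exceeds the significance level."""
    return [label for label, p in enumerate(p_values) if p > significance]


def mean_observed_fuzziness(
    p_values: Sequence[Sequence[float]], true_labels: Sequence[int]
) -> float:
    """Average, over samples, of the summed p-values of the false labels."""
    total = 0.0
    for row, true in zip(p_values, true_labels):
        total += sum(p for label, p in enumerate(row) if label != true)
    return total / len(true_labels)


def single_label_efficiency(
    p_values: Sequence[Sequence[float]], significance: float
) -> float:
    """Fraction of prediction sets that contain exactly one label (assumed definition)."""
    singles = sum(
        1 for row in p_values if len(prediction_set(row, significance)) == 1
    )
    return singles / len(p_values)


# Hypothetical conformal p-values for three compounds; columns are (inactive, active).
p_values = [[0.90, 0.05], [0.30, 0.60], [0.10, 0.75]]
true_labels = [0, 1, 1]

fuzziness = mean_observed_fuzziness(p_values, true_labels)
efficiency = single_label_efficiency(p_values, significance=0.2)
print(f"mean observed fuzziness: {fuzziness:.3f}")
print(f"efficiency at significance 0.2: {efficiency:.3f}")
```

In this toy example the second compound receives both labels at significance 0.2, so it does not count towards the single-label efficiency.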
