Reaction Conditions Data Mining : The application of Machine Learning towards predicting the future of process development

University essay from

Author: Samuel Hallinder; [2019]


Abstract: In organic chemistry, and especially in process chemistry, there is a constant need for cost-effective ways to optimize reaction conditions. The rapid development of Machine Learning (ML), combined with Data Mining (DM), opens new possibilities for reducing time and cost in chemical science. To address the need for time and cost savings in process chemistry, the widely used Suzuki-Miyaura reaction was studied with such ML and DM methods. A representative dataset of molecular and structural properties of the substrates and product was calculated with the open-source toolkits Indigo, Chemistry Development Kit and RDKit, all available in KNIME. To predict reaction outcomes, catalysts and reaction conditions were ranked using several binary classification models built with a Random Forest algorithm. One approach led to a binary classification model that runs at low computational cost. It achieved an AUC of 98.5% when classifying reactions against a yield threshold (>=60% versus <=40%). A second approach comprised six separate binary classification models and reached an average accuracy of 91.6% in predicting the correct catalyst. These six models were then combined to rank the catalysts best suited to a new reaction, giving predicted probabilities between 23.6% and 77.3%. Experimental validation highlighted the uncertainty of the predictions: the catalyst ranked least suitable (23.6%) performed best. Overall, the models showed a promising correlation in support of synthesis optimization, and with further adjustment there is a good opportunity to obtain a model that can assist chemists in the future.
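
The two steps named in the abstract, labelling reactions by a yield threshold and ranking catalysts by per-model probability, can be sketched roughly as follows. This is a minimal illustration and not the essay's KNIME workflow: the function names, the catalyst names, and all numbers are hypothetical, and excluding yields between 40% and 60% from the binary task is an assumed reading of the stated threshold.

```python
from typing import Dict, List, Optional, Tuple

HIGH_YIELD = 60.0  # percent; lower bound of the "high yield" class
LOW_YIELD = 40.0   # percent; upper bound of the "low yield" class


def label_by_yield(yield_percent: float) -> Optional[int]:
    """Return 1 for high yield, 0 for low yield, None for the gap in between."""
    if yield_percent >= HIGH_YIELD:
        return 1
    if yield_percent <= LOW_YIELD:
        return 0
    # Assumption: reactions between 40% and 60% are left out of the binary task.
    return None


def rank_catalysts(probabilities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort catalysts by the probability that each one's own binary model assigned."""
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)


# Hypothetical yields and per-catalyst probabilities, for illustration only.
yields = [85.0, 12.5, 50.0, 60.0, 40.0]
labels = [label_by_yield(y) for y in yields]

scores = {"catalyst_A": 0.25, "catalyst_B": 0.75, "catalyst_C": 0.5}
ranking = rank_catalysts(scores)
```

In this sketch each catalyst has its own binary model, and the ranking simply orders the catalysts by those models' outputs. As the experimental validation in the abstract suggests, such a ranking is a prioritization aid, not a guarantee of performance.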
