Experimental Research on a Continuous Integrating pipeline with a Machine Learning approach : Master Thesis done in collaboration with Electronic Arts
Abstract: Time-consuming code builds within the Continuous Integration pipeline is a common problem in today’s software industry. With fast-evolving trends and technologies, Machine Learning has become a more popular approach to tackle and solve real problems within the software industry. It has been shown to be successful to train Machine Learning models that can classify whether a code change is likely to be successful or fail during a code build. Reducing the time it takes to run code builds within the Continuous Integration pipeline can lead to higher productivity in software development, faster feedback for developers, and lower the cost of hardware resources used to run the builds. To answer the research question: How accurate can success or failure in code build be predicted by using Machine Learning techniques on the historical data collection? The important factor is the historical data available and understanding the data. Thorough data analysis was conducted on the historical data and a data cleaning process to create a dataset suitable for feeding the Machine Learning models. The dataset was imbalanced, favouring the successful builds, and to balance the dataset the SMOTE method was used to create synthetic samples. Binary classification and supervised learning comparison of four Machine Learning models were performed; Random Forest, Logistic Regression, Support Vector Machine, and Neural Network. The performance metrics used to measure the performance of the models were recall, precision, specificity, f1-score, ROC curve, and AUC score. To reduce the dimensionality of the features the PCA method was used. The outcome of the Machine Learning models revealed that historical data can be used to accurately predict if a code change will result in a code build success or failure.
AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)