Using Ensemble Machine Learning Methods in Estimating Software Development Effort

University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap

Abstract: Background: Software Development Effort Estimation is a process that focuses on estimating the required effort to develop a software project with a minimal budget. Estimating effort includes interpretation of required manpower, resources, time and schedule. Project managers are responsible for estimating the required effort. A model that can predict software development effort efficiently comes in hand and acts as a decision support system for the project managers to enhance the precision in estimating effort. Therefore, the context of this study is to increase the efficiency in estimating software development effort. Objective: The main objective of this thesis is to identify an effective ensemble method to build and implement it, in estimating software development effort. Apart from this, parameter tuning is also implemented to improve the performance of the model. Finally, we compare the results of the developed model with the existing models. Method: In this thesis, we have adopted two research methods. Initially, a Literature Review was conducted to gain knowledge on the existing studies, machine learning techniques, datasets, ensemble methods that were previously used in estimating Software Development Effort. Then a controlled Experiment was conducted in order to build an ensemble model and to evaluate the performance of the ensemble model for determining if the developed model has a better performance when compared to the existing models.   Results: After conducting literature review and collecting evidence, we have decided to build and implement stacked generalization ensemble method in this thesis, with the help of individual machine learning techniques like Support vector regressor (SVR), K-Nearest Neighbors regressor (KNN), Decision Tree Regressor (DTR), Linear Regressor (LR), Multi-Layer Perceptron Regressor (MLP) Random Forest Regressor (RFR), Gradient Boosting Regressor (GBR), AdaBoost Regressor (ABR), XGBoost Regressor (XGB). Likewise, we have decided to implement Randomized Parameter Optimization and SelectKbest function to implement feature section. Datasets like COCOMO81, MAXWELL, ALBERCHT, DESHARNAIS were used. Results of the experiment show that the developed ensemble model performs at its best, for three out of four datasets. Conclusion: After evaluating and analyzing the results obtained, we can conclude that the developed model works well with the datasets that have continuous, numeric type of values. We can also conclude that the developed ensemble model outperforms other existing models when implemented with COCOMO81, MAXWELL, ALBERCHT datasets.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)