Using machine learning for resource provisioning to run workflow applications in IaaS Cloud

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: William Våge (2019)


Abstract: The rapid advancement of cloud computing has made it possible to execute large computations, such as scientific workflow applications, faster than ever before. Executing a workflow application in the cloud consists of choosing instances (resource provisioning) and then scheduling the tasks onto the chosen instances (resource scheduling). Because finding the shortest execution time (makespan) of a scientific workflow within a specified budget is an NP-hard problem, heuristics or metaheuristics are commonly used to solve it. This thesis investigates machine learning as an alternative way of finding resource provisioning solutions for scientific workflow execution in the cloud. Specifically, it evaluates whether a trained machine learning model can predict provisioning instances with solution quality close to that of a state-of-the-art algorithm (PACSA) in a significantly shorter time. The machine learning models are trained for the scientific workflows Cybershake and Montage, using workflow properties as features and the solution instances produced by the PACSA algorithm as labels. The predicted provisioning instances are then scheduled by an independent HEFT scheduler to obtain a makespan. The project concludes that a machine learning model can be trained to achieve solution quality close to that reported by the PACSA algorithm in a significantly shorter computation time, and that the best-performing models in the thesis were the Decision Tree Regressor (DTR) and the Support Vector Regressor (SVR). On average, the DTR and the SVR produce makespans only 4.97 % (Cybershake) and 2.43 % (Montage) longer than those of the PACSA algorithm, while incurring average budget violations of only 0.64 % (Cybershake) and 0.82 % (Montage).
For large workflows (1000 tasks), the models showed average execution times of 0.0165 seconds for Cybershake and 0.0205 seconds for Montage, compared to the PACSA algorithm's execution times of 57.138 seconds and 44.215 seconds, respectively. The models were also able to solve some problem instances that the PACSA algorithm failed to solve. Surprisingly, the models even outperform PACSA in terms of makespan in 11.5 % of the cases for the Cybershake workflow and 19.5 % of the cases for the Montage workflow.
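The final step of the pipeline described above, turning a set of provisioned instances into a makespan with a HEFT scheduler, can be sketched roughly as follows. This is a simplified, illustrative HEFT variant (append-only, without the insertion-based slot search of full HEFT), not the thesis's actual scheduler; the tiny task graph and costs in the usage example are invented for illustration.

```python
# Simplified HEFT-style list scheduling: rank tasks by upward rank, then
# greedily place each task on the processor giving the earliest finish time.
# Append-only variant (no insertion into idle gaps), for illustration only.

def heft_makespan(costs, succ, comm):
    """costs[t] = list of runtimes of task t on each processor;
    succ[t] = list of successor tasks; comm[(t, s)] = data-transfer cost."""
    n_proc = len(next(iter(costs.values())))

    # Upward rank: average cost plus the most expensive downstream path.
    rank = {}
    def upward(t):
        if t not in rank:
            tail = max((comm.get((t, s), 0) + upward(s)
                        for s in succ.get(t, [])), default=0)
            rank[t] = sum(costs[t]) / n_proc + tail
        return rank[t]
    for t in costs:
        upward(t)

    # Invert the successor map to find each task's predecessors.
    pred = {t: [] for t in costs}
    for t, ss in succ.items():
        for s in ss:
            pred[s].append(t)

    proc_free = [0.0] * n_proc   # when each processor next becomes idle
    finish, placed = {}, {}      # finish time and chosen processor per task

    for t in sorted(costs, key=lambda t: -rank[t]):
        best_p, best_eft = None, float("inf")
        for p in range(n_proc):
            # Data arriving from a predecessor on another processor
            # additionally pays the communication cost.
            ready = max((finish[q] + (comm.get((q, t), 0)
                                      if placed[q] != p else 0)
                         for q in pred[t]), default=0.0)
            eft = max(ready, proc_free[p]) + costs[t][p]
            if eft < best_eft:
                best_p, best_eft = p, eft
        placed[t], finish[t] = best_p, best_eft
        proc_free[best_p] = best_eft

    return max(finish.values())


# Invented 3-task example on 2 processors: a feeds b and c.
costs = {"a": [2, 3], "b": [3, 2], "c": [1, 1]}
succ = {"a": ["b", "c"]}
comm = {("a", "b"): 1, ("a", "c"): 1}
print(heft_makespan(costs, succ, comm))  # -> 5.0
```

In the thesis's setting, the processor set would come from the provisioning step (here, the ML model's prediction), and the resulting makespan is what is compared against PACSA's solutions.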
