Intelligent Scheduling for Yarn: Using Online Machine Learning

University essay from KTH/Skolan för informations- och kommunikationsteknik (ICT)

Author: Zahin Azher Rashid; [2017]


Abstract: Large companies that provide cloud infrastructure, platforms and services face many challenges in handling big data and executing the thousands of tasks that run on their servers. Thousands of servers running in the cloud consume a large amount of energy, which greatly increases operating costs for companies hosting infrastructure and platforms as services. Users submit hundreds of thousands of applications to these servers every day. When applications are submitted, the available resources are often not properly utilized, which raises the overall operating cost. HOPS, a distribution of Apache Hadoop, is being developed at SICS Swedish ICT, and efforts are under way to make it a better platform for institutions and companies. Yarn is used as the resource management and scheduling framework and is responsible for allocating resources such as memory and CPU cores to submitted applications. Yarn simply allocates resources based on a default set of values or on what the user has requested. Because Yarn has no prior information about submitted applications, the allocated resources may well be more or less than what is required. Energy is wasted if an application needs fewer resources than it was allocated, and the application will probably fail if it needs more. This research project examines different techniques and methods for collecting useful metrics about applications and resources from Yarn, Spark and other sources. Machine learning is becoming a popular technique for optimizing systems that deal with big data in cloud environments. The goal is to collect these metrics and build a machine learning model that enables smart allocation of resources to submitted applications. This can help increase the efficiency of servers in the cloud and reduce operating costs. Finally, a machine learning model was developed, and the memory and vCores to allocate to applications were successfully predicted.
