Evaluating machine learning strategies for classification of large-scale Kubernetes cluster logs

University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap

Abstract: Kubernetes is a free, open-source container orchestration system for deploying and managing Docker containers that host microservices. Its cluster logs are extremely helpful in determining the root cause of a failure. However, as systems become more complex, locating failures becomes more difficult and time-consuming. This study aims to identify classification algorithms that classify the given log data accurately while requiring few computational resources. Because the dataset is large, we begin with expert-based feature selection to reduce its size. We then perform TF-IDF feature extraction and, finally, compare five classification algorithms, namely support vector machine (SVM), k-nearest neighbors (KNN), random forest, gradient boosting, and multilayer perceptron (MLP), using several metrics. The results show that random forest achieves good accuracy while requiring fewer computational resources than the other algorithms.
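As a rough illustration of a TF-IDF step followed by one of the compared classifiers, the following sketch vectorizes a few log lines and labels new ones with cosine-similarity KNN. It is not the study's implementation. The log lines, the `failure`/`normal` labels, the tokenizer, the smoothed IDF weighting, and k = 3 are all hypothetical choices made for illustration. KNN is used only because it fits in a few dependency-free lines; the study itself reports random forest as the best trade-off between accuracy and computational cost.

```python
import math
import re
from collections import Counter


def tokenize(line):
    # Hypothetical tokenizer: lowercase, then split on non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", line.lower()) if t]


def fit_idf(docs):
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1,
    # so a term that appears in every document still gets a non-zero weight.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    # Raw term count times IDF, then L2-normalized.
    # Terms not seen during fitting are ignored.
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(a, b):
    # Both vectors are L2-normalized, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    # Rank training lines by similarity to the query, then take a majority vote
    # among the k most similar ones.
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled log lines (not taken from the study's dataset).
train_lines = [
    "Back-off restarting failed container api in pod api-7d9",
    "Liveness probe failed: HTTP probe failed with statuscode: 500",
    "Failed to pull image registry/app:1.2: not found",
    "Successfully pulled image registry/app:1.3",
    "Started container worker",
    "Created container worker",
]
train_labels = ["failure", "failure", "failure", "normal", "normal", "normal"]

train_docs = [tokenize(line) for line in train_lines]
idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]

for query in ["Liveness probe failed: connection refused", "Started container api"]:
    label = knn_predict(train_vecs, train_labels, tfidf(tokenize(query), idf))
    print(f"{query!r} -> {label}")
# prints:
# 'Liveness probe failed: connection refused' -> failure
# 'Started container api' -> normal
```

A full comparison of all five classifiers on a large log corpus would more likely rely on a library such as scikit-learn (for example `TfidfVectorizer` combined with `RandomForestClassifier`), with the time spent in `fit` and `predict` recorded alongside accuracy to reflect computational cost.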
