A Comparison of Performance and Noise Resistance of Different Machine Learning Classifiers on Gaussian Clusters

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Authors: Oliver Schwalbe Lehtihet; Viktor Åryd; [2021]


Abstract: Most real-world data contains some amount of noise, i.e. unwanted factors that obscure the underlying signal and make it harder to detect or categorize. This is a problem for machine learning classification: multiple studies have shown how noise can increase the difficulty of different classification problems, and both attribute noise (corrupted feature values) and class noise (incorrect labels) can substantially reduce classification accuracy, especially as the level of noise increases. In this study we analyse how severely a number of different classification algorithms are affected by attribute and class noise, and how the number of classes and parameters further affects their resistance to noise. Similar studies have been done before, but we aim to supplement that research by using classifiers that have not been as broadly studied, together with self-generated data. Generating the data ourselves increases our control over the experiment parameters and should give more easily interpretable results. Among the classification algorithms used (Support Vector Machine, Random Forest, K-Nearest Neighbours and K-Means Clustering), Random Forest outperforms the other classifiers in most of the tests performed. However, both the initial performance on noise-free data and the resistance to noise appear to depend heavily on the nature of the data itself and on the type of noise introduced. Ultimately, more research is needed, especially on how different data distributions and classifier parameters affect noise resistance.
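
The abstract does not say exactly how the Gaussian clusters or the noise were generated, so the following is only a minimal sketch of this kind of experiment under stated assumptions, not the authors' setup. It assumes cluster centres drawn uniformly from [-10, 10] in each dimension, class noise modelled as replacing a label with a different, uniformly chosen class, attribute noise modelled as additive zero-mean Gaussian noise on every feature, and noise applied only to the training data while the test set stays clean. For brevity it evaluates a hand-written K-Nearest Neighbours classifier (one of the four named above) rather than any library implementation. All function names and parameter values are hypothetical.

```python
import math
import random


def make_gaussian_clusters(n_classes, n_per_class, n_features, spread, rng):
    """Sample isotropic Gaussian clusters; each class gets a random centre in [-10, 10]^d (assumption)."""
    X, y = [], []
    for label in range(n_classes):
        centre = [rng.uniform(-10.0, 10.0) for _ in range(n_features)]
        for _ in range(n_per_class):
            X.append([rng.gauss(c, spread) for c in centre])
            y.append(label)
    return X, y


def add_attribute_noise(X, sigma, rng):
    """Attribute noise (assumed model): add zero-mean Gaussian noise to every feature value."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in X]


def add_class_noise(y, rate, n_classes, rng):
    """Class noise (assumed model): with probability `rate`, swap a label for a different random class."""
    noisy = []
    for label in y:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in range(n_classes) if c != label]))
        else:
            noisy.append(label)
    return noisy


def knn_predict(X_train, y_train, x, k):
    """Plain k-nearest-neighbours majority vote using Euclidean distance."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    votes = {}
    for i in nearest:
        votes[y_train[i]] = votes.get(y_train[i], 0) + 1
    return max(votes, key=votes.get)


def accuracy(X_train, y_train, X_test, y_test, k=5):
    """Fraction of test points whose k-NN prediction matches the true label."""
    hits = sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_gaussian_clusters(n_classes=3, n_per_class=100, n_features=2, spread=1.0, rng=rng)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    train, test = idx[:200], idx[200:]
    X_train, y_train = [X[i] for i in train], [y[i] for i in train]
    X_test, y_test = [X[i] for i in test], [y[i] for i in test]

    # Noise is injected into the training data only; the test set stays clean.
    for rate in (0.0, 0.2, 0.4):
        acc = accuracy(X_train, add_class_noise(y_train, rate, 3, rng), X_test, y_test)
        print(f"class noise {rate:.0%}: accuracy {acc:.2f}")
    for sigma in (0.0, 2.0, 4.0):
        acc = accuracy(add_attribute_noise(X_train, sigma, rng), y_train, X_test, y_test)
        print(f"attribute noise sigma={sigma}: accuracy {acc:.2f}")
```

Keeping the two noise generators separate makes it possible to vary attribute and class noise independently, which mirrors the distinction the abstract draws between the two noise types; swapping in another classifier only requires replacing `accuracy`.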
