Evaluation of Machine Learning Classification Methods: Support Vector Machines, Nearest Neighbour and Decision Tree

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Sebastian Stiernborg; Sara Ervik; [2017]

Abstract: With more and more data available, the interest in and use of machine learning is growing, and so is the need for classification. Classification is an important method within machine learning for data simplification and prediction. This report evaluates three classification methods for supervised learning: Support Vector Machines (SVM) with several kernels, Nearest Neighbor (k-NN) and Decision Tree (DT). The methods were evaluated on accuracy, precision, recall and time. The experiments were conducted on artificial data created to represent a variety of distributions, limited to 2 features and 3 classes. Different distributions of data were chosen to challenge each classification method. The results show that accuracy and time vary considerably across the differently distributed datasets. SVM with an RBF kernel achieved higher accuracy than the other classification methods. k-NN generally scored slightly lower accuracy than SVM with an RBF kernel, but performed better on the challenging dataset. DT was the least time-consuming algorithm and was significantly faster than the other classification methods. The only method that could compete with DT on time was k-NN, which was faster than DT on the dataset with small spread and coinciding classes. Although a clear trend can be seen in the results, the area needs to be studied further before a comprehensive conclusion can be drawn, because of the limitations of the artificially generated datasets in this study.
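
The evaluation setup described in the abstract (artificial data with 2 features and 3 classes, scored on accuracy, precision, recall and time) can be illustrated with a minimal sketch. This is not the authors' code, and the abstract does not say which tools they used. Everything specific here is an assumption made for illustration: the Gaussian class centers, the two spread values, k = 5, the 70/30 train/test split, macro-averaging of precision and recall, and all function names. Only k-NN is shown, written with the Python standard library so the example stays self-contained.

```python
import random
import time
from collections import Counter


def make_blobs(centers, n_per_class, spread, seed=0):
    """Draw n_per_class 2-D Gaussian points around each class center."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            point = (rng.gauss(cx, spread), rng.gauss(cy, spread))
            data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k=5):
    """Majority vote among the k training points closest to `point`."""
    def sq_dist(sample):
        (x, y), _ = sample
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def evaluate(y_true, y_pred, n_classes):
    """Return accuracy plus macro-averaged precision and recall."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return accuracy, sum(precisions) / n_classes, sum(recalls) / n_classes


def run_experiment(spread, seed=0):
    """Classify one synthetic dataset; return (accuracy, precision, recall, seconds)."""
    # Assumed class centers; the thesis datasets are not specified in the abstract.
    centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 4.0)]
    data = make_blobs(centers, n_per_class=100, spread=spread, seed=seed)
    split = int(0.7 * len(data))  # assumed 70/30 train/test split
    train, test = data[:split], data[split:]

    start = time.perf_counter()
    y_pred = [knn_predict(train, x) for x, _ in test]
    elapsed = time.perf_counter() - start

    y_true = [y for _, y in test]
    return evaluate(y_true, y_pred, n_classes=3) + (elapsed,)


if __name__ == "__main__":
    # Small spread gives well-separated classes; large spread makes them overlap.
    for spread in (0.5, 3.0):
        acc, prec, rec, secs = run_experiment(spread)
        print(f"spread={spread}: acc={acc:.2f} prec={prec:.2f} rec={rec:.2f} time={secs:.4f}s")
```

Varying `spread` mimics, in a very rough way, the idea of choosing distributions that challenge a classifier. Comparing methods as the report does would mean running SVM and DT implementations through the same `evaluate` and timing code.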
