Evaluation of Feature Selection Methods for Machine Learning Classification of Breast Cancer

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Niklas Lindqvist; Thony Price; [2018]

Abstract: Breast cancer is the leading cause of cancer deaths among women today. Computer-aided diagnosis has proved effective in helping medical experts make an early diagnosis, which improves the chance of recovery. Computer-aided diagnosis uses machine learning classification algorithms to predict whether a patient's tumor is benign or malignant. With feature selection, these algorithms can be fed data of lower dimensionality and may produce more accurate results. In this report we conducted experiments with four feature selection methods and four classifiers on four datasets. We found that artificial neural networks show a significant increase in breast cancer classification accuracy when feature selection is applied. The maximum improvement in accuracy was 51%, obtained with the feature selection method Entropy on data from Royal Hallamshire Hospital. The accuracy achieved by artificial neural networks does not show any correlation with a specific feature selection method. For Naïve Bayes, support vector machines and decision trees, no increase in accuracy from feature selection could be statistically demonstrated across all datasets. However, in some observations these classifiers achieved higher classification accuracy with feature selection than with all features of the dataset.
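
To illustrate what an entropy-based feature selection step can look like, the sketch below ranks discrete features by information gain and keeps the top k. This is only one common entropy-based criterion; the abstract does not state which formulation the authors used. The function names, the feature values and the six-row toy dataset are all hypothetical and invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-gain features, plus all gains."""
    n_features = len(rows[0])
    gains = [
        information_gain([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k], gains


# Toy data invented for illustration (not from any dataset in the essay).
# Columns: hypothetical "size", "shape" and an uninformative "batch" code.
rows = [
    ("large", "irregular", "a"),
    ("large", "irregular", "b"),
    ("small", "regular", "a"),
    ("small", "regular", "b"),
    ("large", "regular", "b"),
    ("small", "irregular", "b"),
]
labels = ["malignant", "malignant", "benign", "benign", "malignant", "benign"]

selected, gains = select_top_k(rows, labels, k=2)
print("selected feature indices:", selected)
print("information gain per feature:", [round(g, 3) for g in gains])
```

In this toy data the first column separates the classes perfectly and the third carries no information, so the third column is the one dropped. A classifier would then be trained on the selected columns only and its accuracy compared with training on all columns.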
