Evaluation of Machine Learning classifiers for Breast Cancer Classification

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Robin Dang; Anders Nilsson; [2020]

Abstract: Breast cancer is a common and fatal disease among women globally, and early detection is vital to improving patient prognosis. In today’s digital society, computers and complex algorithms can evaluate and diagnose diseases more efficiently and with greater certainty than experienced doctors. Several studies have applied machine learning to medical imaging in order to automate the prediction and detection of breast cancer. This report evaluates the suitability of machine learning for classifying breast cancer as benign or malignant. More specifically, five different machine learning methods are examined and compared. Furthermore, we investigate how the efficiency of the methods, with regard to classification accuracy and execution time, is affected by the preprocessing method principal component analysis and the ensemble method bootstrap aggregating. In theory, both techniques should benefit certain machine learning methods and thereby increase classification accuracy. The study is based on a well-known breast cancer dataset from Wisconsin, which is used to train the algorithms. The results were evaluated with statistical methods with respect to classification accuracy, sensitivity and execution time, and were then compared across the classifiers. The study showed that neither principal component analysis nor bootstrap aggregating resulted in any significant improvement in classification accuracy. However, the support vector machine classifiers performed best. As the study was limited in the number of datasets and in the choice of evaluation methods and their associated adjustments, it is uncertain whether the obtained results generalize to other datasets or populations.
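
The two quality measures named in the abstract, classification accuracy and sensitivity, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the "M"/"B" label encoding and the example labels below are assumptions made purely for illustration.

```python
# Minimal sketch (not the essay's implementation): accuracy and sensitivity
# for a binary benign/malignant classifier. Labels "M" (malignant) and
# "B" (benign) and all data below are illustrative assumptions.

def confusion_counts(y_true, y_pred, positive="M"):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # malignant case correctly detected
            else:
                fn += 1  # malignant case missed
        else:
            if pred == positive:
                fp += 1  # benign case flagged as malignant
            else:
                tn += 1  # benign case correctly cleared
    return tp, fp, tn, fn


def accuracy(y_true, y_pred, positive="M"):
    """Fraction of all cases that were classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no samples given")
    return (tp + tn) / total


def sensitivity(y_true, y_pred, positive="M"):
    """Fraction of truly positive (malignant) cases that were detected."""
    tp, _, _, fn = confusion_counts(y_true, y_pred, positive)
    if tp + fn == 0:
        raise ValueError("no positive samples given")
    return tp / (tp + fn)


# Hypothetical ground truth and predictions for eight tumours.
y_true = ["M", "M", "M", "B", "B", "B", "B", "M"]
y_pred = ["M", "B", "M", "B", "B", "M", "B", "M"]

print(accuracy(y_true, y_pred))     # share of correct predictions
print(sensitivity(y_true, y_pred))  # share of malignant cases found
```

Sensitivity is reported alongside accuracy because a missed malignant tumour (a false negative) is usually the costlier error in a screening setting, and accuracy alone does not show how often that happens.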
