The importance of data when training a CNN for medical diagnostics : A study of how dataset size and format affects the learning process of a CNN

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Authors: Johan Rudelius; Erik Zetterström; [2020]


Abstract: Applying the computational power of computers in medicine has become increasingly popular since computer-aided diagnosis (CAD) emerged in the mid-twentieth century. The high prevalence of skin cancer has drawn research resources to the area, and in 2017 a group of scientists at Stanford University trained a convolutional neural network (CNN) that outperformed board-certified dermatologists in several skin cancer classification tests. The Stanford study prompted a follow-up study by Boman and Volminger, who attempted to replicate its results using publicly available data but did not achieve the same performance. This study originally aimed to extend the work of Boman and Volminger. However, because a large part of their training data was unavailable, direct comparisons were difficult to make, and the aims of the study shifted accordingly. The models presented in this study achieved 3-way classification accuracies of 82.2% and 87.3% for the balanced and imbalanced models, respectively. The balanced model was trained on a data set in which smaller classes had been randomly oversampled and larger classes randomly downsampled so that all classes were equal in size. Compared with the imbalanced model, the balanced model showed higher average specificity and sensitivity at the cost of a relatively small loss in accuracy. Although these models achieved higher accuracies than those reported by Boman and Volminger, it is difficult to draw firm conclusions from this, because the methodology of this study diverged from that of the previous work.
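The abstract does not state exactly how the classes were resampled or how the average specificity and sensitivity were computed. As a minimal sketch, assuming a fixed target size per class and one-vs-rest metrics averaged without weighting over the three classes, the balancing step and the evaluation could look like the code below. The function names `balance_classes`, `sens_spec_per_class` and `macro_average`, the parameter `target_per_class`, and the class labels `"a"`, `"b"`, `"c"` are all hypothetical; this is not the authors' code.

```python
import random
from collections import Counter, defaultdict


def balance_classes(samples, labels, target_per_class, seed=0):
    """Resample so that every class has exactly `target_per_class` items.

    Assumption: larger classes are randomly downsampled (drawn without
    replacement); smaller classes keep all their items and are topped up
    by random oversampling (drawn with replacement).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y in sorted(by_class):
        items = by_class[y]
        if len(items) >= target_per_class:
            # Downsample: pick a random subset without duplicates.
            chosen = rng.sample(items, target_per_class)
        else:
            # Oversample: keep every original once, then duplicate at random.
            shortfall = target_per_class - len(items)
            chosen = items + rng.choices(items, k=shortfall)
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)
    return out_x, out_y


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sens_spec_per_class(y_true, y_pred, classes):
    """One-vs-rest (sensitivity, specificity) for each class."""
    pairs = list(zip(y_true, y_pred))
    result = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        result[c] = (sens, spec)
    return result


def macro_average(per_class):
    """Unweighted mean of sensitivity and of specificity over all classes."""
    n = len(per_class)
    mean_sens = sum(s for s, _ in per_class.values()) / n
    mean_spec = sum(sp for _, sp in per_class.values()) / n
    return mean_sens, mean_spec


if __name__ == "__main__":
    # Toy, imbalanced data set with three hypothetical class labels.
    xs = list(range(10))
    ys = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
    _, ys_bal = balance_classes(xs, ys, target_per_class=3)
    print(sorted(Counter(ys_bal).items()))  # [('a', 3), ('b', 3), ('c', 3)]

    # Toy predictions for a 3-way classifier.
    y_true = ["a", "a", "a", "b", "b", "c"]
    y_pred = ["a", "a", "b", "b", "a", "c"]
    print(accuracy(y_true, y_pred))
    print(macro_average(sens_spec_per_class(y_true, y_pred, ["a", "b", "c"])))
```

One plausible reading of the reported trade-off, not established by the study itself: overall accuracy on imbalanced data is dominated by the majority class, whereas an unweighted average of per-class sensitivity and specificity gives each class equal weight, so a model trained on balanced data can score higher on the latter while losing a little accuracy.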
