Managing imbalanced training data by sequential segmentation in machine learning

University essay from Linköpings universitet/Avdelningen för medicinsk teknik

Author: Susana Bardolet Pettersson; [2019]

Keywords: ;

Abstract: Imbalanced training data is a common problem in machine learning applications. Thisproblem refers to datasets in which the foreground pixels are significantly fewer thanthe background pixels. By training a machine learning model with imbalanced data, theresult is typically a model that classifies all pixels as the background class. A result thatindicates no presence of a specific condition when it is actually present is particularlyundesired in medical imaging applications. This project proposes a sequential system oftwo fully convolutional neural networks to tackle the problem. Semantic segmentation oflung nodules in thoracic computed tomography images has been performed to evaluate theperformance of the system. The imbalanced data problem is present in the training datasetused in this project, where the average percentage of pixels belonging to the foregroundclass is 0.0038 %. The sequential system achieved a sensitivity of 83.1 % representing anincrease of 34 % compared to the single system. The system only missed 16.83% of thenodules but had a Dice score of 21.6 % due to the detection of multiple false positives. Thismethod shows considerable potential to be a solution to the imbalanced data problem withcontinued development.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)