The Impact of Imbalanced Training Data for Convolutional Neural Networks

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Authors: David Masko; Paulina Hensman [2015]

Abstract: This thesis empirically studies the impact of imbalanced training data on Convolutional Neural Network (CNN) performance in image classification. Images from the CIFAR-10 dataset, which contains 60,000 images in 10 classes, are used to create training sets with different class distributions. For example, some sets contain a disproportionately large number of images of one class, and others contain very few images of one class. A CNN is trained on each of these training sets, and the resulting network's classification performance is measured. The results show that imbalanced training data can severely degrade overall CNN performance, and that balanced training data yields the best results. Oversampling is then applied to the imbalanced training sets, raising their performance to that of the balanced set. It is concluded that oversampling is a viable way to counter the impact of imbalances in the training data.
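
The two steps described in the abstract, building an imbalanced training set and then rebalancing it by oversampling, can be illustrated with a minimal sketch. This is not the thesis's own code: it assumes random oversampling by duplication (the abstract does not specify the variant used), it works on toy label lists rather than CIFAR-10 images, and the function names `make_imbalanced` and `oversample` as well as the `keep_fraction` parameter are hypothetical.

```python
import random
from collections import Counter, defaultdict


def make_imbalanced(labels, keep_fraction, seed=0):
    """Return sorted sample indices, keeping keep_fraction[c] of class c.

    Classes missing from keep_fraction are kept in full (fraction 1.0).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    kept = []
    for y, idxs in by_class.items():
        n = round(len(idxs) * keep_fraction.get(y, 1.0))
        kept.extend(rng.sample(idxs, n))
    return sorted(kept)


def oversample(indices, labels, seed=0):
    """Random oversampling (an assumed variant): duplicate samples of the
    smaller classes until every class has as many as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(idxs) for idxs in by_class.values())
    out = []
    for idxs in by_class.values():
        out.extend(idxs)
        # Draw the missing samples with replacement from the same class.
        out.extend(rng.choices(idxs, k=target - len(idxs)))
    rng.shuffle(out)
    return out


# Toy stand-in for a 10-class dataset: 100 samples per class.
labels = [c for c in range(10) for _ in range(100)]

# Keep only 10% of class 0 to create an under-represented class.
imbalanced = make_imbalanced(labels, {0: 0.1})
before = Counter(labels[i] for i in imbalanced)

# Rebalance by duplicating class-0 samples.
balanced = oversample(imbalanced, labels)
after = Counter(labels[i] for i in balanced)

print("before:", sorted(before.items()))
print("after: ", sorted(after.items()))
```

The functions return indices rather than copies of the data, so the same index list could be used to select images and labels from a real dataset. Note that duplicated samples add no new information; they only change how often each class is seen during training.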
