The Sparse Data Problem Within Classification Algorithms : The Effect of Sparse Data on the Naïve Bayes Algorithm

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Johan Bissmark; Oscar Wärnling; [2017]

Keywords: ;

Abstract: In today’s society, software and apps based on machine learning and predictive analysis are of the essence. Machine learning has provided us with the possibility of predicting likely future outcomes based on previously collected data in order to save time and resources.   A common problem in machine learning is sparse data, which alters the performance of machine learning algorithms and their ability to calculate accurate predictions. Data is considered sparse when certain expected values in a dataset are missing, which is a common phenomenon in general large scaled data analysis.   This report will mainly focus on the Naïve Bayes classification algorithm and how it is affected by sparse data in comparison to other widely used classification algorithms. The significance of the performance loss associated with sparse data is studied and analyzed, in order to measure the effect sparsity has on the ability to compute accurate predictions.   In conclusion, the results of this report lay a solid argument for the conclusion that the Naïve Bayes algorithm is far less affected by sparse data compared to other common classification algorithms. A conclusion that is in line with what previous research suggests.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)