Using Naive Bayes and N-Gram for Document Classification Användning av Naive Bayes och N-Gram för dokumentklassificering

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Khalif Farah Mohamed; [2015]

Keywords: Bayes;

Abstract: The purpose of this degree project is to present, evaluate and improve probabilistic machine-learning methods for supervised text classification. We will explore Naive Bayes algorithm and character level n-gram, two probabilistic methods. The two methods will then be compared. Probabilistic algorithms like Naive Bayes and character level n-gram are some of the most effective methods in text classification, but to get accurate results they need a large training set. Because of too simple assumptions, Naive Bayes is a poor classifier. To rectify the problem, we will try to improve the algorithm, by using some transformed word and n-gram counts.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)