Word Segmentation for Classification of Text

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Anusha Anusha; [2019]

Keywords: ;

Abstract: Compounding is a highly productive word-formation process in some languages that is often problematic for natural language processing applications. Word segmentation is the problem of splitting a string of written language into its component words. The purpose of this research is to do a comparative study on different techniques of word segmentation and to identify the best technique that would aid in the extraction of keyword from the text. English was chosen as the language. Dictionary-based and Machine learning approaches were used to split the compound words. This research also aims at evaluating the quality of a word segmentation by comparing it with the segmentation of reference. Results indicated that Dictionary-based word segmentation showed better results in segmenting a compound word compared to the Machine learning segmentation when technical words were involved. Also, to improve the results for the text classification, improving the quality of the text alone is not the key

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)