Unsupervised text clustering using survey answers

University essay from KTH/Skolan för teknikvetenskap (SCI)

Author: Mathias Helgesson Törnqvist; Therese Stålhandske; [2017]


Abstract: Text data mining is a growing research field where machine learning and NLP are important technologies. There are multiple applications concerning categorizing large sets of documents. The methods differ depending on the size of the documents; in short text documents, the information in each individual document is scant. The aim of this paper is to show how well unsupervised text clustering reflects existing class assignments and how sensitive clustering is to the choice of text representation and feature selection. The raw data was collected from several national health surveys. Evaluation was made with a conditional entropy-based method called V-measure, which connects the clusters to the categories. We show that some methods perform significantly better than others on raw data.
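
As an illustration of the evaluation idea, the following is a minimal sketch of V-measure as the harmonic mean of homogeneity and completeness, both defined through conditional entropy. It is not taken from the essay: the function names are hypothetical, the weighting is assumed to be equal (beta = 1), and natural logarithms are assumed.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels), using natural logarithms."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """Conditional entropy H(targets | given)."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(classes, clusters):
    """V-measure with equal weighting (assumed beta = 1).

    classes:  existing category of each document
    clusters: cluster id assigned to each document
    """
    h_c = entropy(classes)
    h_k = entropy(clusters)
    # Homogeneity: each cluster should contain members of a single class.
    homogeneity = (
        1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    )
    # Completeness: each class should end up in a single cluster.
    completeness = (
        1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

The score depends only on how documents are grouped, not on the cluster ids themselves, so a clustering that matches the categories up to a renaming of labels scores as a perfect match.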
