Exploration of relationships from texts using self-organizing maps

University essay from Institutionen för teknik och byggd miljö

Author: Weiping Lu; [2007]

Abstract: This thesis explores and visualizes relationships among documents using self-organizing maps (SOM), a type of artificial neural network that projects high-dimensional data onto low-dimensional views. The source data are the full Extensible Markup Language (XML) texts of A Standard Corpus of Present Day Edited American English. First, the XML files are transformed into a term-document matrix through stop word removal, stemming, tf-idf (term frequency–inverse document frequency) weighting, and global filtering; the rows of this matrix represent documents as n-dimensional vectors. Second, these vectors are clustered and visualized by a SOM made up of neurons, where each neuron relates to a set of documents that share a certain number of terms. Third, a network is constructed from the SOM: its vertices are the neurons and documents, and its lines are the links between neurons and documents. Finally, this network is exported to Pajek for analysis and final visualization.
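
The pipeline described in the abstract can be illustrated with a minimal sketch in Python. It is not the thesis implementation: the four toy documents, the stop word list, the trailing-"s" stripper standing in for a real stemmer, the grid size, the learning schedule, and all function names are assumptions made for illustration. Each document is linked to its best-matching neuron, which is one common way to read a SOM and may differ from the linkage rule used in the thesis. XML parsing is omitted.

```python
import math
import random
from collections import Counter

# Tiny illustrative stop word list (assumption, not the thesis's list).
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "on"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words, crudely stem."""
    tokens = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalpha())
        if not word or word in STOP_WORDS:
            continue
        # Stand-in for a real stemmer (e.g. Porter): strip a trailing "s".
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        tokens.append(word)
    return tokens

def tfidf_matrix(docs, min_df=1):
    """Return (terms, rows); each row is one document's tf-idf vector."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(t for c in counts for t in c)  # document frequency per term
    # Global filtering: keep terms found in at least min_df documents.
    terms = sorted(t for t, n in df.items() if n >= min_df)
    n_docs = len(docs)
    rows = [[c[t] * math.log(n_docs / df[t]) for t in terms] for c in counts]
    return terms, rows

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def best_matching_unit(weights, vec):
    """Index of the neuron whose weight vector is closest to vec."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], vec))

def train_som(rows, grid_w, grid_h, epochs=50, seed=0):
    """Train a rectangular SOM with a Gaussian neighbourhood; return weights."""
    rng = random.Random(seed)
    dim = len(rows[0])
    coords = [(x, y) for y in range(grid_h) for x in range(grid_w)]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs           # decays linearly towards 0
        lr = 0.5 * frac                       # learning rate
        radius = max(grid_w, grid_h) / 2 * frac + 0.5
        for vec in rows:
            bx, by = coords[best_matching_unit(weights, vec)]
            for i, (x, y) in enumerate(coords):
                grid_d2 = (x - bx) ** 2 + (y - by) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))
                weights[i] = [w + lr * h * (v - w)
                              for w, v in zip(weights[i], vec)]
    return weights

def to_pajek(doc_names, weights, rows):
    """Serialize neurons and documents as a Pajek .net network (as text)."""
    n_neurons = len(weights)
    lines = [f"*Vertices {n_neurons + len(doc_names)}"]
    for i in range(n_neurons):
        lines.append(f'{i + 1} "neuron{i + 1}"')
    for j, name in enumerate(doc_names):
        lines.append(f'{n_neurons + j + 1} "{name}"')
    lines.append("*Edges")
    for j, vec in enumerate(rows):
        bmu = best_matching_unit(weights, vec)
        lines.append(f"{bmu + 1} {n_neurons + j + 1} 1")
    return "\n".join(lines) + "\n"

# Toy corpus (hypothetical documents, not taken from the thesis data).
docs = {
    "doc_a": "The court ruled on the tax laws",
    "doc_b": "Tax laws and court rulings in the state",
    "doc_c": "The team scored goals in the match",
    "doc_d": "Players and goals decided the match",
}
terms, rows = tfidf_matrix(list(docs.values()))
weights = train_som(rows, grid_w=2, grid_h=2)
pajek_text = to_pajek(list(docs), weights, rows)
print(pajek_text)
```

The printed text can be saved to a file with the .net extension and opened in Pajek. A real run would replace the toy stemmer and stop word list with established ones and use a larger grid.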
