Automatic Segmentation of Swedish Medical Words with Greek and Latin Morphemes: A Computational Morphological Analysis

University essay from Stockholms universitet/Avdelningen för datorlingvistik

Abstract: The growing volume of raw text data online has increased the need, within natural language processing (NLP), for systems that can process such data efficiently and at low cost. A well-developed morphological analysis is a cornerstone of NLP, in particular when word look-up is an important stage of processing. Morphological analysis has several advantages: it reduces the number of word forms that must be stored, and it saves both cost and time. NLP is relevant to medicine, especially for automatic text analysis, which is still a relatively young field for Swedish medical texts, where much of the stored information is highly unstructured. Using raw corpora, this paper aims to contribute to automatic morphological segmentation by experimenting with state-of-the-art tools for unsupervised and semi-supervised segmentation of Swedish words in medical texts. The results show that a reasonable segmentation depends more on a high number of word types than on a specific type of corpus. The results also show that semi-supervision in the form of annotated training data greatly improves segmentation performance.
