Named Entity Recognition with Support Vector Machines

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Joel Mickelin; [2013]

Keywords: ;

Abstract: This report describes a degree project in Computer Science, the aim of which was to construct a system for Named Entity Recognition in Swedish texts of names of people, locations and organizations, as well as expressions for time. This system was constructed from the part-of-speech tagger Granska and the Support Vector Machine system SVMlin. The completed system was trained to recognize Named Entities by analyzing patterns in training corpora consisting of lists of example words belonging to each category. The system was initially trained to recognize patterns based on individual characters in words, but was later rewritten to recognize other characteristics of individual words such as the types of characters the words contained. When evaluating the system, it was determined that no incarnation of the system managed to perform satisfactorily when tested to recognize Named Entities of the aforementioned categories. A possible reason for this is that three of the categories, i.e. names of people, names of locations and names of organizations have few or no distinguishing features between them, which might warrant more research. The system proved apt when tested with solving the related problem of distinguishing email addresses from other named entities, indicating that the system might be of use in some cases of Named Entity Recognition.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)