Prerequisites for Extracting Entity Relations from Swedish Texts

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Erik Lenas; [2020]

Keywords: Machine Learning; Natural Language Processing; Relation Extraction; Named Entity Recognition; Coreference resolution; BERT; Maskininlärning; Natural Language Processing; Relationsextrahering; Named Entity Recognition; Coreference resolution; BERT;

Abstract: Natural language processing (NLP) is a vibrant area of research with many practical applications today like sentiment analyses, text labeling, questioning an- swering, machine translation and automatic text summarizing. At the moment, research is mainly focused on the English language, although many other lan- guages are trying to catch up. This work focuses on an area within NLP called information extraction, and more specifically on relation extraction, that is, to ex- tract relations between entities in a text. What this work aims at is to use machine learning techniques to build a Swedish language processing pipeline with part-of- speech tagging, dependency parsing, named entity recognition and coreference resolution to use as a base for later relation extraction from archival texts. The obvious difficulty lies in the scarcity of Swedish annotated datasets. For exam- ple, no large enough Swedish dataset for coreference resolution exists today. An important part of this work, therefore, is to create a Swedish coreference solver using distantly supervised machine learning, which means creating a Swedish dataset by applying an English coreference solver on an unannotated bilingual corpus, and then using a word-aligner to translate this machine-annotated En- glish dataset to a Swedish dataset, and then training a Swedish model on this dataset. Using Allen NLP:s end-to-end coreference resolution model, both for creating the Swedish dataset and training the Swedish model, this work achieves an F1-score of 0.5. For named entity recognition this work uses the Swedish BERT models released by the Royal Library of Sweden in February 2020 and achieves an overall F1-score of 0.95. To put all of these NLP-models within a single Lan- guage Processing Pipeline, Spacy is used as a unifying framework.

AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)

Prerequisites for Extracting Entity Relations from Swedish Texts

Searchphrases right now

Popular searches

popular essays yesterday (2024-04-18)