Creating a coreference solver for Swedish and German using distant supervision

University essay from Lunds universitet/Institutionen för datavetenskap

Abstract: It is said that coreference is difficult to explain, but easy to comprehend; everyone knows coreference, they just don’t know that they do. We trained a computer to know it too! Coreference resolution is the identification of phrases that refer to the same entity in a text. Current techniques to solve coreferences use machine-learning algorithms, which require large annotated data sets. Such annotated resources are not available for most languages today. In this report, we describe a method for solving coreference for Swedish and German without annotated texts using distant supervision. We generate a weakly labelled training set using multilingual corpora, where we solve the coreference for English using CoreNLP and transfer it to Swedish and German using word alignment. Additionally, we identify mentions from dependency graphs in both languages using hand-written rules. Finally, we evaluate the end-to-end results using the evaluation framework from the CoNLL 2012 shared task, where we obtain an F-measure of 34.98 for Swedish and 13.16 for German.
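
The transfer step can be illustrated with a minimal sketch. This is not the essay's implementation: the function name `project_chains`, the span format (token offsets with an exclusive end), and the alignment format (a set of English/target index pairs) are all assumptions made for illustration. The idea shown is only that each English mention is mapped to the target tokens it is aligned with, and that chains left with fewer than two mentions are dropped.

```python
from collections import defaultdict


def project_chains(english_chains, alignment):
    """Project English coreference chains onto target-language tokens.

    english_chains: dict mapping chain id -> list of (start, end) token
        spans in the English sentence, end exclusive (assumed format).
    alignment: iterable of (english_index, target_index) pairs, as a word
        aligner might produce (assumed format).
    Returns a dict mapping chain id -> list of (start, end) target spans.
    """
    # Index the alignment by English token position.
    en_to_tgt = defaultdict(list)
    for en_i, tgt_i in alignment:
        en_to_tgt[en_i].append(tgt_i)

    projected = {}
    for chain_id, spans in english_chains.items():
        target_spans = []
        for start, end in spans:
            # Collect every target token aligned to a token of the mention.
            targets = [t for i in range(start, end) for t in en_to_tgt.get(i, [])]
            if targets:
                # Use the smallest covering span as the projected mention.
                target_spans.append((min(targets), max(targets) + 1))
        # A chain needs at least two mentions to express coreference.
        if len(target_spans) >= 2:
            projected[chain_id] = target_spans
    return projected


# English: Anna saw her dog . She smiled .
# Swedish: Anna såg sin hund . Hon log .
chains = {0: [(0, 1), (2, 3), (5, 6)], 1: [(2, 4)]}
alignment = {(i, i) for i in range(8)}  # toy one-to-one alignment
print(project_chains(chains, alignment))
```

In this toy example the chain linking "Anna", "her" and "She" carries over to "Anna", "sin" and "Hon", while the single-mention chain is discarded. Real alignments are noisy and many-to-many, so a weakly labelled set built this way would be expected to contain errors.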
