A Method for Automatic Question Answering in Swedish based on BERT

University essay from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Tove Tengvall (2020)


Abstract: This report presents a method for automatic reading comprehension in Swedish. The method is based on BERT, a pre-trained Swedish neural network language model, which was fine-tuned on a Swedish question-answer corpus. The corpus was built by having human annotators pose questions in natural language about paragraphs of text drawn from a set of articles collected from Swedish Wikipedia and the Swedish Migration Agency. In the task defined, the model is expected to return the short span of text within a given paragraph that constitutes the correct answer to a given question. The dataset was partitioned into 910 question-answer pairs for training and 105 pairs for validation. The quality of the method was evaluated on 257 questions, whose returned answers were compared to the correct answers from the corpus, as well as to the results of a simpler grammatical method developed as a baseline. The fine-tuned Swedish BERT-base model obtains an F-score of 78.1% and an Exact Match score of 63.0% on the collection of questions generated in the study. The model outperforms the baseline and is judged to be a successful method for the question answering task defined. Although these results indicate that BERT has great potential as a method for automatic question answering in Swedish, they do not match those of the English BERT model fine-tuned on the English question-answer corpus SQuAD. The poorer performance of the Swedish model may be explained by the question-answer corpus used in this study being much smaller than SQuAD.
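To make the span-extraction task concrete, here is a minimal sketch of how a BERT model answers a question by selecting a span from a paragraph. The thesis does not state which implementation or checkpoint was used; the Hugging Face `transformers` library and the "KB/bert-base-swedish-cased" model name below are illustrative assumptions, and in the study the model is first fine-tuned on the Swedish question-answer corpus, a step omitted here.

```python
# Sketch of BERT-style extractive question answering in Swedish.
# ASSUMPTIONS (not from the thesis): Hugging Face `transformers`
# and the KB/bert-base-swedish-cased checkpoint. Without QA
# fine-tuning, the span head is randomly initialised, so real use
# requires fine-tuning on a question-answer corpus first.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

MODEL_NAME = "KB/bert-base-swedish-cased"  # illustrative Swedish BERT-base

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)

question = "Vem grundade Stockholm?"  # "Who founded Stockholm?"
paragraph = (
    "Stockholm grundades enligt traditionen av Birger jarl "
    "omkring år 1252."
)

# Encode question and paragraph as one sequence:
# [CLS] question [SEP] paragraph [SEP]
inputs = tokenizer(question, paragraph, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The QA head produces one start logit and one end logit per token;
# the predicted answer is the span between the highest-scoring
# start and end positions.
start_idx = int(outputs.start_logits.argmax())
end_idx = int(outputs.end_logits.argmax())

answer_ids = inputs["input_ids"][0][start_idx : end_idx + 1]
print(tokenizer.decode(answer_ids, skip_special_tokens=True))
```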
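The reported F-score and Exact Match figures follow the evaluation style popularised by SQuAD. The sketch below shows the common convention as an assumption, since the thesis's exact normalisation rules are not given in the abstract: Exact Match checks whether the predicted answer string equals the reference after normalisation, and the F-score is the harmonic mean of token-level precision and recall.

```python
# Sketch of SQuAD-style evaluation metrics (assumed convention,
# not necessarily the thesis's exact normalisation rules).
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the normalised strings are identical, else 0.0.
    return float(prediction.strip().lower() == reference.strip().lower())

def f_score(prediction: str, reference: str) -> float:
    # Token-overlap F-score: harmonic mean of precision and recall.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Birger jarl", "Birger jarl"))           # 1.0
print(round(f_score("av Birger jarl", "Birger jarl"), 2))  # 0.8
```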
