Extractive Text Summarization of Norwegian News Articles Using BERT

University essay from Linköpings universitet/Medie- och Informationsteknik; Linköpings universitet/Tekniska fakulteten

Abstract: Extractive text summarization has over the years been an important research area in Natural Language Processing. Numerous methods have been proposed for extracting information from text documents. Recent works have shown great success for English summarization tasks by fine-tuning the language model BERT using large summarization datasets. However, less research has been made for low-resource languages. This work contributes by investigating how BERT can be used for Norwegian text summarization. Two models are developed by applying a modified BERT architecture, called BERTSum, on pre-trained Norwegian and Multilingual BERT. The results are models able to predict key sentences from articles to generate bullet-point summaries. These models are evaluated with the automatic metric ROUGE and in this evaluation, the Multilingual BERT model outperforms the Norwegian model. The multilingual model is further evaluated in a human evaluation by journalists, revealing that the generated summaries are not entirely satisfactory in some aspects. With some improvements, the model shows to be a valuable tool for journalists to edit and rewrite generated summaries, saving time and workload.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)