A Comparative Study of the Quality between Formality Style Transfer of Sentences in Swedish and English, leveraging the BERT model

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Formality Style Transfer (FST) is the task of automatically transforming a piece of text from one level of formality to another. Previous research has investigated different methods of performing FST on English text, but at the time of this project there were, to the author’s knowledge, no previous studies analysing the quality of FST on Swedish text. The purpose of this thesis was to investigate how a model trained for FST in Swedish performs. This was done by comparing the quality of an FST model trained on Swedish text with that of an equivalent model trained on English text. Both models were implemented as encoder-decoder architectures, warm-started from two pre-existing Bidirectional Encoder Representations from Transformers (BERT) models, pre-trained on Swedish and English text respectively. The two FST models were fine-tuned for both the informal-to-formal task and the formal-to-informal task, using Grammarly’s Yahoo Answers Formality Corpus (GYAFC). The Swedish version of GYAFC was created through automatic machine translation of the original English version. The Swedish corpus was then evaluated on three criteria: meaning preservation, formality preservation and fluency preservation. The results of the study indicated that the Swedish model had the capacity to match the quality of the English model but was held back by the inferior quality of the Swedish corpus. The study also highlighted the need for task-specific corpora in Swedish.
