Controllable sentence simplification in Swedish : Automatic simplification of sentences using control prefixes and mined Swedish paraphrases

University essay from Linköpings universitet/Institutionen för datavetenskap

Abstract: The ability to read and comprehend text is essential in everyday life, yet some people, including individuals with dyslexia or cognitive disabilities, find this difficult. It is therefore important to make textual information accessible to diverse target audiences. Automatic Text Simplification (ATS) techniques aim to reduce the linguistic complexity of texts to improve readability and comprehension. However, existing ATS systems often cannot be customized to specific user needs, and simplification data for languages other than English is limited. This thesis addresses ATS in a Swedish context, building on novel methods that give more control over the simplification generation process and thereby enable user customization. A dataset of Swedish paraphrases was mined from a large collection of text data, and ATS models were then trained on this dataset using prefix-tuning with control prefixes. Two sets of text attributes for controlling the generation were explored, together with their effects on performance: the first had been used in previous research, and the second was extracted in a data-driven way from existing text complexity measures. The trained Swedish ATS models, along with additional English models, were evaluated and compared using the SARI and BLEU metrics. The results for the English models were consistent with, although slightly lower than, results from previous research using controllable generation mechanisms. The Swedish models significantly outperformed both the baseline, a fine-tuned BART model, and previous Swedish ATS results. These results highlight the effectiveness of combining paraphrase data with controllable generation mechanisms for simplification. Furthermore, the two attribute sets produced very similar results, suggesting that both capture aspects of simplification.
The process of mining paraphrases, the selection of control attributes, and other methodological implications are discussed, leading to suggestions for future research.
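The abstract describes controlling generation through text attributes. The sketch below is a hedged illustration of that idea. It computes two attributes of the kind used in earlier controllable-simplification work: a character-length ratio, and a normalized edit-distance similarity between a source sentence and its simplified target. It then snaps each value to a coarse grid. The attribute names, the 0.05 bucket width, and the similarity normalization are assumptions for illustration, not the thesis's exact definitions. In prefix-tuning with control prefixes, each bucketed value would select a learned prefix rather than be inserted into the text.

```python
"""Hypothetical sketch: computing and bucketing control attributes for a
(source, target) sentence pair, in the spirit of controllable simplification.

Attribute names, bucket width, and the similarity normalization are
illustrative assumptions, not the thesis's exact definitions.
"""


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_length_ratio(source: str, target: str) -> float:
    """Target length relative to source length; below 1 means compression."""
    return len(target) / max(len(source), 1)


def levenshtein_similarity(source: str, target: str) -> float:
    """1 minus edit distance normalized by the longer string (assumed form)."""
    longest = max(len(source), len(target), 1)
    return 1.0 - levenshtein(source, target) / longest


def bucket(value: float, step: float = 0.05) -> float:
    """Snap a continuous value to a grid so each attribute has few levels."""
    return round(round(value / step) * step, 2)


def control_attributes(source: str, target: str) -> dict[str, float]:
    """Bucketed attribute values for one training pair.

    During training these would be computed from a mined paraphrase pair;
    at inference a user would instead choose the values, for example a
    lower length ratio to request stronger compression.
    """
    return {
        "char_length_ratio": bucket(char_length_ratio(source, target)),
        "levenshtein_similarity": bucket(levenshtein_similarity(source, target)),
    }


if __name__ == "__main__":
    src = "Kommittén beslutade att skjuta upp beslutet till nästa år."
    tgt = "Beslutet flyttas till nästa år."
    print(control_attributes(src, tgt))
```

Bucketing keeps the number of distinct values per attribute small. This presumably matters when each value corresponds to its own learned prefix and must appear often enough in the mined data to be trained.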
