MULTI-CLASS GRAMMATICAL ERROR DETECTION: Data, Benchmarks and Evaluation Metrics for the First Shared Task on Swedish L2 Data

University essay from Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

Abstract: Grammatical Error Detection (GED) is a challenging NLP task that has received little research attention in recent years, especially for Swedish. However, with more L2 (second language) learners than ever before, educational resources for students, such as grammar-checking tools, are needed. With this in mind, this Master’s thesis presents the creation of the Swedish MuClaGED (Multi-Class Grammatical Error Detection) dataset, which will be part of a Computational SLA (Second Language Acquisition) shared task and is likely to be useful for the future development of multilingual grammatical error detection systems. Once the Swedish MuClaGED dataset has been built, two main experiments are performed on it to test its capabilities and obtain baseline results in preparation for the shared task. In addition, the project explores how hybrid error detection datasets can be created, along with their advantages and disadvantages, by training GED models on a combination of original L2 learner data and text corrupted with artificially generated syntactic errors.
