Automatic Distractor Generation for Spanish Reading Comprehension Questions: Using language models to generate wrong, but plausible answers for multiple choice questions in Spanish

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Multiple Choice Questions are a common way to evaluate students' reading comprehension. A student reads a text and a question, and then chooses an answer from a set of options, one of which is correct while the others are wrong. The wrong options are called distractors. Creating Multiple Choice Question exams is time-consuming, which makes the task a good candidate for automation. Distractor Generation is the task of generating wrong but plausible options for Multiple Choice Questions, and it can be addressed with Machine Learning and Large Language Models. The task has already been addressed for languages such as English and Swedish; this work addresses it for Spanish. This work achieves three objectives. The first is the creation of a Spanish Multiple Choice Question dataset with distractors, called SQuAD-es-dist, built by manually tagging distractors in the SQuAD-es dataset. The second is the automatic generation of distractors with machine learning methods: a BERT model is fine-tuned to generate distractors, and a GPT model is used to generate them through zero-shot learning. The third is a human study that evaluates the plausibility and usability of the generated distractors. Both methods prove effective, though not perfect, at generating distractors, and the GPT model shows better applicability and a higher capacity to confuse students.
