Assessment Accuracy of a Large Language Model on Programming Assignments

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Authors: Douglas Bengtsson; Axel Kaliff [2023]


Abstract: The education sector is changing rapidly and adopting new practices for managing student assignments. Manually assessing student work can be costly and is sometimes error-prone, which suggests that automating the grading process can be beneficial. The arrival of new large language models raises questions about how well they can be used to grade assignments and how their accuracy can be improved. This study explored how task context affects the accuracy of a large language model when grading programming assignments. The large language model used in this study was OpenAI’s recently released GPT-4 model, and the student assignments were collected from an introductory programming course (DD1338) at KTH Royal Institute of Technology. To evaluate the grading accuracy, the study first had to inject errors into correct student assignments, which was also done with GPT-4. Four categories of logical errors were used: looping, if-else, recursion, and off-by-one errors. When the large language model was provided with the instruction context, the results were sometimes inconsistent but indicated that task context negatively affects the assessment accuracy. In addition, the feedback provided in the assessment seems to be highly accurate. Even though the feedback may become more accurate when the model is provided with the instruction context, this usually comes with fewer identified errors overall and therefore a lower assessment accuracy. Several directions for future research are recommended, including investigating how other types of context affect the accuracy and how well the large language model identifies other types of errors. Future studies might also investigate how grading the same file several times affects the accuracy, given the non-deterministic nature of a large language model such as GPT-4.
