GPT-4 as an Automatic Grader: The accuracy of grades set by GPT-4 on introductory programming assignments

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Filippa Nilsson; Jonatan Tuvstedt; [2023]

Abstract: Education involves many time-consuming tasks outside the core charge of teaching. One of the most arduous is grading, which can be both monotonous and slow. An emerging field that could alleviate this burden is Artificial Intelligence (AI), or more specifically Large Language Models (LLMs), which have advanced immensely over the past year, following the release of ChatGPT. This thesis investigates how accurately GPT-4 grades introductory programming assignments compared to Teaching Assistants. The work of 73 students in the introductory programming courses INDA at KTH Royal Institute of Technology was examined and graded by GPT-4. Each submission was graded by sending GPT-4 a prompt consisting of plain-text copies of the assignment, the grading criteria, and the student submission, together with instructions on how to grade. The results were promising: GPT-4's grades agreed with those set by Teaching Assistants in 75% of cases overall. However, it was significantly worse at correctly identifying failing submissions than passing ones. The results indicate that AI could grade students' work reliably in the future, if the development of LLMs continues to progress. In the meantime, AI can serve as a grading aid for educators around the world, reducing their workload.
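The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the prompt layout, and the "pass"/"fail" labels are assumptions made for illustration, and the grades in the usage example are toy values, not data from the thesis. The sketch assembles a prompt from the four plain-text parts named in the abstract and then measures agreement with Teaching Assistant grades, both overall and separately for passing and failing submissions.

```python
from typing import Optional


def build_grading_prompt(assignment: str, criteria: str,
                         submission: str, instructions: str) -> str:
    """Join the four plain-text parts into one prompt (hypothetical layout)."""
    sections = [
        ("ASSIGNMENT", assignment),
        ("GRADING CRITERIA", criteria),
        ("STUDENT SUBMISSION", submission),
        ("INSTRUCTIONS", instructions),
    ]
    return "\n\n".join(f"{title}\n{body.strip()}" for title, body in sections)


def agreement(ta_grades: list[str], model_grades: list[str],
              only: Optional[str] = None) -> Optional[float]:
    """Fraction of submissions where the model grade equals the TA grade.

    If `only` is given ("pass" or "fail"), restrict the comparison to
    submissions that the TA graded with that label.
    """
    if len(ta_grades) != len(model_grades):
        raise ValueError("grade lists must have the same length")
    pairs = [(t, m) for t, m in zip(ta_grades, model_grades)
             if only is None or t == only]
    if not pairs:
        return None  # no submissions with that TA grade
    return sum(t == m for t, m in pairs) / len(pairs)


# Toy grades, invented for illustration only.
ta = ["pass", "pass", "pass", "fail", "fail"]
model = ["pass", "pass", "fail", "pass", "fail"]

overall = agreement(ta, model)               # all submissions
on_pass = agreement(ta, model, only="pass")  # TA-passed submissions
on_fail = agreement(ta, model, only="fail")  # TA-failed submissions
```

Reporting the pass and fail subsets separately matters because a single overall figure can hide weak performance on the smaller class, which is the pattern the abstract reports for failing submissions.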
