Ability Estimation Methods: An Introduction to Item Response Theory and Elo Education Systems

University essay from Stockholms universitet/Statistiska institutionen

Author: Sebastian Hedberg; Salma Nasra; [2023]


Abstract: Testing knowledge and ability is, and has always been, fundamentally important. Today we can create computerized tests that vary with the estimated skill of the examinee. Such a test is called an adaptive test, and its aim is to estimate the examinee's ability more precisely.

In this thesis, we explain and examine two methods for estimating ability from test results. We introduce the reader to both methods and compare them using real test data. The first method is the Rasch model from item response theory (IRT). The second is a version of the Elo ranking system adapted to a testing or educational context.

Item response theory is a mathematical theory that uses different likelihood methods to estimate both the difficulty of items and the ability of an examinee from test results. The Elo ranking system was originally created to rank players in chess tournaments. The version in this paper takes a heuristic approach to ability and treats each attempt at a question as a match between the examinee and the question. The outcome of each match is then used to update the examinee's ability rating continuously.

We carried out the comparison using binary test-result data from three different tests. Its purpose was to investigate whether the simpler Elo system could be used as a substitute for IRT in computerized testing. We also wanted to analyze which method showed the best fit across several data sets.

Our results showed that the ability estimates produced by IRT and Elo initially grew closer as the number of questions increased. After around 20 questions, however, the estimates started to drift apart again. The IRT model also showed a better fit overall, although the fit for Elo improved as the number of questions increased. These results indicate that IRT is the superior method, but that more advanced Elo-based methods could be of interest for future studies.
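To make the idea of treating each attempt as a match concrete, the following is a minimal sketch in Python and not the implementation used in the thesis. It assumes the standard Rasch response probability and a constant step size `k`. The function names, the value `k=0.4`, the starting ability of 0, and the choice to hold item difficulties fixed in `estimate_ability` are all illustrative assumptions.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch model: probability of a correct response to an item."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def elo_update(ability, difficulty, correct, k=0.4):
    """One Elo-style 'match' between an examinee and a question.

    k is a hypothetical constant step size, not a value from the thesis.
    Returns the updated (ability, difficulty) pair.
    """
    expected = rasch_probability(ability, difficulty)
    outcome = 1.0 if correct else 0.0
    delta = k * (outcome - expected)
    # The examinee gains what the item loses, and vice versa.
    return ability + delta, difficulty - delta


def estimate_ability(responses, k=0.4, start=0.0):
    """Run Elo updates over (difficulty, correct) pairs.

    Assumption: item difficulties are treated as known and held fixed,
    so only the examinee's ability is updated.
    """
    ability = start
    for difficulty, correct in responses:
        ability, _ = elo_update(ability, difficulty, correct, k)
    return ability
```

For example, `estimate_ability([(0.0, True), (0.5, False), (-0.5, True)])` walks through three binary responses and returns the running ability rating after the last one. A larger `k` makes the rating react faster to each answer but also makes it noisier.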
