Readability algorithms compatibility on multiple languages

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Robin Tillman; Ludvig Hagberg; [2014]

Abstract: This paper tests the compatibility of readability algorithms when applied to texts written in different languages; the languages used are Swedish and English. A readability algorithm aims to approximate the readability of a text. Readability can be defined in many ways, but the definition used in this paper is simply the ease with which a text can be read and understood. The tests were conducted on the Swedish and English versions of the same text, so the readability is expected to be fairly similar. Three algorithms were tested: the Coleman-Liau index (CLI), Läsbarhetsindex (LIX) and the Automated Readability Index (ARI). The texts used were a collection of Wikipedia articles, "On the Origin of Species" by Charles Darwin, and the Bible, together with their respective translations. The main focus was on the Wikipedia articles, because of the amount of text they comprised, and on "On the Origin of Species", because of the similar set of variables and the similar sentence structure in both languages. The tests showed that both ARI and LIX work for both Swedish and English on texts which, by the definitions of the formulas, are less readable. CLI, however, seems to perform less well on these higher-level texts, but works excellently on the Bible, which all three defined as easy to read. This leads to the conclusion that ARI and LIX work on hard and average texts in both English and Swedish, and that CLI works only on easy-to-read texts in both languages.
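
For readers unfamiliar with the three indices named in the abstract, the following is a minimal sketch of their commonly published formulas. The abstract does not describe how the essay tokenizes text, so the helper `text_stats` and its naive word and sentence splitting are assumptions made for illustration only. As a further simplification, characters are counted as letters only, whereas ARI is usually defined over letters and digits.

```python
import re

def text_stats(text):
    """Return (words, sentences, letters, long_words) for a text.

    Hypothetical helper: the tokenization here is a simple assumption,
    not the method used in the essay.
    """
    # Runs of Unicode letters, so Swedish characters such as å, ä, ö count.
    words = re.findall(r"[^\W\d_]+", text)
    # Sentences are non-empty chunks between '.', '!' and '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = sum(len(w) for w in words)
    # LIX treats words of more than six letters as "long".
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words), len(sentences), letters, long_words

def ari(text):
    """Automated Readability Index."""
    w, s, c, _ = text_stats(text)
    return 4.71 * c / w + 0.5 * w / s - 21.43

def cli(text):
    """Coleman-Liau index."""
    w, s, c, _ = text_stats(text)
    letters_per_100 = c / w * 100    # average letters per 100 words
    sentences_per_100 = s / w * 100  # average sentences per 100 words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

def lix(text):
    """Läsbarhetsindex."""
    w, s, _, lw = text_stats(text)
    return w / s + 100 * lw / w

if __name__ == "__main__":
    sample = "The cat sat. The dog ran away quickly."
    print(ari(sample), cli(sample), lix(sample))
```

All three formulas depend only on counts of letters, words and sentences, and in the case of LIX on the share of long words, which is why they can be applied to Swedish and English text without any language-specific resources.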