An Entropy Estimate of Written Language and Twitter Language : A Comparison between English and Swedish

University essay from Linnéuniversitetet/Institutionen för matematik (MA)

Abstract: The purpose of this study is to estimate and compare the entropy and redundancy of written English and Swedish. We also investigate and compare the entropy and redundancy of Twitter language. This is done by extracting sequences of n consecutive characters, called n-grams, and calculating their frequencies. Precise values cannot be obtained, because the entropy is defined in the limit as the text length tends to infinity, whereas the available text is finite. However, we do obtain estimates for n = 1,...,6, and the results show that written Swedish has higher entropy and lower redundancy than written English. Comparing Twitter language with the standard written languages, we find that Twitter language has higher entropy and lower redundancy.
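
The n-gram procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which estimator the essay uses, so this sketch assumes Shannon's classical approach: the entropy of n-grams is computed from their relative frequencies, and the per-character estimate for a given n is taken as the difference between the n-gram and (n-1)-gram block entropies. Redundancy is assumed to be 1 - H / log2(alphabet size). All function names are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive characters (overlapping n-grams)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def block_entropy(text, n):
    """Shannon entropy, in bits, of the empirical n-gram distribution."""
    counts = ngram_counts(text, n)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(text, n):
    """Assumed estimator: entropy of a character given the n-1 before it.

    Computed as H(n-grams) - H((n-1)-grams); for n = 1 this is simply the
    single-character entropy.
    """
    if n == 1:
        return block_entropy(text, 1)
    return block_entropy(text, n) - block_entropy(text, n - 1)


def redundancy(entropy_estimate, alphabet_size):
    """Assumed definition: 1 - H / H_max, with H_max = log2(alphabet size)."""
    return 1.0 - entropy_estimate / math.log2(alphabet_size)


if __name__ == "__main__":
    # Toy input for illustration only; a real estimate needs a large corpus.
    sample = "the quick brown fox jumps over the lazy dog " * 50
    alphabet_size = len(set(sample))
    for n in range(1, 7):
        h = conditional_entropy(sample, n)
        print(n, round(h, 3), round(redundancy(h, alphabet_size), 3))
```

On a finite text, many possible n-grams never occur once n grows, so estimates of this kind tend to be too low for larger n. This is one reason the results are reported only for small n and not as limiting values.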
