Compressibility as Proxy for Readability

University essay from Mittuniversitetet/Institutionen för data- och systemvetenskap

Abstract: This study’s main objective is to examine whether there is a correlation between the readability and the compressibility of Java code. Code readability is important to software maintainability and to comprehension of the code, and it can be measured and tested with a range of metrics, such as the B&W, Scalabrino, and Dorn readability metrics. Should such a correlation exist, compressibility could prove to be a simple yet useful readability metric.

Data compression means encoding code or data using fewer bits than its original size. There are several algorithms for doing this, and this study works with some of the most popular ones. To examine the correlation, we first tested the compression algorithms against each other to see whether there was a major difference in the size of the resulting files. We then compared the compressibility of two styles of code whose readability has previously been shown to differ.

In total, the source code of 20 popular GitHub projects was compressed with 3 compression algorithms to compare the algorithms against each other. For the comparison of compressibility in relation to readability, a combined total of 104 code snippets was tested, 52 for each of the two compared coding paradigms.

Result: For the first test, we concluded that there was no significant difference between the compression rates of the algorithms; on average, they ended up within roughly 4% of each other.

The second result reveals a small difference in compressibility between sets of code written in reactive Java and in object-oriented Java. According to earlier research these two paradigms differ in readability, but the difference in compressibility was so small that it was considered negligible. This is due to the limited variety of snippets tested, and the difference can largely be attributed to the small file sizes of some snippets.
The smaller files increased in size because compression adds an “overhead” to every compressed file. This effect is more noticeable on small files, of which this study tested many.

In conclusion, the study was unable to show a clear connection between source code readability and compressibility. Thus, it does not indicate that compressibility is, as of now, a suitable proxy for readability. This study does, however, start a conversation on a previously unexplored topic, and we hope it can point other studies in the right direction. The scope of this research is too large to be fully explored in a single study, and we strongly suggest further research on the topic.
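
The abstract treats compressibility as a candidate readability metric and attributes part of its result to per-file compression overhead on small snippets. The Java program below is a minimal sketch of both ideas, not the study's method. It assumes compressibility is measured as compressed size divided by original size, and it uses GZIP from the JDK's java.util.zip package only for illustration, because the abstract does not name the three algorithms the study used. The class and method names (CompressibilitySketch, gzipSize, compressionRatio) and both sample snippets are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: GZIP stands in for whichever algorithms the study actually used.
public class CompressibilitySketch {

    /** Returns the number of bytes GZIP produces for the given input. */
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream flushes the deflate data and writes the 8-byte trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        } catch (IOException e) {
            // Writing to an in-memory buffer should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    /** Assumed metric: compressed size / original size (lower means more compressible). */
    static double compressionRatio(String source) {
        byte[] raw = source.getBytes(StandardCharsets.UTF_8);
        if (raw.length == 0) {
            throw new IllegalArgumentException("source must not be empty");
        }
        return (double) gzipSize(raw) / raw.length;
    }

    public static void main(String[] args) {
        // Tiny snippet: the fixed header/trailer overhead outweighs any savings.
        String tiny = "int x = 1;";
        // Larger, repetitive snippet: redundancy lets the compressor shrink it substantially.
        String large = "list.stream().map(s -> s.trim()).forEach(System.out::println);\n".repeat(200);

        System.out.printf("tiny  (%d bytes): ratio %.2f%n", tiny.length(), compressionRatio(tiny));
        System.out.printf("large (%d bytes): ratio %.2f%n", large.length(), compressionRatio(large));
    }
}
```

A ratio above 1.0 means the compressed file is larger than the original. For the ten-byte snippet, GZIP's 10-byte header and 8-byte trailer alone exceed the input size, which illustrates the overhead effect described above, while the repetitive snippet shrinks to a small fraction of its size. The sketch requires Java 11 or later for String.repeat.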
