Comparison of graph databases and relational databases performance

University essay from Stockholms universitet/Institutionen för data- och systemvetenskap

Abstract: Social media has brought a paradigm shift in how information is produced, processed, and consumed. When planning how to store data, it is important to choose a database suited to the type of data, as unsuitable storage and analysis can have a noticeable impact on the system’s energy consumption. Analyzing data effectively is also essential, because deficient analysis of a large dataset can lead to unsound decisions and inadequate planning. In recent years, an increasing number of organizations have provided services that can no longer be delivered efficiently using relational databases. An alternative is the graph database, a powerful solution for storing and searching relationship-dense data. The research question that the thesis aims to answer is: how do state-of-the-art graph database and relational database technologies compare from a performance perspective in terms of query time, CPU usage, memory usage, power usage, and server temperature? To answer the research question, an experimental study using analysis of variance was performed. One relational database, MySQL, and two graph databases, ArangoDB and Neo4j, were compared using a benchmark, Novabench. The results from the analysis of variance, Kruskal-Wallis, and post-hoc tests show significant differences between the database technologies. The null hypothesis, that there is no significant difference, is therefore rejected, and the alternative hypothesis holds: there is a significant difference in performance between the database technologies with respect to Time to Query, Central Processing Unit usage, Memory usage, Average Energy usage, and temperature. In conclusion, the research question was answered. The study shows that Neo4j was the fastest at executing queries, followed by MySQL, with ArangoDB in last place. The results also showed that MySQL used more memory than the other database technologies.
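
As a rough illustration of the kind of test named in the abstract, the sketch below computes a Kruskal-Wallis H statistic for three independent samples using only the Python standard library. It is not the thesis's analysis code: the function names, the group labels, and the numbers are hypothetical placeholders, not measurements from the study. With exactly three groups the test has 2 degrees of freedom, so the chi-square approximation of the p-value reduces to exp(-H/2).

```python
import math
from itertools import chain


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)

    # Assign each distinct value the mean of the 1-based ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j

    # H = 12 / (n (n + 1)) * sum(R_k^2 / n_k) - 3 (n + 1)
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)

    # Tie correction; this is zero (and H undefined) if every value is identical.
    correction = 1 - tie_term / (n**3 - n)
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)


# Hypothetical placeholder timings (arbitrary units), NOT data from the thesis.
samples = {
    "system_a": [1, 2, 3],
    "system_b": [4, 5, 6],
    "system_c": [7, 8, 9],
}
h_stat = kruskal_wallis_h(list(samples.values()))
p_val = p_value_three_groups(h_stat)
print(f"H = {h_stat:.3f}, p = {p_val:.4f}")
```

A small p-value only indicates that at least one group differs from the others; identifying which pairs differ requires a separate post-hoc comparison. The chi-square approximation is also rough for samples as small as the placeholders used here.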
