Performance evaluation of Cassandra in AWS environment : An experiment

University essay from Blekinge Tekniska Högskola

Abstract: Context. In computer science, cloud computing plays a prominent role: resources hosted on the internet are used to store, manage and process data. Cloud platforms enable users to perform a large number of computing tasks across remote servers. There are several cloud platform providers, such as Amazon, Microsoft, Google, Oracle and IBM. Several conventional databases are available from cloud service providers to handle data. Cassandra is a NoSQL database system that can handle unstructured data and can scale to a large number of operations per second, even across multiple data centres. Objectives. In this study, the performance of a NoSQL database on the AWS cloud platform is evaluated. The performance of a three-node Cassandra cluster is evaluated for different configurations of EC2 instances, using the metrics throughput and CPU utilization. The main aim of this thesis was to evaluate the performance of Cassandra under various configurations with the YCSB benchmarking tool. Methods. A literature review was conducted to gain more knowledge about the current research area. The metrics required to evaluate the performance of Cassandra were identified through the literature study. The experiment was conducted to measure throughput and CPU utilization under the configurations t2.micro, t2.medium and t2.small for 3-node and 6-node clusters using the YCSB benchmarking tool. Results. The results of the experiment comprise the metrics throughput and CPU utilization, which were identified in the literature review. The results were plotted as graphs to compare performance across the three configurations. The results were divided into two scenarios, one for 3-node and one for 6-node clusters. Conclusions. Based on the obtained values of throughput, the optimal or sub-optimal configuration of a data centre running multiple instances of Cassandra can be determined such that the specific throughput requirements are satisfied.
