Performance Evaluation of Cassandra Scalability on Amazon EC2

University essay from Blekinge Tekniska Högskola/Institutionen för datalogi och datorsystemteknik

Abstract: Context In the fields of communication systems and computer science, Infrastructure as a Service consists of building blocks for cloud computing and to provide robust network features. AWS is one such infrastructure as a service which provides several services out of which Elastic Cloud Compute (EC2) is used to deploy virtual machines across several data centers and provides fault tolerant storage for applications across the cloud. Apache Cassandra is one of the many NoSQL databases which provides fault tolerance and elasticity across the servers. It has a ring structure which helps the communication effective between the nodes in a cluster. Cassandra is robust which means that there will not be a down-time when adding new Cassandra nodes to the existing cluster.  Objectives. In this study quantifying the latency in adding Cassandra nodes to the Amazon EC2 instances and assessing the impact of Replication factors (RF) and Consistency Levels (CL) on autoscaling have been put forth. Methods. Primarily a literature review is conducted on how the experiment with the above-mentioned constraints can be carried out. Further an experimentation is conducted to address the latency and the effects of autoscaling. A 3-node Cassandra cluster runs on Amazon EC2 with Ubuntu 14.04 LTS as the operating system. A threshold value is identified for each Cassandra specific configuration and is scaled over to five nodes on AWS utilizing the benchmarking tool, Cassandra stress tool. This procedure is repeated for a 5-node Cassandra cluster and each of the configurations with a mixed workload of equal reads and writes. Results. Latency has been identified in adding Cassandra nodes on Amazon EC2 instances and the impacts of replication factors and consistency levels on autoscaling have been quantified. Conclusions. It is concluded that there is a decrease in latency after autoscaling for all the configurations of Cassandra and changing the replication factors and consistency levels have also resulted in performance change of Cassandra.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)