Performance Evaluation of Cassandra in a Virtualized Environment

University essay from Blekinge Tekniska Högskola/Institutionen för datalogi och datorsystemteknik

Abstract: Context. Apache Cassandra is an open-source, scalable, NoSQL database that distributes the data over many commodity servers. It provides no single point of failure by copying and storing the data in different locations. Cassandra uses a ring design rather than the traditional master-slave design. Virtualization is the technique using which physical resources of a machine are divided and utilized by various virtual machines. It is the fundamental technology, which allows cloud computing to provide resource sharing among the users.  Objectives. Through this research, the effects of virtualization on Cassandra are observed by comparing the virtual machine arrangement to physical machine arrangement along with the overhead caused by virtualization.  Methods. An experiment is conducted in this study to identify the aforementioned effects of virtualization on Cassandra compared to the physical machines. Cassandra runs on physical machines with Ubuntu 14.04 LTS arranged in a multi node cluster. Results are obtained by executing the mixed, read only and write only operations in the Cassandra stress tool on the data populated in this cluster. This procedure is repeated for 100% and 66% workload. The same procedure is repeated in virtual machines cluster and the results are compared.  Results. Virtualization overhead has been identified in terms of CPU utilization and the effects of virtualization on Cassandra are found out in terms of Disk utilization, throughput and latency.  Conclusions. The overhead caused due to virtualization is observed and the effect of this overhead on the performance of Cassandra has been identified. The consequence of the virtualization overhead has been related to the change in performance of Cassandra.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)