Performance Evaluation of Time series Databases based on Energy Consumption

University essay from Blekinge Tekniska Högskola/Institutionen för kommunikationssystem

Abstract: The vision of the future Internet of Things poses new challenges, because gigabytes of data are generated every day by millions of sensors, actuators, RFID tags, and other devices. As the volume of data grows dramatically, so does the demand for better performance. In addressing this big data problem, much attention has been given to cloud computing and virtualization for their almost unlimited resource capacity, flexible resource allocation and management, and distributed processing ability, which promise high scalability and availability. At the same time, the variety of data types continues to increase. Almost without exception, data centers supporting cloud-based services are monitored for performance and security, and the resulting monitoring data needs to be stored somewhere. Similarly, billions of sensors scattered throughout the world are pumping out huge amounts of data, which must be handled by a database. Typically, the monitoring data consists of time series, that is, numbers indexed by time. Handling this type of data calls for a distributed time series database.

Many database systems are available today, but it is difficult to use them for storing and managing large volumes of time series data. Monitoring large amounts of periodic data is better done with a database optimized for storing time series. Whether the traditional, dominant relational database systems are still the best choice for current systems, with all their new requirements, has been called into question. Choosing an appropriate database for storing huge amounts of time series data is not trivial, as one must take into account aspects such as manageability, scalability, and extensibility. In recent years, NoSQL databases have been developed to address the need for high performance, reliability, and horizontal scalability. NoSQL time series databases (TSDBs) have emerged to combine valuable NoSQL properties with the characteristics of time series data from a variety of use cases.

Just as performance has been central to systems evaluation, energy efficiency is quickly growing in importance for minimizing IT costs. In this thesis, we compared the performance of two distributed NoSQL time series databases, OpenTSDB and InfluxDB, based on the energy they consumed in different scenarios, using the same set of machines and the same data. Because both are distributed databases, we evaluated the energy consumed by each on a single host and on multiple hosts. We analyzed each database individually and then compared the two. In this report we present the results of this study and the performance of these databases based on energy consumption.
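As an illustration of what an energy-based comparison involves, the sketch below integrates sampled power readings over time to obtain energy in joules. It is a minimal sketch under stated assumptions, not the method used in the thesis: the function name `energy_joules`, the trapezoidal rule, and all sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical readings for two databases under the same workload;
# these are NOT measurements from the thesis.
timestamps = [0.0, 1.0, 2.0, 3.0]
db_a_power = [50.0, 60.0, 60.0, 50.0]
db_b_power = [40.0, 45.0, 45.0, 40.0]

energy_a = energy_joules(timestamps, db_a_power)
energy_b = energy_joules(timestamps, db_b_power)
print(f"database A: {energy_a:.1f} J, database B: {energy_b:.1f} J")
```

Sampling both systems on the same machines with the same data, as the abstract describes, is what makes such per-run energy totals comparable.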
