Scalable Persisting and Querying of Streaming Data by Utilizing a NoSQL Data Store

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Khalid Mahmood; [2014]


Abstract: Relational databases provide technology for scalable queries over persistent data. In many application scenarios, however, loading large data logs produced at high rates into a conventional relational database management system (DBMS) may not be fast enough, because of the high cost of indexing and converting data during loading. As an alternative, a modern indexed parallel NoSQL data store, such as MongoDB, can be utilized. In this work, MongoDB was investigated for its performance in loading, indexing, and analyzing data logs of sensor readings. To investigate the trade-offs of this approach compared to relational database technology, a benchmark of log files from an industrial application was used for performance evaluation. Because indexing is required for scalable query performance, the evaluation covers both the loading time for the log files and the execution time of basic queries over the loaded log data, with and without indexes. For comparison, we investigated the performance of a popular open source relational DBMS and a DBMS from a major commercial vendor. The implementation, called AMI (Amos Mongo Interface), provides an interface between MongoDB and Amos II, an extensible main-memory DBMS to which different kinds of back-end storage managers and DBMSs can be interfaced. AMI enables general on-line analyses through queries over data streams persisted in MongoDB as a back-end data store. It furthermore enables integration of NoSQL and SQL databases through queries to Amos II. The performance investigation used AMI to analyze the performance of MongoDB, while the relational DBMSs were analyzed by utilizing the existing relational DBMS interfaces of Amos II.
