Big Data Archiving with Splunk and Hadoop

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Authors: Emre Berge Ergenekon; Petter Eriksson [2013]

Abstract: Splunk is software that handles large amounts of data every day. Because the data grows continuously, old data must be phased out to keep the software from slowing down. However, some of Splunk's customers have retention policies that require data to be stored longer than Splunk can offer. This thesis investigates how to create a solution for archiving large amounts of data. We present the problems of archiving data, the properties of the data being archived, and the types of file systems suitable for archiving. By carefully considering data safety and reliability, and by using the Apache Hadoop project to support multiple distributed file systems, we create a flexible, reliable and scalable archiving solution.
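The abstract itself contains no code; the following is a minimal sketch, under the assumption that a Splunk freeze hook (such as coldToFrozenScript) hands the archiver the path of a frozen bucket. It illustrates the kind of design the abstract describes: Hadoop's FileSystem abstraction resolves the concrete file system from the archive URI, so the same copy logic can target HDFS or any other supported distributed file system. The class name, archive URI, and paths are hypothetical.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch: copy a frozen Splunk bucket into an archive directory on whatever
 * file system the archive URI points at (HDFS, local, etc.). Hadoop picks the
 * FileSystem implementation from the URI scheme, which is what makes the
 * archiver file-system agnostic.
 */
public class BucketArchiver {

    public static void archive(String frozenBucketPath, String archiveUri) throws IOException {
        Configuration conf = new Configuration();
        // Resolve the target file system from the URI scheme (e.g. hdfs://, file://).
        FileSystem fs = FileSystem.get(URI.create(archiveUri), conf);

        Path src = new Path(frozenBucketPath);          // local frozen bucket directory
        Path dst = new Path(archiveUri, src.getName()); // keep the bucket name in the archive

        fs.mkdirs(dst.getParent());
        // Copy the bucket; 'false' keeps the local copy so Splunk can remove it itself.
        fs.copyFromLocalFile(false, src, dst);
        fs.close();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical invocation; in practice the freeze hook would supply the bucket path.
        archive(args[0], "hdfs://namenode:8020/splunk-archive");
    }
}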