Analysis and comparison of interfacing, data generation and workload implementation in BigDataBench 4.0 and Intel HiBench 7.0

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: One of the major challenges in Big Data is the accurate and meaningful assessment of system performance. At the scale of Big Data systems, minor differences in efficiency can escalate into large differences in cost and power consumption. While there are several tools on the market for measuring the performance of Big Data systems, few of them have been explored in depth. This report investigated the interfacing, data generation and workload implementations of two Big Data benchmarking suites, BigDataBench and HiBench. The purpose of the study was to establish the capabilities of each tool in these three areas. An exploratory and qualitative approach was used to gather information and analyze each benchmarking tool. Source code, documentation, and reports published by the developers were used as information sources. The results showed that BigDataBench and HiBench were designed similarly with regard to interfacing and data flow during the execution of a workload, with the exception of streaming workloads. BigDataBench provided more realistic data generation, while the data generation in HiBench was easier to control. With regard to workload design, the workloads in BigDataBench were designed to be applicable to multiple frameworks, while the workloads in HiBench were focused on the Hadoop family. In conclusion, neither benchmarking suite was superior to the other. They were designed for different purposes and should be applied on a case-by-case basis.
