Verification of linear scalability of a business Big Data platform against the Queueing Networks model

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Magdalena Matczak; [2019]


Abstract: Ensuring that software built on top of distributed systems for Big Data scales well is crucial for designing long-lasting and reliable products. The purpose of this Master's thesis is to investigate and characterize the scalability of a business Big Data platform, URights, developed by IBM in cooperation with the French association SACEM. The work focuses on the initial step of URights, called Ingestion. Scalability is examined in the context of proportionally growing workloads and resources, and a set of recommendations for the platform is formulated from the study. The applicability of different performance-evaluation techniques to the assessment of scalability is also examined. Three evaluation methods are used. First, a mathematical analysis based on Queueing Networks (QNs) is conducted. Then, a simulation engine extending the QN model is designed and a set of simulations is run. Finally, an empirical evaluation is conducted in a test environment. Because data are scarce and the test and production environments differ starkly, the reliability of the empirical results is questionable. The mathematical analysis and the simulations suggest that fine-grained parallelism is desirable for low and medium workloads. Also, if more data must be processed, it is better to increase the frequency of the batches used in URights rather than their size. Finally, a potential bottleneck operation is identified. Using several evaluation methods makes it possible to formulate and investigate progressively more complicated questions and to observe the limits and benefits of each tool.
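The queueing-theoretic analysis mentioned above can be illustrated with its simplest building block. The following Python sketch is an illustration only and not the thesis's actual model: it assumes a single processing stage modeled as an M/M/c station (Poisson arrivals at rate lam, exponential service at rate mu per worker, c identical workers; all names are hypothetical). It then grows workload and workers proportionally at a fixed per-worker utilization rho, which is the kind of scaling scenario the abstract describes.

```python
import math


def erlang_c(c: int, a: float) -> float:
    """Probability that an arriving job must wait in an M/M/c queue.

    c: number of identical servers (workers); a = lam / mu: offered load.
    Assumes a < c, i.e. utilization below 1, so the queue is stable.
    """
    if a >= c:
        raise ValueError("unstable queue: offered load must be below server count")
    # Accumulate sum_{k=0}^{c-1} a^k / k! iteratively to avoid large factorials.
    term = 1.0
    partial = 1.0
    for k in range(1, c):
        term *= a / k
        partial += term
    # a^c / c!, scaled by 1 / (1 - rho) as in the Erlang C formula.
    rho = a / c
    top = (term * a / c) / (1.0 - rho)
    return top / (partial + top)


def mean_response_time(c: int, lam: float, mu: float) -> float:
    """Mean time in system (waiting + service) for an M/M/c station."""
    wait = erlang_c(c, lam / mu) / (c * mu - lam)
    return wait + 1.0 / mu


def weak_scaling(rho: float, mu: float, servers: list[int]) -> list[float]:
    """Scale arrival rate with the number of servers at fixed utilization rho."""
    return [mean_response_time(c, c * rho * mu, mu) for c in servers]


if __name__ == "__main__":
    for c, w in zip([1, 2, 4, 8, 16], weak_scaling(0.8, 1.0, [1, 2, 4, 8, 16])):
        print(f"workers={c:2d}  mean response time={w:.3f}")
```

Under these assumptions the mean response time falls toward 1/mu as c grows, because pooled workers reduce waiting. A real platform with coordination overheads or a bottleneck stage need not behave this way, which is why the thesis complements the analytical model with simulations and measurements.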
