Performance assessment of Apache Spark applications

University essay from Linnéuniversitetet/Institutionen för datavetenskap och medieteknik (DM)

Abstract: This thesis addresses the challenges of large software and data-intensive systems. We will discuss a Big Data software that consists of quite a bit of Linux configuration, some Scala coding and a set of frameworks that work together to achieve the smooth performance of the system. Moreover, the thesis focuses on the Apache Spark framework and the challenging of measuring the lazy evaluation of the transformation operations of Spark. Investigating the challenges are essential for the performance engineers to increase their ability to study how the system behaves and take decisions in early design iteration. Thus, we made some experiments and measurements to achieve this goal. In addition to that, and after analyzing the result we could create a formula that will be useful for the engineers to predict the performance of the system in production.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)