Code Generation for Accelerating Data Flow: Enhancing Pentaho Data Integration Performance

University essay from Umeå universitet/Institutionen för fysik

Author: Alexander Svensson; [2023]


Abstract: Pentaho Data Integration, also known as Kettle, is an ETL tool that functions as a no-code program. The tool, implemented in Java, enables users to create data flow structures via a graphical user interface and store them as XML files, which can be edited or executed. In some applications, the current execution method does not provide satisfactory performance. To speed up execution, we propose a Java code generator that works by analyzing the existing XML setup and Kettle’s existing source code. We also conduct some exploratory work with Apache Hop, another Kettle-based ETL tool, and provide comparative insights. Our analysis demonstrates the potential for significant speed improvements, with execution times reduced by 60% or more. We consider this method’s challenges and limitations and propose solutions to overcome them. Overall, our research contributes to the field of no-code programming by highlighting the potential for using code generation to optimize performance in data engineering processes.
