UpdateCommander: A Library for Incremental Historical Updates for Complex Data

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Oskar Åsbrink; [2023]

Abstract: A common goal in complex data transformation applications is to reduce the amount of computation required. This is especially true for applications with expensive computations and frequent updates to the transformation model. This thesis presents UpdateCommander, an open-source library for data engineering pipelines that optimizes the recomputation of data when the transformation process is modified. Traditional methods may require full recomputation of past data whenever the parsing and transformation process changes, which costs time and resources. UpdateCommander uses an incremental computing approach to identify the subsets of data that are affected by such changes, so that only those subsets need to be recomputed. The library was tested using serialized dummy data and showed a reduction in computation time for most applications. For the most common applications, the reduction ranged from 58.5% to 95.6% when changes affected 11% of the total dataset. In the most complex case presented, computation time did not decrease but instead increased slightly, by 6.7%. For the most common application, using UpdateCommander is no longer beneficial once the affected subset of data approaches 34%.
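
The general idea of recomputing only the affected subset can be illustrated with a minimal sketch. This is not UpdateCommander's API or the thesis's implementation; the class `IncrementalStore`, the `kind` field, and the version tags are hypothetical names chosen for illustration. The sketch assumes each record is handled by one transformation, identified by a version tag, and that a record needs recomputation only when the tag of its transformation has changed since its result was stored.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
# Hypothetical convention: each record kind maps to (version tag, function).
Transforms = Dict[str, Tuple[str, Callable[[Record], object]]]


def fingerprint(version_tag: str) -> str:
    """Stable fingerprint of a transformation's version tag."""
    return hashlib.sha256(version_tag.encode("utf-8")).hexdigest()


class IncrementalStore:
    """Caches results together with the fingerprint that produced them."""

    def __init__(self) -> None:
        # record id -> (fingerprint of the transformation used, output)
        self.results: Dict[object, Tuple[str, object]] = {}

    def update(self, records: List[Record], transforms: Transforms) -> int:
        """Recompute only stale records; return how many were recomputed."""
        recomputed = 0
        for rec in records:
            tag, fn = transforms[rec["kind"]]
            fp = fingerprint(tag)
            cached = self.results.get(rec["id"])
            if cached is not None and cached[0] == fp:
                continue  # transformation unchanged, keep the stored result
            self.results[rec["id"]] = (fp, fn(rec))
            recomputed += 1
        return recomputed


# Toy data: eight records of kind "a", two of kind "b".
records = [{"id": i, "kind": "a" if i < 8 else "b", "value": i} for i in range(10)]
store = IncrementalStore()

v1: Transforms = {
    "a": ("a-v1", lambda r: r["value"] * 2),
    "b": ("b-v1", lambda r: r["value"] + 1),
}
first_run = store.update(records, v1)   # nothing cached yet, all records run

# Only the "b" transformation changes, so only "b" records are stale.
v2: Transforms = {"a": v1["a"], "b": ("b-v2", lambda r: r["value"] + 100)}
second_run = store.update(records, v2)
```

In this toy setup the bookkeeping (fingerprinting and cache lookups) is cheap compared with the skipped work. When the affected subset grows, or when tracking which records are affected becomes expensive, that overhead can outweigh the savings, which is consistent with the break-even behavior described in the abstract.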
