Big Data Validation

University essay from Uppsala universitet/Informationssystem

Abstract: With the explosion in usage of big data, stakes are high for companies to develop workflows that translate the data into business value. Those data transformations are continuously updated and refined in order to meet the evolving business needs, and it is imperative to ensure that a new version of a workflow still produces the correct output. This study focuses on the validation of big data in a real-world scenario, and implements a validation tool that compares two databases that hold the results produced by different versions of a workflow in order to detect and prevent potential unwanted alterations, with row-based and column-based statistics being used to validate the two versions. The tool was shown to provide accurate results in test scenarios, providing leverage to companies that need to validate the outputs of the workflows. In addition, by automating this process, the risk of human error is eliminated, and it has the added benefit of improved speed compared to the more labour-intensive manual alternative. All this allows for a more agile way of performing updates on the data transformation workflows by improving on the turnaround time of the validation process.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)