Data streaming provenance in advanced metering infrastructures

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: Increasing volumes of data in digital systems have made the traditional approach of gathering and storing all data, then analyzing it in bulk at periodic intervals, challenging and costly. One field facing this problem is the electric grid market, which has started modernizing its aging grids into smart grids in which Advanced Metering Infrastructures (AMIs) play a vital role. Within AMIs, old meters are replaced with smart meters that collect data more often and measure more properties. However, they also produce a higher volume of data. Stream processing, where data is analyzed continuously before being stored, is therefore of interest because the data can be heavily reduced before storage. The downside of this approach is that the traceability of data is lost. A technique called stream provenance addresses this by retrieving the source data that contributed to each output of a stream processing application. However, stream provenance is an understudied problem, and using it can decrease performance. The purpose of this thesis is to study stream provenance by developing a streaming application that makes use of provenance. The application is evaluated by measuring several metrics to determine how performance is affected. The project is conducted at Göteborg Energi (GE), one of Sweden’s biggest energy utility companies. The objective is to develop a prototype extension to GE’s current stream-processing application that can detect faulty meters and use stream provenance to report them. The development process and the evaluation of the application are covered in this report. The application is developed with a Stream Processing Engine (SPE) called Apache Flink and a stream provenance framework called Ananke. Two versions are created, one with provenance and one without. Performance metrics such as CPU utilization, memory consumption, latency, and throughput are measured.
The results show that provenance decreases throughput by 10.4% and increases memory consumption by 8.8%, latency by 10.4%, and CPU utilization by 238.1%. Several reasons behind these results are discussed in the report, along with the implications they can have for an application. Although provenance adds overhead, it can still be beneficial for some types of applications, for example those where time is not crucial and ample resources are available, like the one developed in this thesis.
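
To make the idea of stream provenance concrete, the following is a minimal sketch in plain Python, not the thesis's Flink or Ananke code. It groups meter readings into fixed-size windows and attaches to each alert the source readings that produced it. The names `Reading`, `Alert`, and `detect_faulty`, the one-hour window, and the fault rule (a meter reporting only zero consumption within a window) are all assumptions chosen for illustration, and the sketch processes a finite list rather than an unbounded stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Reading:
    """One source tuple from a smart meter (hypothetical schema)."""
    meter_id: str
    timestamp: int  # seconds since some epoch
    kwh: float


@dataclass(frozen=True)
class Alert:
    """An output tuple together with its provenance."""
    meter_id: str
    window_start: int
    sources: Tuple[Reading, ...]  # the readings that contributed to this alert


def detect_faulty(readings: Iterable[Reading], window_size: int = 3600) -> List[Alert]:
    """Flag meters whose readings are all zero within a tumbling window.

    The all-zero rule is an illustrative assumption, not the rule used
    in the thesis.
    """
    # Assign every reading to the window that contains its timestamp.
    windows: Dict[Tuple[str, int], List[Reading]] = defaultdict(list)
    for r in readings:
        window_start = r.timestamp - r.timestamp % window_size
        windows[(r.meter_id, window_start)].append(r)

    alerts: List[Alert] = []
    for (meter_id, window_start), group in sorted(windows.items()):
        if all(r.kwh == 0.0 for r in group):
            # Keep the contributing readings alongside the result:
            # this retained link is the provenance of the alert.
            alerts.append(Alert(meter_id, window_start, tuple(group)))
    return alerts
```

Holding on to the contributing readings until a window closes is also one plausible source of the extra memory and CPU cost that the abstract reports, although the report itself should be consulted for the actual causes.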
