Benchmarking of Data Mining Techniques as Applied to Power System Analysis

University essay from Institutionen för informationsteknologi

Author: Can Anil; [2013]


Abstract: The field of electric power systems is currently facing explosive growth in the amount of data. Since extracting useful information from this enormous amount of data is highly complex, costly, and time consuming, data mining can play a key role. In particular, the standard data mining algorithms for the analysis of huge data volumes can be parallelized for faster processing. This thesis focuses on benchmarking parallel processing platforms; it employs data parallelization using an Apache Hadoop cluster (MapReduce paradigm) and shared-memory parallelization using multiple cores on a single machine. As a starting point, we conduct real-time experiments to evaluate the efficacy of these two parallel processing platforms in terms of performance, resource usage (memory), efficiency (including speed-up), accuracy, and scalability. The results show that the data mining methods can indeed be implemented as efficient parallel processes, and can be used to obtain useful results from huge amounts of data in a case study scenario. Overall, we establish that parallelization using an Apache Hadoop cluster is a promising model for scalable performance compared with the alternative, shared-memory parallelization using multiple cores on a single machine.
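The abstract names speed-up and efficiency as evaluation criteria without defining them. A minimal sketch of the conventional definitions follows, assuming speed-up is the ratio of single-worker to parallel run time and efficiency is speed-up divided by the number of workers; the thesis may define them differently. The function names and the timings are hypothetical and are not results from the thesis.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Conventional speed-up S = T_1 / T_p (assumed definition)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Conventional parallel efficiency E = S / p (assumed definition)."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_serial, t_parallel) / workers


# Hypothetical timings in seconds, chosen only for illustration.
example_speedup = speedup(120.0, 40.0)
example_efficiency = efficiency(120.0, 40.0, 4)
```

Under these definitions, an efficiency close to 1 means the added workers are being used almost fully, while a value well below 1 points to overhead such as communication or job start-up.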
