Machine Learning for Predictive Maintenance on Wind Turbines: Using SCADA Data and the Apache Hadoop Ecosystem
Abstract: This thesis explores how to implement a predictive maintenance system for wind turbines in Apache Spark using SCADA data. It evaluates how to balance and scale the data set, and how the algorithms available in Spark MLlib perform on the given problem: Multilayer Perceptron (MLP), Linear Regression (LR), Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM) and Gradient Boosted Tree (GBT). It also evaluates whether stacking and bagging can decrease the variance and improve the metrics of the model. The MLP is found to produce the most promising model for predicting failures on the given data set, and stacking multiple MLP models yields a model with lower variance than the individual base models. In addition, a function that estimates savings is developed and used to build a time window function that explores the decisiveness of a model. The thesis concludes that a model is more decisive when the failure it predicts occurs in a turbine for which the model was trained on failure data from that same component, indicating that unknown variables affect the sensor data.
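The balancing and scaling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes random undersampling of the majority class and min-max scaling, uses plain Python rather than Spark, and the names `undersample`, `min_max_scale`, `gearbox_temp` and `failure` are hypothetical.

```python
import random

def undersample(rows, label_key="failure", seed=0):
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    # Order the two classes by size so the smaller one sets the target count.
    minority, majority = sorted([positives, negatives], key=len)
    return minority + rng.sample(majority, len(minority))

def min_max_scale(rows, feature_keys):
    """Rescale each feature to [0, 1]; a constant feature maps to 0.0."""
    scaled = [dict(r) for r in rows]  # copy so the input is left untouched
    for key in feature_keys:
        values = [r[key] for r in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for r in scaled:
            r[key] = (r[key] - lo) / span if span else 0.0
    return scaled

# Hypothetical SCADA-like rows (assumed data, not from the thesis):
# 95 normal readings and 5 readings that precede a failure.
rows = [{"gearbox_temp": 60.0 + i * 0.1, "failure": 0} for i in range(95)]
rows += [{"gearbox_temp": 85.0 + i, "failure": 1} for i in range(5)]

balanced = min_max_scale(undersample(rows), ["gearbox_temp"])
```

Failure records are typically far rarer than normal operation in this kind of data, which is why some form of balancing is needed before training. Undersampling is only one option; oversampling or class weighting are alternatives the thesis may have used instead.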