Load Classification with Machine Learning: Classifying Loads in a Distribution Grid

University essay from Uppsala University, Department of Engineering Sciences (Uppsala universitet, Institutionen för teknikvetenskaper)

Abstract: This thesis explores the use of machine learning to classify loads in a distribution grid, based on the daily consumption behaviour of roughly 1600 loads in the areas of Bromma, Hässelby and Vällingby in Stockholm, Sweden. Two common unsupervised learning methods were used: K-means clustering and hierarchical agglomerative clustering (HAC). Their performance was analysed with different input data sets and parameters. K-means and HAC proved difficult to compare, and it was also difficult to find a suitable number of clusters K with the input data used. This was resolved by evaluating the clustering outcome with a custom loss function, MSE-tot, which compared the created clusters with the subsequent assignment of new data. According to MSE-tot, K-means is more suitable than HAC in this particular clustering setup. To investigate how the obtained clusters could be used in practice, two K-means clustering models were also used to make cluster-specific peak load predictions. These predictions used unitless load profiles created from the mean properties of each cluster and dimensioned using load-specific parameters. The developed models had a mean relative error of approximately 8-19 % per load, depending on the prediction method and on which of the two clustering models was used. This result is promising, especially since deviations above 20 % were not uncommon in previous work. However, the models gave poor predictions for some clusters, which indicates that in their current form they may not be suitable for all kinds of load data. One way to further improve the predictions would be to add more explanatory variables, for example temperature dependence. The results of the developed models were also compared with the conventionally used Velander's formula, which makes predictions based on a load's facility type and annual electricity consumption.
Velander's formula generally performed worse than the developed methods, with a mean relative error of 40-43 % per load. One likely reason is that the database used had poor-quality facility labels, which are essential for obtaining correct constants in Velander's formula.
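The clustering step and the idea behind MSE-tot can be illustrated with a minimal pure-Python sketch. It assumes each load is represented by one daily profile scaled to unit mean. It uses Lloyd's K-means with a deterministic farthest-first initialisation, and it scores a fitted model by assigning held-out profiles to their nearest centroid and averaging the squared error. This scoring rule is only one possible reading of "compared created clusters with subsequent assignment of new data". The abstract does not define MSE-tot, and all function names and toy data here are hypothetical.

```python
def normalize(profile):
    """Divide a daily profile by its mean so clustering compares shape, not size.
    (Assumption: the abstract does not state how profiles were normalised.)"""
    mean = sum(profile) / len(profile)
    return [x / mean for x in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, p):
    """Index of the centroid closest to profile p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], p))


def kmeans(profiles, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first initialisation."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        # Next seed: the profile farthest from all seeds chosen so far.
        far = max(profiles, key=lambda p: min(sq_dist(c, p) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [nearest(centroids, p) for p in profiles]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            # Keep the old centroid if a cluster happens to end up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:  # assignments are stable: converged
            break
        centroids = updated
    labels = [nearest(centroids, p) for p in profiles]
    return centroids, labels


def assignment_mse(centroids, new_profiles):
    """Hypothetical MSE-tot-style score: assign each new profile to its nearest
    centroid and average the squared error over all profiles and hours."""
    errors = [sq_dist(centroids[nearest(centroids, p)], p) for p in new_profiles]
    return sum(errors) / sum(len(p) for p in new_profiles)


if __name__ == "__main__":
    evening = [1, 1, 1, 2, 4, 3]  # hypothetical evening-peaking load shape
    daytime = [1, 3, 4, 4, 3, 1]  # hypothetical daytime-plateau load shape
    train = [normalize([v + 0.1 * i for v in shape])
             for shape in (evening, daytime) for i in range(5)]
    new_days = [normalize(evening), normalize(daytime)]
    for k in (1, 2, 3):
        centroids, _ = kmeans(train, k)
        print(k, round(assignment_mse(centroids, new_days), 4))
```

Scanning K and watching where the held-out score stops improving is one way to choose K. On this toy data the score drops sharply from K = 1 to K = 2, because the profiles come from two distinct shapes.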
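For comparison, hierarchical agglomerative clustering can be sketched in the same style. This version uses average linkage on Euclidean distance. That is an assumption, since the abstract does not state the linkage criterion. Its exhaustive pair search is also only practical for toy data, not for roughly 1600 loads. The function name and data are hypothetical.

```python
import math


def average_linkage_hac(profiles, k):
    """Start from singleton clusters and repeatedly merge the two clusters with
    the smallest average pairwise Euclidean distance until k clusters remain.
    Returns a list of clusters, each a list of profile indices."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(math.dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Exhaustive search for the closest pair (x < y); fine for toy data only.
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


if __name__ == "__main__":
    # Hypothetical unit-mean daily shapes with small offsets.
    evening = [0.5, 0.5, 0.5, 1.0, 2.0, 1.5]
    daytime = [0.375, 1.125, 1.5, 1.5, 1.125, 0.375]
    data = [[v + 0.02 * i for v in shape]
            for shape in (evening, daytime) for i in range(3)]
    print(average_linkage_hac(data, 2))
```

HAC returns groups of indices rather than centroids. Assigning new data to its clusters, as an MSE-tot-style score requires, therefore needs an extra rule, such as comparing against each cluster's mean profile.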
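The peak-load prediction step can be sketched as follows, under stated assumptions. The cluster's unitless profile is taken as the mean of its members' profiles after each is divided by its own mean power. A load's peak is then predicted by scaling that profile's maximum by the load's mean power. The abstract only says the profiles were "dimensioned using load specific parameters", so the choice of mean power is hypothetical, as are the function names and toy data. The last function is a straightforward reading of the per-load mean relative error reported above.

```python
def unitless_profile(member_profiles):
    """Cluster shape: average of the members' profiles after dividing each by
    its own mean, so the result is dimensionless (hypothetical construction)."""
    scaled = [[x / (sum(p) / len(p)) for x in p] for p in member_profiles]
    return [sum(col) / len(scaled) for col in zip(*scaled)]


def predict_peak(shape, mean_power):
    """Dimension the unitless shape with a load-specific parameter (here the
    load's mean power, an assumption) and return the predicted peak."""
    return max(shape) * mean_power


def mean_relative_error(predicted, actual):
    """Mean over loads of |predicted - actual| / actual."""
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Hypothetical hourly profiles (kW) of three loads assigned to one cluster.
    cluster = [[1, 2, 3], [2, 4, 6], [3, 3, 3]]
    shape = unitless_profile(cluster)
    predicted = [predict_peak(shape, sum(p) / len(p)) for p in cluster]
    actual = [max(p) for p in cluster]
    print(f"{100 * mean_relative_error(predicted, actual):.1f} %")  # 18.5 %
```

The toy cluster is evaluated in-sample, so the 18.5 % figure is unrelated to the thesis results. It only shows how a load whose shape differs from the cluster mean (the flat third profile) dominates the error.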
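Velander's formula estimates a load's peak power P from its annual energy consumption W as P = k1·W + k2·√W, where the constants k1 and k2 depend on the facility type. The sketch below uses placeholder constants invented for illustration, not values from the thesis or any utility table. It shows why label quality matters: the same annual consumption yields a different peak when the facility type is wrong.

```python
import math

# Hypothetical placeholder constants (k1, k2) per facility type, for annual
# energy W in kWh and peak P in kW. Real values come from utility tables and
# are NOT taken from the thesis.
VELANDER_CONSTANTS = {
    "residential": (0.0003, 0.05),
    "office": (0.0002, 0.04),
}


def velander_peak(annual_energy_kwh, facility_type, constants=VELANDER_CONSTANTS):
    """Velander's formula: P = k1 * W + k2 * sqrt(W)."""
    k1, k2 = constants[facility_type]
    return k1 * annual_energy_kwh + k2 * math.sqrt(annual_energy_kwh)


if __name__ == "__main__":
    w = 10_000  # kWh per year, hypothetical load
    for label in ("residential", "office"):
        print(label, round(velander_peak(w, label), 2))
```

With these placeholder constants, a 10 000 kWh load gives about 8 kW when labelled residential and about 6 kW when labelled office. A mislabelled facility alone therefore shifts the prediction by roughly a quarter.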
