Building Data Classification and Association

University essay from Lunds universitet/Institutionen för reglerteknik

Abstract: Almost half of the energy consumption in the EU originates from heating and cooling of buildings. The European Commission states that smart control of building systems may reduce this energy consumption. Cloud-based smart control and advanced fault-detection systems are becoming more common and should work for any building in the world. These systems need to receive data from the physical buildings, which are commonly managed by a Building Management System (BMS). Today, connecting an advanced control or analysis system to a building's BMS is a largely manual process, which is time consuming and error prone. It would therefore be beneficial if this process could be automated. This thesis aimed to find machine learning methods with the potential to fully or semi-automate the connection process. By implementing and evaluating models of three machine learning methods (random forest, gradient boosting and neural network), we aimed to find a method capable of labelling time series data into a fixed classification system with a precision of 80% or higher. The solutions were tested on three data sets of differing complexity, and we could show that for a set with low complexity it is possible to achieve perfect classification, i.e. an accuracy of 100%. For the more complex sets, accuracy decreased to roughly 60%, and a fully automated solution based on these models would not perform well enough. However, the probability that the correct class was among the top five predictions of the models remained high, and the models could therefore be used in a semi-automated connection process. Overfitting was an extensive problem when classifying signals, especially for random forest and gradient boosting models. We believe this is partly due to the data being too homogeneous, and that the situation could be improved by including data from additional buildings.
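The "top five predictions" measure mentioned above can be illustrated with a minimal sketch of top-k accuracy. This is not the thesis's code: the function name, the class names and the scores below are hypothetical and chosen only to show how top-1 and top-k accuracy can differ for the same model output.

```python
def top_k_accuracy(scores_per_sample, true_labels, class_names, k=5):
    """Fraction of samples whose true label is among the k highest-scored classes."""
    if len(scores_per_sample) != len(true_labels):
        raise ValueError("scores and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for scores, truth in zip(scores_per_sample, true_labels):
        # Rank class indices from highest to lowest predicted score.
        ranked = sorted(range(len(class_names)), key=lambda i: scores[i], reverse=True)
        top_k = {class_names[i] for i in ranked[:k]}
        if truth in top_k:
            hits += 1
    return hits / len(true_labels)


# Hypothetical signal classes and model scores (not taken from the thesis).
CLASSES = ["supply_air_temp", "return_air_temp", "valve_position"]
SCORES = [
    [0.6, 0.3, 0.1],  # true class ranked second
    [0.2, 0.1, 0.7],  # true class ranked first
    [0.5, 0.4, 0.1],  # true class ranked last
]
TRUTH = ["return_air_temp", "valve_position", "valve_position"]

top1 = top_k_accuracy(SCORES, TRUTH, CLASSES, k=1)
top2 = top_k_accuracy(SCORES, TRUTH, CLASSES, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

In a semi-automated workflow of the kind the abstract describes, an operator would presumably pick the right class from such a short ranked list rather than accept the single top prediction.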
The problems with overfitting could be seen most clearly when models were trained and tested on data from different buildings. In this case, random forest and gradient boosting models were clearly outperformed by neural network models, which still scored about 60% accuracy without any overfitting. We also attempted to group signals by equipment type. This was done via support vector machines and a string comparison method. The support vector machine solution could only be deployed on the least complex data set, but performed well there, with an accuracy of over 85%. To implement this solution on more complex data sets, more knowledge about the system is needed. The string comparison method showed that much information could be gathered from the correlations in the signal names and paths. Nevertheless, it was hard to draw any general conclusions from this, since data from only one BMS was used. We believe that the string comparison could give good results in combination with other methods.
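The abstract does not say which string comparison was used. As an illustrative sketch only, the following groups signals by the similarity of their paths using Python's standard `difflib.SequenceMatcher`; the signal paths, the 0.6 threshold and the greedy grouping strategy are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two signal paths."""
    return SequenceMatcher(None, a, b).ratio()


def group_by_name(signals, threshold=0.6):
    """Greedily assign each signal to the first group whose first member
    is at least `threshold` similar; otherwise start a new group."""
    groups = []
    for signal in signals:
        for group in groups:
            if similarity(group[0], signal) >= threshold:
                group.append(signal)
                break
        else:
            groups.append([signal])
    return groups


# Hypothetical BMS signal paths (not taken from the thesis data).
SIGNALS = [
    "AHU1/SupplyFan/Speed",
    "AHU1/SupplyFan/Status",
    "Pump3/Valve/Position",
    "Pump3/Valve/Setpoint",
]

groups = group_by_name(SIGNALS)
for members in groups:
    print(members)
```

Signals sharing a long path prefix end up in the same group here, which matches the observation that names and paths carry equipment information. A naming convention from a different BMS could behave quite differently, which is consistent with the caution expressed above.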
