Scalable Gaussian Process Regression for Time Series Modelling

University essay from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Machine learning algorithms have applications in almost every area of daily life, largely because of their ability to learn complex patterns and extract insights from massive datasets. As data volumes grow rapidly, these algorithms need to be resource-efficient and scalable. Gaussian processes (GPs) are an effective technique for nonlinear modelling, but their practical use is limited by their computational complexity: exact GP regression scales cubically with the number of training points. This thesis studies how parallelism techniques can be applied to improve the performance of Gaussian process regression, and empirically assesses two approaches: parallel learning of a sequential GP, and a distributed Gaussian process regression algorithm with a Random Projection approximation implemented in the Spark framework. These techniques were tested on a dataset provided by Volvo Cars. The experiments show that training the GP model on 45k records, or 2^19 ≈ 10^6 data points, takes less than 30 minutes on a Spark cluster with 8 nodes. With sufficient computing resources, these algorithms can handle arbitrarily large datasets.
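
The abstract does not spell out the Random Projection approximation or how the work is split across the Spark cluster. As a minimal sketch under stated assumptions, the Python/NumPy code below uses random Fourier features, one well-known random-projection approximation of the squared-exponential kernel that may differ from the formulation used in the thesis. It mimics distributed training by splitting the data into partitions whose sufficient statistics are summed. The function names, the partition loop standing in for a cluster-wide map/reduce, and the hyperparameters (D, lengthscale, noise) are hypothetical.

```python
import numpy as np

def rff_features(X, W, b):
    """Random Fourier features: z(x) = sqrt(2/D) * cos(W^T x + b).
    With W ~ N(0, I / lengthscale^2) and b ~ U[0, 2*pi], z(x) . z(x') approximates
    the squared-exponential kernel exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def partition_stats(X_part, y_part, W, b):
    # "Map" step: each partition needs only its own rows to build a D x D and a D-vector sum.
    Z = rff_features(X_part, W, b)
    return Z.T @ Z, Z.T @ y_part

def fit_distributed_rff_gp(X, y, num_partitions=8, D=200, lengthscale=1.0, noise=0.1, seed=0):
    """Hypothetical helper: approximate GP regression fitted from per-partition statistics."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # The same random projection (W, b) must be shared by every partition.
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    A = np.zeros((D, D))
    r = np.zeros(D)
    # "Reduce" step: sum the per-partition statistics (a stand-in for a cluster-wide reduce).
    for X_p, y_p in zip(np.array_split(X, num_partitions), np.array_split(y, num_partitions)):
        A_p, r_p = partition_stats(X_p, y_p, W, b)
        A += A_p
        r += r_p
    # Solve (Z^T Z + noise^2 I) w = Z^T y: an O(D^3) solve instead of O(n^3) for exact GP.
    weights = np.linalg.solve(A + noise**2 * np.eye(D), r)
    return W, b, weights

def predict_mean(X_star, W, b, weights):
    # Approximate GP predictive mean: z(x*)^T w.
    return rff_features(X_star, W, b) @ weights
```

Because the per-partition statistics are plain sums, the result does not depend on how the rows are partitioned, and the only step that is not parallel is the D × D solve, whose cost does not grow with the number of records.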
