Estimating Prediction Intervals with Machine Learning and Monte Carlo Methods in Online Advertising

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Emil Häglund; [2020]

Abstract: Online advertising presents a complex environment. The vast number of available websites, platforms and formats, together with the trend toward programmatic ad purchasing, makes it challenging to assess a proposed advertisement in terms of cost and expected return. This paper uses machine learning to predict cost per thousand impressions (CPM), a measure of advertising efficiency, for a planned purchase. Feedforward neural networks and random forest models were compared on their ability to produce CPM point estimates and prediction intervals. To estimate prediction intervals, Monte Carlo dropout and noise estimation were employed for the neural network models. For random forest, a Monte Carlo approach was employed in which a large number of models were parameterized using bootstrap sampling. The implemented algorithms were compared using the 5x2cv test. Random forest and neural network models produced similar point estimation accuracy. To obtain valid prediction intervals in terms of coverage probability for the random forest algorithm, hyperparameters had to be adjusted to increase the tree-level variance. This negatively affected the accuracy of the point estimates, and the random forest prediction intervals were further from optimal than those produced by the neural network algorithm. This difference in performance was statistically supported by the 5x2cv test. It is concluded that both the random forest and the neural network algorithms produced valid prediction intervals, although the neural network intervals were closer to optimal.
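The Monte Carlo idea behind the bootstrap-based intervals can be illustrated with a minimal sketch. It is not the essay's implementation: to stay dependency-free it assumes a simple least-squares line in place of a random forest, synthetic data in place of real CPM records, and resampled training residuals as a stand-in for the noise estimate. The function names `fit_line` and `bootstrap_prediction_interval` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        # Degenerate resample (all x identical): fall back to the mean.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bootstrap_prediction_interval(xs, ys, x_new, n_models=500, alpha=0.1, seed=0):
    """Return (point, lower, upper) for a (1 - alpha) prediction interval.

    Each Monte Carlo draw refits the model on a bootstrap resample
    (model uncertainty) and adds a resampled residual (noise estimate).
    """
    rng = random.Random(seed)
    n = len(xs)

    # Point estimate and residuals from the model fitted on all data.
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

    draws = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        a_b, b_b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        draws.append(a_b + b_b * x_new + rng.choice(residuals))

    # Empirical quantiles of the simulated predictions.
    draws.sort()
    lower = draws[int((alpha / 2) * (n_models - 1))]
    upper = draws[int((1 - alpha / 2) * (n_models - 1))]
    return a + b * x_new, lower, upper


# Synthetic example data (assumed, not taken from the essay).
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 1) for x in xs]

point, lower, upper = bootstrap_prediction_interval(xs, ys, x_new=5.0)
print(f"point={point:.2f}, 90% interval=({lower:.2f}, {upper:.2f})")
```

An interval of this kind is valid in the abstract's sense when the fraction of held-out observations falling inside it is close to the nominal level, here 90%. Among valid intervals, narrower ones are preferable.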
