Data Fusion for Consumer Behaviour

University essay from KTH/Mathematical Statistics

Author: Goran Dizdarevic; [2017]


Abstract: This thesis analyses different methods of data fusion by fitting a selection of statistical models to empirical consumer data and evaluating their performance with a set of performance measures. The main purpose of the models is to predict business-related consumer variables. Conventional methods such as decision trees, linear models and K-nearest neighbors are considered, as well as single-layer neural networks and the naive Bayesian classifier. Ensemble methods for both classification and regression are also investigated; these are fitted by minimizing the cross-entropy and the RMSE of the predicted outcomes, respectively, using the iterative non-linear BFGS optimization algorithm. The thesis also discusses the time consumption of the models and methods for feature selection. Data on consumer drinking habits, transaction and purchase history, and socio-demographic background is provided by Nepa. Evaluation of the performance measures indicates that the naive Bayesian classifier predicts consumer drinking habits most accurately, whereas the random forest, although the most time-consuming model, is preferred for classifying the Consumer Satisfaction Index (CSI). For regression of CSI, all models yield similar performance. Moreover, the ensemble methods increased the prediction accuracy slightly, at the cost of increased time consumption.
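The abstract does not say how the ensembles are parameterised, so the following is only a minimal sketch under stated assumptions: each ensemble is taken to be a weighted average of base-model outputs (regression predictions or class probabilities), the weights are kept non-negative and summing to 1 through a softmax over free parameters, and SciPy's BFGS implementation minimizes RMSE for regression and mean cross-entropy for classification. The function names (`fit_weights`, `ensemble_rmse`, `ensemble_cross_entropy`) and the synthetic base models are hypothetical illustrations, not the thesis's code or data.

```python
import numpy as np
from scipy.optimize import minimize


def softmax(z):
    """Map unconstrained parameters to non-negative weights summing to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def ensemble_rmse(z, preds, y):
    """RMSE of a weighted average of regression predictions.

    preds: shape (n_models, n_samples); y: shape (n_samples,).
    """
    combined = softmax(z) @ preds
    return np.sqrt(np.mean((combined - y) ** 2))


def ensemble_cross_entropy(z, probs, y):
    """Mean cross-entropy of a weighted average of class probabilities.

    probs: shape (n_models, n_samples, n_classes); y: integer class labels.
    """
    combined = np.tensordot(softmax(z), probs, axes=1)  # (n_samples, n_classes)
    p_true = combined[np.arange(len(y)), y]
    return -np.mean(np.log(np.clip(p_true, 1e-12, 1.0)))


def fit_weights(loss, preds, y):
    """Minimize `loss` over ensemble weights with BFGS, starting from equal weights.

    Hypothetical helper: the softmax parameterisation is an assumption.
    """
    z0 = np.zeros(preds.shape[0])
    res = minimize(loss, z0, args=(preds, y), method="BFGS")
    return softmax(res.x), res.fun


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Regression: one accurate and one noisy synthetic base model.
    y_reg = rng.normal(size=200)
    preds = np.vstack([y_reg + rng.normal(scale=0.1, size=200),
                       y_reg + rng.normal(scale=1.0, size=200)])
    w_reg, rmse = fit_weights(ensemble_rmse, preds, y_reg)
    print("regression weights:", w_reg, "RMSE:", rmse)

    # Classification: a confident model vs. one that always predicts uniform probabilities.
    y_cls = np.arange(30) % 3
    good = np.full((30, 3), 0.1)
    good[np.arange(30), y_cls] = 0.8
    bad = np.full((30, 3), 1 / 3)
    w_cls, ce = fit_weights(ensemble_cross_entropy, np.stack([good, bad]), y_cls)
    print("classification weights:", w_cls, "cross-entropy:", ce)
```

The softmax lets an unconstrained quasi-Newton method such as BFGS be used while still producing a convex combination of the base models; in the synthetic demo the optimizer places nearly all weight on the more accurate model. A constrained optimizer over the simplex would be an equally reasonable design choice.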
