Toward an application of machine learning for predicting foreign trade in services – a pilot study for Statistics Sweden

University essay from Stockholms universitet/Statistiska institutionen

Abstract: The objective of this thesis is to investigate the possibility of using machine learning at Statistics Sweden within the Foreign Trade in Services (FTS) statistic, to predict the likelihood that a unit will conduct foreign trade in services. The FTS survey is a sample survey for which there is no natural frame to sample from. Therefore, prior to sampling, a frame is manually constructed each year, starting with a register of all Swedish companies and agencies and narrowing it down in a rule-based manner to contain only units classified as likely to trade in services during the coming year. An automatic procedure that would enable reliable predictions is requested. To this end, three machine learning methods have been analyzed: two rule-based methods (random forest and extreme gradient boosting) and one distance-based method (k nearest neighbors). The models arising from these methods are trained and tested on historically sampled units, for which it is known whether they traded or not. The results indicate that the two rule-based methods perform well in classifying likely traders. The random forest model is better at finding traders, while the extreme gradient boosting model is better at finding non-traders. The results also indicate interesting patterns when studying different metrics for the models. They further indicate that, when training the rule-based models, the year in which the training data was sampled needs to be taken into account. This entails that cross-validation with random folds should not be used, but rather grouped cross-validation based on year. By including a feature that mirrors the state of the economy, the model can adapt its rules to it, meaning that the rules learned on training data can be extended to years beyond the training data. Based on the observed results, the final recommendation is to further develop and investigate the performance of the random forest model.
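
To illustrate what grouped cross-validation based on year could look like, here is a minimal sketch in plain Python. It is not the thesis's implementation: the function name `year_grouped_folds` and the `sample_years` values are hypothetical, chosen only for illustration. Each fold holds out every unit sampled in one year and trains on the remaining years, so units from the same year never appear on both sides of a split, which random folds cannot guarantee.

```python
from collections import defaultdict


def year_grouped_folds(years):
    """Yield (held_out_year, train_idx, test_idx), one fold per distinct year.

    `years` holds the sampling year of each unit, in row order.
    """
    # Collect the row indices belonging to each sampling year.
    by_year = defaultdict(list)
    for idx, year in enumerate(years):
        by_year[year].append(idx)

    # Hold out one whole year at a time; train on all other years.
    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        train_idx = [
            i
            for year in sorted(by_year)
            if year != held_out
            for i in by_year[year]
        ]
        yield held_out, train_idx, test_idx


# Hypothetical sampling years for eight historically sampled units.
sample_years = [2015, 2015, 2016, 2016, 2016, 2017, 2017, 2015]
folds = list(year_grouped_folds(sample_years))

for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

In practice the index lists would be used to slice the feature matrix and the trader/non-trader labels before fitting a model on each fold. scikit-learn's `LeaveOneGroupOut` and `GroupKFold` splitters provide comparable grouping behaviour when the year is passed as the group label.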
