Forecasting Efficiency in Cryptocurrency Markets: A machine learning case study

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Financial time series are a common subject of academic research. This is likely due not only to their challenging nature, with high levels of noise and non-stationary data, but also to the endless possibilities of features and problem formulations they offer. Consequently, problem formulations range from classification and categorical tasks determining directional movements in the market to regression problems forecasting their actual values. These tasks are investigated with features ranging from data extracted from Twitter feeds to movements in external markets and technical indicators developed by investors. Cryptocurrencies are known for being even more volatile and unpredictable, which leads institutional investors to avoid the market. In contrast, research in academia often applies state-of-the-art machine learning models without the industry's knowledge of pre-processing. This thesis aims to narrow the gap between industry and academia by presenting a process from feature extraction and selection to forecasting through machine learning. The task concerns how well the market movements can be forecasted, and the individual features' role in the predictions, for a six-hour-ahead regression task. To investigate the problem statement, a set of technical indicators and a feature selection algorithm were implemented. The data was collected from the exchange FTX and consisted of hourly data from Solana, Bitcoin, and Ethereum. The features chosen by the feature selection were then used to train and evaluate an Autoregressive Integrated Moving Average (ARIMA) model, Prophet, a Long Short-Term Memory (LSTM) network, and a Transformer on the spread between the spot price and the three-month futures market for Solana. The features' relevance was evaluated by calculating their permutation importance. It was found that there are indications of short-term predictability of the market through several forecasting models. Furthermore, the LSTM and ARIMA-GARCH performed best in a scenario of low volatility, while the LSTM outperformed the other models in times of higher volatility. Moreover, the investigations show indications of non-stationarity. This phenomenon was found not only in the data as a sequence but also in the relations between the features. These results show the importance of feature selection for a time frame relevant to the prediction window. Finally, the data displays a strong mean-reverting behaviour and is therefore relatively well-approximated by a naive walk.
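The abstract states that feature relevance was evaluated through permutation importance. As a rough illustration of that general technique (not the thesis's implementation), the sketch below shuffles one feature column at a time and measures how much the mean squared error of a fixed model increases. The function names, the toy model, and the synthetic data are all assumptions made for this example; none of them come from the thesis.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    `predict` maps one row (a list of floats) to a prediction. It stands in
    for any already-fitted model; this is a hypothetical interface.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy setup (assumed): the target depends only on feature 0,
# and the "model" uses feature 0 and ignores feature 1.
def toy_predict(row):
    return 2.0 * row[0]


data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * row[0] for row in X]

importances = permutation_importance(toy_predict, X, y)
print(importances)
```

In this toy setup the ignored feature receives an importance of zero, while shuffling the informative feature raises the error. Note that plain shuffling breaks the temporal ordering of a column, so for time-series features such as those described here the scores are best read as indicative only.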
