Federated Learning with FEDn for Financial Market Surveillance

University essay from Uppsala universitet/Avdelningen för beräkningsvetenskap

Abstract: Machine Learning (ML) is the current trend that most industries opt for to improve their business and operations. ML has also been adopted in the financial markets, where well-funded financial institutions employ the latest ML algorithms to gain an advantage on the market. The darker side of ML is the potential emergence of complex algorithmic trading schemes that are abusive and manipulative. Because of this, it is inevitable that ML will be applied to financial market surveillance in order to detect these abusive and manipulative trading strategies. Ideally, an accurate ML detection model would be developed with data from many financial institutions or trading venues. However, such ML models require vast quantities of data, which poses a problem in market surveillance where data is sensitive or limited. Data sharing between companies or countries is typically accompanied by legal and privacy concerns. By training ML models on distributed datasets, Federated Learning (FL) overcomes these issues by eliminating the need to centralise sensitive data. This thesis aimed to address these ML related issues in market surveillance by implementing and evaluating a FL model. FL enables a group of independent data-holding clients with the same intention to build a shared ML model collaboratively without compromising private data. In this work, a ML model is initially deployed in a centralised data setting and trained to detect the manipulative trading scheme known as spoofing. The LSTM-Autoencoder was the model chosen method for this task. The same model is also implemented in a federated setting but with decentralised data, using the FL framework FEDn. Another FL framework, Flower, is also employed to evaluate the performance of FEDn. Experiments were conducted comparing the FL models to the conventional centralised learning model, as well as comparing the two frameworks to each other. The results showed that under certain circumstances, the FL models performed better than the centralised model in detecting spoofing. FEDn was equivalent to Flower in terms of detection performance. In addition, the results indicated that Flower was marginally faster than FEDn. It is assumed that variations in the experimental setup and stochasticity account for the performance disparity.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)