Federated Learning for Natural Language Processing using Transformers

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: The use of Machine Learning (ML) in business has increased significantly over the past years. Creating high quality and robust models requires a lot of data, which is at times infeasible to obtain. As more people are becoming concerned about their data being misused, data privacy is increasingly strengthened. In 2018, the General Data Protection Regulation (GDPR), was announced within the EU. Models that use either sensitive or personal data to train need to obtain that data in accordance with the regulatory rules, such as GDPR. One other data related issue is that enterprises who wish to collaborate on model building face problems when it requires them to share their private corporate data [36, 38]. In this thesis we will investigate how one might overcome the issue of directly accessing private data when training ML models by employing Federated Learning (FL) [38]. The concept of FL is to allow several silos, i.e. separate parties, to train models with the same objective, using their local data and then with the learned model parameters create a central model. The objective of the central model is to obtain the information learned by the separate models, without ever accessing the raw data itself. This is achieved by averaging the separate models’ weights into the central model. FL thus facilitates opportunities to train a model on large amounts of data from several sources, without the need of having access to the data itself. If one can create a model with this methodology, that is not significantly worse than a model trained on the raw data, then positive effects such as strengthened data privacy, cross-enterprise collaboration and more could be attainable. In this work we have used a financial data set consisting of 25242 equity research reports, provided by Skandinaviska Enskilda Banken (SEB). Each report has a recommendation label, either Buy, Sell or Hold, making this a multi-class classification problem. To evaluate the feasibility of FL we fine-tune the pre-trained Transformer model AlbertForSequenceClassification [37] on the classification task. We create one baseline model using the entire data set and an FL model with different experimental settings, for which the data is distributed both uniformly and non-uniformly. The baseline model is used to benchmark the FL model. Our results indicate that the best FL setting only suffers a small reduction in performance. The baseline model achieves an accuracy of 83.5% compared to 82.8% for the best FL model setting. Further, we find that with an increased number of clients, the performance is worsened. We also found that our FL model was not sensitive to non-uniform data distributions. All in all, we show that FL results in slightly worse generalisation compared to the baseline model, while strongly improving on data privacy, as the central model never accesses the clients’ data. 

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)