Extracting information about arms deals from news articles

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: The Stockholm International Peace Research Institute (SIPRI) maintains the most comprehensive publicly available database on international arms deals. Updating this database requires humans to sift through large numbers of news articles, only some of which contain information relevant to the database. To save time, it would be useful to automate part of this process. In this thesis project, we apply ALBERT, a state-of-the-art pre-trained language model for Natural Language Processing (NLP), to the task of determining whether a text contains information about arms transfers and extracting that information. To train and evaluate the model, we also introduce a new dataset of 600 news articles, in which information about arms deals is annotated with labels such as Weapon, Buyer, Seller, etc. We achieve an F1-score of 0.81 on the task of determining whether an arms deal is present in a text, and an F1-score of 0.77 on determining whether a given part of a text has a specific arms deal-related attribute. This is probably not enough to entirely automate SIPRI’s process, but it demonstrates that the approach is feasible. While this thesis focuses specifically on arms deals, the methods used can be generalized to extracting other kinds of information.
