Automating the extraction of Financial data

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Nicolas Rollino; Rakin Ali; [2022]

Keywords: Web scraper; Financial data; Textract; AWS; Node.JS; Puppeteer;

Abstract: It is hard for retail investors and data-providing companies to obtain financial data on European companies. The work of extracting this data is most likely done manually, which is a time-consuming process. This would explain why data on European companies is supplied more slowly than data on American companies. This thesis investigates whether the extraction of financial data on European companies can be automated by building two proof-of-concept systems. The first collects the financial reports of European companies using a web scraper that scrapes the reports directly from the source. The second extracts financial data from the reports using Amazon Web Services (AWS), specifically its text extraction tool, Textract. The system that collects financial reports could not be automated and did not meet the expectations set by the company that commissioned the thesis. The system that extracts financial data from the reports was promising, as all data points of interest could be extracted. However, since it relies on a system that supplies it with reports, it cannot be implemented. The work conducted shows that fully automating the extraction of financial data on European companies is not (yet) possible. Extracting the data from the reports is possible, but collecting the reports could not be automated and is the bottleneck. In this thesis, it would have been better to collect the financial reports manually instead of using a web scraper. This bottleneck could be solved in future projects.
