Computer Vision for Document Image Analysis and Text Extraction

University essay from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Omar Benchekroun; [2022]

Keywords: Optical Character Recognition; Document Analysis; Text Extraction; Transformers; Convolutional Neural Networks

Abstract: Automatic document processing has been a subject of industry interest for the past few years, driven in particular by recent advances in Machine Learning and Computer Vision. This project investigates in depth a major component of Document Image Processing: Optical Character Recognition (OCR). First, an improvement upon an existing shallow CNN+LSTM model is proposed, using domain-specific data synthesis. We demonstrate that this model can achieve an accuracy of up to 97% on non-handwritten text, with an accuracy improvement of 24% when using synthetic data. We then address handwritten text, which presents additional challenges, including variance in writing style, slanting, and character ambiguity. A CNN+Transformer architecture is validated for recognizing handwriting extracted from real-world insurance statement data. This model achieves a maximum accuracy of 92% on real-world data. Finally, we demonstrate how a data pipeline relying on synthetic data can be a scalable and affordable solution for modern OCR needs.
