Low Contrast Receipt Scanning Using Deep Learning and Computer Vision

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Noa Spele; [2022]


Abstract: Although financial bookkeeping is commonly performed online, physical receipts are still widely used. Physical receipts are particularly problematic for companies with a large number of employees. Every time an employee makes a purchase on behalf of the company, the expense must be reported to the employer, and since the proof of the expense is generally presented on a physical receipt, the information needs to be extracted from it. Extracting the necessary details from the receipts requires manual labor, either from the employee or the employer. Manual transcription is often both time-consuming and error-prone, which leads to increased costs for the companies. Therefore, methods to automatically extract the information from images of receipts have been developed. These methods usually consist of multiple steps, where the first step is to find the location of the receipt in the image. However, existing methods for identifying the location of the receipt perform poorly when the images have low contrast. In this thesis, four modified versions of the U-net model are applied to the problem of finding the location of receipts in low-contrast images taken by modern smartphones. The location of the receipt is represented both as a segmentation mask and as a quadrilateral formed by four approximated corner points. Two of the presented methods are intended for use on modern smartphone hardware, since employers often provide employees with phones. The results show that the problem can be solved sufficiently well using the different U-net models and the different ways of representing the location of the receipt. The regular U-net model achieved an average accuracy of 98.6744 ± 2.3819 when representing the location as a segmentation mask, and an average Intersection over Union (IoU) of 0.8818 ± 0.0750 when representing the location as a quadrilateral. The model intended for mobile use achieved an average accuracy of 97.5041 ± 3.3664 when representing the location as a segmentation mask, and an average IoU of 0.8602 ± 0.0841 when representing the location as a quadrilateral.
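
The abstract reports two ways of scoring a predicted receipt location: pixel accuracy for the segmentation-mask representation and IoU for the quadrilateral representation. The sketch below illustrates how such metrics can be computed by rasterizing quadrilaterals into binary masks; it is not the thesis code, and the function names, image size, and toy corner coordinates are assumptions for illustration only.

import cv2
import numpy as np

def quad_to_mask(quad, shape):
    """Rasterize a quadrilateral (4 corner points in pixel coordinates) into a binary mask."""
    mask = np.zeros(shape, dtype=np.uint8)
    cv2.fillPoly(mask, [np.asarray(quad, dtype=np.int32)], 1)
    return mask.astype(bool)

def pixel_accuracy(pred, truth):
    """Fraction of pixels where the predicted mask agrees with the ground-truth mask."""
    return float(np.mean(pred == truth))

def iou(pred, truth):
    """Intersection over Union of two binary masks."""
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)

# Toy example: a ground-truth receipt quadrilateral vs. a slightly shifted prediction.
shape = (480, 640)
truth_quad = [(100, 80), (540, 90), (530, 400), (110, 390)]
pred_quad = [(105, 85), (535, 95), (525, 395), (115, 385)]
truth_mask = quad_to_mask(truth_quad, shape)
pred_mask = quad_to_mask(pred_quad, shape)
print(f"pixel accuracy: {pixel_accuracy(pred_mask, truth_mask):.4f}")
print(f"IoU:            {iou(pred_mask, truth_mask):.4f}")

In practice the predicted mask would come from the U-net output (thresholded probabilities) and the quadrilateral from the four approximated corner points, but the scoring step works the same way once both are rasterized to the image grid.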
