Classifying Receipts and Invoices in Visma Mobile Scanner

University essay from Linnéuniversitetet/Institutionen för datavetenskap (DV)

Abstract: This paper presents a study on classifying receipts and invoices using machine learning. It discusses the Naïve Bayes algorithm and the advantages of using it. Drawing on theory and previous research, I show how to classify an image as either a receipt or an invoice. The work also covers pre-processing images with a variety of methods and extracting text using Optical Character Recognition (OCR), and it discusses why pre-processing is necessary to reach higher accuracy. The results include a comparison between the Tesseract OCR engine and the FineReader OCR engine. They show that combining the FineReader OCR engine with machine learning increases the accuracy of the image classification.
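To make the classification step concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier applied to strings that stand in for OCR output. It is not the implementation used in the study: the class name `NaiveBayesTextClassifier`, the whitespace tokenizer, the Laplace smoothing parameter `alpha`, and the toy training texts are all assumptions made for illustration, and the OCR and image pre-processing stages are left out entirely.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace. Real OCR output would likely
    # need more cleanup (assumption: simple tokenization is enough here).
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength
        self.class_counts = Counter()           # documents per class
        self.word_counts = defaultdict(Counter) # word frequencies per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            # Log prior: share of training documents in this class.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                # Smoothed log likelihood of the word given the class.
                score += math.log(
                    (count + self.alpha)
                    / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up toy data standing in for text extracted by an OCR engine.
TRAIN = [
    ("total cash change thank you for shopping", "receipt"),
    ("store cashier total paid card thank you", "receipt"),
    ("invoice number due date payment terms bank account", "invoice"),
    ("invoice customer number due date amount to pay", "invoice"),
]

clf = NaiveBayesTextClassifier().fit(
    [text for text, _ in TRAIN], [label for _, label in TRAIN]
)
print(clf.predict("invoice due date bank account"))
print(clf.predict("thank you total cash"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from driving that class's probability to zero. In a full pipeline, the input strings would come from an OCR engine run on the pre-processed image, so the quality of the extracted text directly limits what the classifier can learn.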

