Automatic Invoice Data Extraction as a Constraint Satisfaction Problem

University essay from Uppsala universitet/Institutionen för informationsteknologi

Abstract: Invoice processing has traditionally been heavily dependent on manual labor, where the task is to identify and move certain information from an origin to a destination. It is a time-demanding task, and there is strong interest in automating it to reduce execution time, risk of faults and cost. With the ever-growing interest in automation and Artificial Intelligence (AI), this thesis explores the possibilities of automating the task of extracting and mapping information of interest by defining the problem as a Constraint Optimization Problem (COP) using numeric relations between the information present. The problem is then solved by extracting the numerical values in a document and using them as an input space, where each combination of numeric values is tested using a backend solver. Several different models were defined, using different approaches and constraints on relations between fields that may exist. A solution to an invoice was considered correct if the total, tax, net and rounding amounts were estimated correctly. The best results achieved were 84.30% correct and 8.77% incorrect solutions on a set of 1400 invoices of various types. The achieved results show a promising alternative route to proposed solutions using e.g. machine learning or other intelligent solutions using graphical or positional data. Because it regards only the numerical values present in each document, the proposed solution becomes decentralized and can therefore be implemented and run on any set of invoices without any pre-training phase.
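
To illustrate the idea of testing combinations of extracted numbers against numeric relations, here is a minimal Python sketch. It is not the thesis's model or solver: the function name find_candidates, the tax rates in ASSUMED_TAX_RATES, the rounding bound max_rounding and the ranking rule are all assumptions made for this example, and a plain exhaustive search stands in for the backend solver mentioned in the abstract.

```python
from decimal import Decimal
from itertools import permutations

# Hypothetical tax rates; the abstract does not say which rates the thesis models use.
ASSUMED_TAX_RATES = (Decimal("0.25"), Decimal("0.12"), Decimal("0.06"))
CENT = Decimal("0.01")


def find_candidates(values, max_rounding=Decimal("0.50")):
    """Return (total, net, tax, rounding) tuples that satisfy the assumed constraints.

    values: numeric strings extracted from one invoice, in any order.
    max_rounding: assumed upper bound on the absolute rounding amount.
    """
    numbers = {Decimal(v) for v in values}
    solutions = []
    # Exhaustively try every ordered assignment of three distinct numbers.
    for total, net, tax in permutations(numbers, 3):
        if not (total > net > tax > 0):
            continue
        # Assumed constraint: tax equals net times one of the rates, to the cent.
        if not any((net * rate).quantize(CENT) == tax for rate in ASSUMED_TAX_RATES):
            continue
        # Assumed constraint: total = net + tax + rounding, with a small rounding.
        rounding = total - net - tax
        if abs(rounding) <= max_rounding:
            solutions.append((total, net, tax, rounding))
    # Assumed objective: prefer the largest total, then the smallest rounding.
    solutions.sort(key=lambda s: (-s[0], abs(s[3])))
    return solutions


# Numbers as they might be pulled from an invoice, including unrelated ones.
extracted = ["1250.00", "1000.00", "250.00", "3", "2023", "12.50"]
best = find_candidates(extracted)[0]
```

With the sample input above, the only assignment that satisfies both constraints is total 1250.00, net 1000.00, tax 250.00 and rounding 0.00; unrelated numbers such as the year 2023 are rejected because no pair of other values relates to them. The sketch uses no positional or graphical information and needs no training data, which is the property the abstract highlights.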
