Anomaly Detection and Revenue Loss Estimation in Accounting Data

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Loss of revenue due to erroneous invoicing is a serious problem for many companies in the repair and maintenance industry. Revenue loss can occur in many ways, for example by consistently charging the wrong hourly price for services. If a company is experiencing revenue loss, it is important to detect it, locate where it occurs, and estimate its size so that it can be corrected. The goal of this work is to find statistical methods for detecting incorrectly charged services in a dataset of invoices and for estimating the loss of revenue in the same dataset. The dataset comes from a real company experiencing revenue loss through incorrectly charged prices for services, and thus represents a real-world instance of this problem. Multiple machine learning methods with different levels of supervision are tested for detecting anomalous invoice items and estimating revenue loss from raw invoice data. Neural network regression, several decision tree regression methods, and an ensemble of these are tested and compared. The dataset has ground truth labels for each price, so results are compared to real-world targets. An ensemble that predicts the charged prices in an invoice dataset using a weighted average of predictions from neural network regression and gradient boosted decision tree regression is found to perform anomaly detection most reliably. Among this method's top 1000 anomaly candidates, 87% are true anomalies, and together they account for 45% of all anomalies in the dataset. For revenue loss estimation, using a neural network to perform regression achieves a revenue loss error of just 13%.
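The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the essay's implementation: the function names, the equal ensemble weight, the use of the absolute residual as the anomaly score, and the definition of revenue loss as the summed shortfall between predicted and charged price are all assumptions made for illustration, and the toy numbers are invented.

```python
import numpy as np

def ensemble_predict(pred_nn, pred_gbdt, weight_nn=0.5):
    """Weighted average of two regressors' price predictions.
    weight_nn is a hypothetical parameter; the abstract does not state the weights."""
    pred_nn = np.asarray(pred_nn, dtype=float)
    pred_gbdt = np.asarray(pred_gbdt, dtype=float)
    return weight_nn * pred_nn + (1.0 - weight_nn) * pred_gbdt

def top_k_candidates(charged, predicted, k):
    """Rank invoice items by |charged - predicted| (assumed anomaly score)
    and return the indices of the k largest residuals."""
    residual = np.abs(np.asarray(charged, dtype=float) - np.asarray(predicted, dtype=float))
    order = np.argsort(-residual, kind="stable")
    return order[:k]

def precision_recall_at_k(candidates, is_anomaly):
    """Precision: share of flagged items that are true anomalies.
    Recall: share of all true anomalies that were flagged."""
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    hits = int(is_anomaly[candidates].sum())
    return hits / len(candidates), hits / int(is_anomaly.sum())

def estimated_revenue_loss(charged, predicted):
    """Assumed definition: total amount by which items were charged
    below the predicted price (overcharges are ignored)."""
    shortfall = np.asarray(predicted, dtype=float) - np.asarray(charged, dtype=float)
    return float(np.clip(shortfall, 0.0, None).sum())

def relative_loss_error(estimated, true):
    """Relative error of the estimated revenue loss against the true loss."""
    return abs(estimated - true) / abs(true)

# Toy data, invented for illustration only.
charged = [100, 100, 60, 100, 50]          # prices on the invoices
correct = [100, 100, 100, 100, 100]        # ground truth prices
is_anomaly = [False, False, True, False, True]
pred_nn = [98, 102, 96, 100, 104]          # stand-in for neural network output
pred_gbdt = [102, 100, 100, 98, 100]       # stand-in for gradient boosted tree output

predicted = ensemble_predict(pred_nn, pred_gbdt)
candidates = top_k_candidates(charged, predicted, k=1)
precision, recall = precision_recall_at_k(candidates, is_anomaly)
true_loss = estimated_revenue_loss(charged, correct)
loss_error = relative_loss_error(estimated_revenue_loss(charged, predicted), true_loss)
print(precision, recall, loss_error)
```

In this sketch, precision and recall at k correspond to the two figures reported for the top 1000 candidates (87% and 45%), and the relative loss error corresponds to the reported 13%, under the assumed definitions above.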
