Automatic vs. Manual Data Labeling : A System Dynamics Modeling Approach

University essay from KTH/Skolan för industriell teknik och management (ITM)

Abstract: Labeled data, which is a collection of data samples that have been tagged with one or more labels, play an important role many software organizations in today's market. It can help in solving automation problems, training and validating machine learning models, or analysing data. Many organizations therefore set up their own labeled data gathering system which supplies them with the data they require. Labeling data can either be done by humans or be done via some automated process. However, labeling datasets comes with costs to these organizations. This study will examine what this labeled data gathering system could look like and determine which components that play a crucial role when determining how costly an automatic approach is compared to a manual approach using the company Klarna's label acquisition system as a case study. Two models are presented where one describes a system that solely uses humans for data annotation, while the other model describes a system where labeling is done via an automatic process. These models are used to compare costs to an organization taking those approaches. Important findings include the identification of important components that affects which approach would be more economically efficient to an organization under certain circumstances. Some of these important components are the label decay rate, automatic and manual expected accuracy, and number of data points that require labeling.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)