Analyzing the performance of active learning strategies on machine learning problems

University essay from Uppsala universitet/Avdelningen för systemteknik

Abstract: Digitalisation within industries is advancing rapidly, and the amount of available data grows daily. Machine learning (ML) models need large amounts of well-annotated data to perform well. Producing well-annotated data requires an expert, which is expensive, and the annotation itself can be very time-consuming. Active learning (AL) has emerged as a way to grow the annotated data set through selective annotation. Instead of labelling data points at random, active learning strategies select data points based on informativeness or uncertainty. The challenge lies in determining the most effective active learning strategy for a given combination of machine learning model and problem type. Although active learning has been around for a while, benchmarking of strategies has not been widely explored. The aim of the thesis was to benchmark different AL strategies and analyse their performance on the underlying ML problems and ML methods/models. For this purpose, an experiment was constructed to compare, in an unbiased way, different machine learning models in combination with different active learning strategies within the areas of computer vision, drug discovery, and natural language processing. Nine different active learning strategies were analysed in the thesis, with a random strategy serving as the baseline, and were tested on six different machine learning methods/models. The result of this thesis was that active learning had a positive effect within all problem areas and worked especially well for unbalanced data. The two main conclusions are that all active learning strategies work better with a smaller budget, owing to the importance of selecting informative data points, and that prediction-based strategies are the most successful for all problem types.
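
To illustrate the difference between uncertainty-based selection and the random baseline mentioned in the abstract, the following is a minimal sketch in Python. It is not taken from the thesis: it assumes a least-confidence criterion (one common uncertainty measure), and the function names, the probability values, and the budget of two are hypothetical choices made for illustration only.

```python
import numpy as np


def least_confidence_scores(probs):
    """Uncertainty per sample: 1 minus the highest predicted class probability."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - probs.max(axis=1)


def select_uncertain(probs, budget):
    """Return indices of the `budget` most uncertain unlabeled points."""
    scores = least_confidence_scores(probs)
    # Stable sort on negated scores: most uncertain first, ties broken by index.
    order = np.argsort(-scores, kind="stable")
    return order[:budget]


def select_random(n_unlabeled, budget, seed=0):
    """Random baseline: pick `budget` distinct indices uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_unlabeled, size=budget, replace=False)


# Hypothetical class probabilities predicted by some model for five
# unlabeled points (illustrative values, not results from the thesis).
pool_probs = np.array([
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
    [0.70, 0.30],
])

picked = select_uncertain(pool_probs, budget=2)
baseline = select_random(len(pool_probs), budget=2)
print("uncertainty-based pick:", picked.tolist())
print("random baseline pick:", baseline.tolist())
```

In this sketch the uncertainty-based strategy picks the two points whose predictions are closest to 50/50, while the baseline ignores the model's predictions entirely. In a full active learning loop the selected points would be sent to an annotator, added to the training set, and the model retrained before the next selection round.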
