A Systematic Study of Semi-Supervised Learning Based on Shapley Value Data Valuation

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Christie Courtnage; [2022]

Abstract: Semi-supervised learning algorithms seek to train prediction models on both labelled and unlabelled data that outperform prediction models trained only on labelled data. Semi-supervised learning is often realised by selecting unlabelled instances together with their predicted pseudo-labels. The standard approach in the literature is to select pseudo-labelled instances based on the confidence values of the prediction models. An alternative, more direct approach proposed in the literature selects pseudo-labelled instances based on their contribution to the performance of a classifier; its authors realise this using Shapley value based data valuation. We identify two possible sources of variance in this approach: the assignment of labels to unlabelled instances and the calculation of the Shapley values. We propose five algorithms that employ cross-validation committee and bootstrapping strategies from ensemble learning in an attempt to reduce these potential variances, and we provide a systematic study of semi-supervised learning using Shapley value based data valuation. It is experimentally shown that the proposed semi-supervised methods outperform methods trained only on labelled data.
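To make the idea of selecting pseudo-labelled instances by their contribution concrete, the following is a minimal sketch of permutation-sampling (Monte Carlo) Shapley data valuation. It is not one of the five algorithms proposed in the essay. The 1-nearest-neighbour classifier, the toy one-dimensional data, and the names `accuracy_1nn` and `monte_carlo_shapley` are all assumptions chosen for illustration.

```python
import random


def accuracy_1nn(train, val):
    """Validation accuracy of a 1-nearest-neighbour classifier.

    Points are (feature, label) pairs with a single numeric feature.
    An empty training set is assigned accuracy 0.0 (an assumption).
    """
    if not train:
        return 0.0
    correct = 0
    for x, y in val:
        nearest = min(train, key=lambda p: abs(p[0] - x))
        if nearest[1] == y:
            correct += 1
    return correct / len(val)


def monte_carlo_shapley(points, val, utility, n_permutations=2000, seed=0):
    """Estimate each training point's Shapley value by permutation sampling.

    For every random ordering, a point is credited with the change in
    utility observed when it is added to the points that precede it.
    """
    rng = random.Random(seed)
    values = [0.0] * len(points)
    for _ in range(n_permutations):
        order = list(range(len(points)))
        rng.shuffle(order)
        subset = []
        previous = utility(subset, val)
        for i in order:
            subset.append(points[i])
            current = utility(subset, val)
            values[i] += current - previous
            previous = current
    return [v / n_permutations for v in values]


# Hypothetical toy data: two labelled points, then two pseudo-labelled ones.
labelled = [(0.0, 0), (1.0, 1)]
pseudo = [(0.1, 0),   # pseudo-label agrees with the nearby class
          (0.9, 0)]   # pseudo-label disagrees with the nearby class
validation = [(0.02, 0), (0.2, 0), (0.8, 1), (0.97, 1)]

train = labelled + pseudo
values = monte_carlo_shapley(train, validation, accuracy_1nn)
for point, value in zip(train, values):
    print(point, round(value, 3))
```

In this toy setting the incorrectly pseudo-labelled point receives a lower estimated value than the correctly pseudo-labelled one, so a selection rule that keeps only high-value instances would tend to discard it. Because the estimate depends on the sampled permutations, repeated runs with different seeds can differ, which is one of the two sources of variance the abstract refers to.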
