Self-Supervised Learning for Tabular Data: Analysing VIME and introducing Mix Encoder

University essay from Lunds universitet/Fysiska institutionen

Abstract: We introduce Mix Encoder, a novel self-supervised learning framework for deep tabular data models based on Mixup [1]. Mix Encoder uses linear interpolations of samples, paired with associated pretext tasks, to form useful pre-trained representations. We further analyze the viability of tabular self-supervised learning by applying VIME [2], an established representation learning framework for tabular data, to scarce healthcare datasets. We demonstrate that Mix Encoder outperforms both VIME and a standard MLP in classifying breast cancer tabular data, and we show that both self-supervised learning frameworks can improve the performance of deep tabular models. Finally, we demonstrate that combining the two representations, VIME and Mix, can yield even higher performance on certain datasets, such as one for early classification of diabetes.
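The linear interpolation of samples mentioned in the abstract can be illustrated with a minimal sketch. This is not the essay's implementation: the function names `mixup_rows` and `mixup_batch`, the default `alpha`, and the per-row sampling of the coefficient are assumptions made for illustration. Drawing the coefficient from a Beta(alpha, alpha) distribution follows the original Mixup formulation [1]. The abstract does not specify Mix Encoder's pretext tasks, so none are shown.

```python
import random


def mixup_rows(x_a, x_b, lam):
    """Linearly interpolate two numeric feature rows: lam * x_a + (1 - lam) * x_b."""
    if len(x_a) != len(x_b):
        raise ValueError("rows must have the same number of features")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def mixup_batch(rows, alpha=0.2, seed=0):
    """Mix every row with a randomly chosen partner row (hypothetical helper).

    Returns (mixed_rows, partner_indices, lambdas). Each coefficient is drawn
    from Beta(alpha, alpha); using one coefficient per row is an assumption.
    """
    rng = random.Random(seed)
    partners = list(range(len(rows)))
    rng.shuffle(partners)  # partner index for each row
    lams = [rng.betavariate(alpha, alpha) for _ in rows]
    mixed = [
        mixup_rows(rows[i], rows[j], lam)
        for i, (j, lam) in enumerate(zip(partners, lams))
    ]
    return mixed, partners, lams


if __name__ == "__main__":
    batch = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    mixed, partners, lams = mixup_batch(batch)
    for row, j, lam in zip(mixed, partners, lams):
        print(f"partner={j} lam={lam:.3f} mixed={row}")
```

The partner indices and coefficients are returned alongside the mixed rows because a self-supervised pretext task would need some target derived from how each row was mixed; which target Mix Encoder actually uses is left open here.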
