A comparison between fully-supervised and self-supervised deep learning methods for tumour classification in digital pathology data

University essay from Luleå tekniska universitet/Institutionen för system- och rymdteknik

Abstract: Whole Slide Images (WSIs) are digital scans containing rich pathology information. Many WSI datasets are available for purposes such as diagnostic tasks and analysis, but labeled WSI datasets are scarce because the annotation process is both costly and time consuming. Self-supervised learning is a way of training neural networks to learn the underlying structure of input data without any labels. AstraZeneca have developed a self-supervised feature extractor, the Drug-development Image Model Embeddings (DIME) pipeline, that trains on unlabeled WSIs and produces numerical embedding representations of them. This thesis applies the DIME embeddings to a binary tumour classification task on the annotated Camelyon16 dataset by using the DIME pipeline as a feature extractor and training a simple binary classifier on the embedding representations instead of on the WSI patches. The results are then compared to previous fully-supervised learning approaches to see whether the embedding features generated by the DIME pipeline are sufficiently predictive, with simple classifiers, for the downstream task of binary classification. Classifiers were trained on the DIME embeddings using Logistic Regression, Multi-Layer Perceptron and Gradient Boosting. The best performing model, a Multi-Layer Perceptron neural network trained on DIME embeddings produced with an inpainting algorithm, achieved a patch-level classification accuracy of 97.3%. This result is competitive with the fully-supervised algorithms trained on the Camelyon16 dataset: it beats some of them and trails the best performing fully-supervised model by 1.1%. In addition, the performance of the classifiers on reduced training sets indicates that the features captured in the DIME embeddings are sufficiently predictive for this task.
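
The classification step described in the abstract, simple classifiers trained on fixed embeddings rather than on image patches, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the DIME pipeline is not publicly documented in this abstract, so the embeddings are replaced by synthetic Gaussian vectors, and the function names, embedding dimension, hyperparameters and scikit-learn models are placeholders rather than the thesis's actual configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def make_synthetic_embeddings(n_patches=600, dim=32, seed=0):
    """Hypothetical stand-in for per-patch embeddings (NOT real DIME output)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_patches)  # 0 = normal, 1 = tumour
    # Shift the mean of the tumour class so the two classes are separable.
    embeddings = rng.normal(size=(n_patches, dim)) + 1.5 * labels[:, None]
    return embeddings, labels


def compare_classifiers(embeddings, labels, seed=0):
    """Train three simple classifiers on the embeddings; return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    # Assumed hyperparameters, chosen only to keep the sketch fast.
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "mlp": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                             random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(n_estimators=50,
                                                        random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_embeddings()
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name}: {acc:.3f}")
```

The accuracies printed by this sketch reflect only the synthetic data and say nothing about Camelyon16. The reduced-training-set experiment mentioned in the abstract could be approximated in the same spirit by subsampling `X_train` before fitting.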
