Uncertainty Quantification in Deep Learning for Breast Cancer Classification in Point-of-Care Ultrasound Imaging

University essay from Lunds universitet/Matematik LTH

Abstract: Breast cancer is the most common type of cancer worldwide, with an estimated 2.3 million new cases in 2020, and the leading cause of cancer-related deaths in women. While survival rates are high in many high-income countries, with five-year relative survival rates of 85% or more, they are poor in many middle- and low-income countries, with rates as low as 12% in Kyadondo, Uganda. This immense difference is largely due to differences in access to diagnostic tools and screening, as well as in the number of diagnostic experts. One way to bridge this gap and increase survival rates in low-income countries could be to use point-of-care ultrasound (POCUS) imaging as a cheap and portable diagnostic tool, combined with a deep learning (DL) based algorithm for image classification. While this has previously been shown to be feasible and to produce good results, in a field like medical diagnostics it is extremely important that a classifier is also trustworthy, as wrong predictions can have severe consequences. This work therefore addresses the question of how to quantify the uncertainty in a model's predictions and explores different methods from the fields of uncertainty quantification (UQ) and out-of-distribution (OOD) detection, including Bayesian neural networks, deep ensembles and three different post-hoc methods. The results support the hypothesis that uncertainty scores correlate with the correctness of a prediction. The correlation was strongest for an average ensemble with entropy-based total uncertainty. The results suggest setting the threshold so that the predictions for the 20% of test data with the highest uncertainty are marked as not trustworthy. This improves the accuracy of the breast cancer classification (benign, malignant, normal) from 68.6% to 77.5%, the binary accuracy (cancerous vs. non-cancerous) from 81.8% to 90.2%, and the AUC from 95.6% to 98.4%. Additionally, all methods were tested for OOD detection using three different OOD data sets. The best results were achieved with the post-hoc OOD detection method energy score, which performed well on all three data sets, followed by several types of ensembles. Overall, the results show that the different methods have great potential for building a safer and more trustworthy classifier that can be applied in a real-world setting. Based on our findings, an average ensemble as the classification method with entropy-based total uncertainty is the most promising choice, followed by the energy score method. Further evaluation with more data and comparison with additional UQ methods is needed to confirm the optimal method.
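
The two scores named in the abstract can be illustrated with a minimal sketch. This is not the essay's implementation: the function names, the input layout (one probability vector per ensemble member, one logit vector per image) and the temperature parameter are assumptions made for illustration. It uses the common definitions, namely the Shannon entropy of the averaged ensemble prediction as total uncertainty, and the negative log-sum-exp of the logits as the energy score, where a higher energy is usually read as more likely OOD.

```python
import math

def mean_ensemble_probs(member_probs):
    """Average the class probabilities of all ensemble members for one image.

    member_probs: list of probability vectors, one per member (assumed layout).
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector; 0 * log(0) is taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def keep_mask(uncertainties, reject_fraction=0.2):
    """Mark the reject_fraction of samples with the highest uncertainty as not trustworthy.

    Returns a list of booleans: True means the prediction is kept.
    """
    n_reject = int(round(reject_fraction * len(uncertainties)))
    order = sorted(range(len(uncertainties)),
                   key=lambda i: uncertainties[i], reverse=True)
    rejected = set(order[:n_reject])
    return [i not in rejected for i in range(len(uncertainties))]

def energy_score(logits, temperature=1.0):
    """Energy score -T * log(sum(exp(logit / T))), computed in a numerically stable way."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    return -temperature * (peak + math.log(sum(math.exp(v - peak) for v in scaled)))

# Hypothetical three-class outputs (benign, malignant, normal) of two ensemble members.
members = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]
total_uncertainty = entropy(mean_ensemble_probs(members))
```

In practice the rejection threshold would be chosen on held-out data rather than on the test set itself; the 20% figure above is simply the fraction reported in the abstract.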
