Gene Expression Guided Distance Metric Learning for Breast Cancer Whole Slide Image Analysis

University essay from Lunds universitet/Matematik LTH

Abstract: Female breast cancer is a complex and heterogeneous disease and the leading cause of cancer death among women worldwide. Stratifying breast cancer patients into treatment groups is challenging, and in recent years analysis of the genes expressed in the tumour has been used to guide the choice of therapy. However, gene expression analysis is expensive and unavailable to most breast cancer patients, which calls for a more cost-effective and reproducible alternative. In this thesis, a gene-expression-guided embedding extractor network is trained to map whole slide images of female breast cancer tumours to embeddings in a metric space, in which relative distances between embeddings should be similar to the distances between the corresponding gene expression profiles. The embedding extractor network is the convolutional neural network ResNet-50. The distance metrics studied were the L1 distance $d_{L1}$, the cosine distance $d_{CL}$, the L2 distance $d_{L2}$ and an average L1 distance $d_{MAD}$. Each whole slide image was divided into smaller tiles, and the model's performance was examined with distances computed from either one tile or multiple tiles per slide; the best-performing metric was $d_{MAD}$ with the multi-tile calculation. The final model gave a Pearson correlation coefficient of $\rho = 0.631$ between predicted and ground-truth distances on the test data. The statistical significance of this correlation was evaluated with a Mantel test, which gave $p < 10^{-15}$. The results suggest that an image-based approach could serve as an alternative to gene expression profiling, although further research and evaluation are needed.
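The abstract names four distance metrics and evaluates the predicted distances with a Pearson correlation and a Mantel test. The Python sketch below illustrates these computations on plain lists of numbers, using only the standard library. It rests on assumptions the abstract does not state. It reads $d_{MAD}$ as the mean absolute difference per embedding dimension (the L1 distance divided by the dimension). It takes the multi-tile slide distance to be the mean distance over all tile pairs of two slides. It implements the Mantel test as a one-sided permutation test on the upper triangles of the two distance matrices. The names `d_mad`, `slide_distance`, `upper_triangle` and `mantel` are hypothetical and are not taken from the thesis.

```python
import math
import random
from itertools import combinations


def d_l1(u, v):
    """L1 (Manhattan) distance between two embeddings."""
    return sum(abs(a - b) for a, b in zip(u, v))


def d_l2(u, v):
    """L2 (Euclidean) distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def d_cos(u, v):
    """Cosine distance, 1 - cosine similarity; undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def d_mad(u, v):
    """ASSUMPTION: 'average L1 distance' read as mean absolute difference per dimension."""
    return d_l1(u, v) / len(u)


def slide_distance(tiles_a, tiles_b, metric=d_mad):
    """HYPOTHETICAL multi-tile rule: mean metric value over all tile pairs of two slides."""
    total = sum(metric(a, b) for a in tiles_a for b in tiles_b)
    return total / (len(tiles_a) * len(tiles_b))


def upper_triangle(dist):
    """Entries dist[i][j] with i < j, i.e. each unordered slide pair once."""
    n = len(dist)
    return [dist[i][j] for i, j in combinations(range(n), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d_pred, d_true, permutations=999, seed=0):
    """One-sided permutation Mantel test; returns (observed r, p-value)."""
    n = len(d_pred)
    y = upper_triangle(d_true)
    r_obs = pearson(upper_triangle(d_pred), y)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of d_pred together so it stays a valid distance matrix.
        permuted = [[d_pred[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), y) >= r_obs:
            hits += 1
    # The smallest attainable p-value is 1 / (permutations + 1).
    return r_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    print(d_l1([0, 0], [3, 4]))   # 7
    print(d_l2([0, 0], [3, 4]))   # 5.0
    print(d_mad([0, 0], [3, 4]))  # 3.5
    # Toy symmetric distance matrices for 8 hypothetical slides.
    true = [[abs(i - j) for j in range(8)] for i in range(8)]
    pred = [[abs(i - j) + 0.1 * ((i * j) % 3) * (i != j) for j in range(8)] for i in range(8)]
    print(mantel(pred, true))
```

Passing `d_l1`, `d_cos` or `d_l2` as `metric` to `slide_distance` lets the four metrics be compared under the same multi-tile rule. Because a permutation p-value cannot fall below 1/(permutations + 1), this sketch with 999 permutations cannot by itself produce a value as small as the $p < 10^{-15}$ reported in the abstract; the abstract does not state how that value was obtained.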
