Generation of Synthetic White Blood Cell Images Using Denoising Diffusion

University essay from Lunds universitet/Matematik LTH

Abstract: CellaVision’s digital hematology systems are designed to analyze blood and pre-classify different types of blood cells. Some abnormal white blood cells are rare, which can lead to imbalanced datasets. This in turn can reduce pre-classification performance and create a need for more time-consuming data gathering. The aim of this thesis is to investigate the possibility of using deep learning to generate synthetic images of white blood cells with abnormalities, in order to augment the training dataset of the pre-classifier. Denoising diffusion is a recent, cutting-edge method for generating synthetic data and has been shown to produce state-of-the-art images. A diffusion model works by adding noise to training images and learning to remove that noise. The diffusion model of this thesis was created by first training a base model on images with and without abnormalities and then fine-tuning it for three different types of abnormalities: hypersegmentation, Döhle bodies and hypergranulation. A Generative Adversarial Network (GAN) was also trained, and its performance was compared to that of the diffusion model. To evaluate the generated images, a classifier trained on a dataset augmented with generated images was compared to a classifier trained only on real cell images. It is uncertain whether adding generated images to the training dataset improved classifier performance. For two of the abnormalities, an increase in accuracy was seen for the abnormal class, but in the other cases accuracy decreased. Moreover, a medical expert and an experienced CellaVision employee were each given a set of 100 cell images, of which 50 were synthetic, and asked to assess which images were synthetic. The medical expert correctly identified 96% of the real images as real, but only 32% of the synthetic images as synthetic. The experienced CellaVision employee correctly classified 44% of the real images and 24% of the synthetic images.
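The noising half of the idea described in the abstract (adding noise to training images so that a model can learn to remove it) can be illustrated with a minimal sketch. It assumes the standard DDPM-style closed-form forward process with a linear noise schedule; the schedule values, the number of steps, the image size and all function names below are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (assumed linear schedule, not from the thesis)."""
    return np.linspace(beta_start, beta_end, num_steps)


def add_noise(x0, t, alpha_bar, rng):
    """Sample a noised image x_t from a clean image x0 in one step.

    Uses x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * noise,
    where noise is standard Gaussian. A denoising network would be trained
    to predict `noise` given (x_t, t); that network is omitted here.
    """
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
# Cumulative product of (1 - beta): the fraction of signal kept up to step t.
alpha_bar = np.cumprod(1.0 - betas)

# Hypothetical stand-in for a cell image: 3 channels, 64x64, scaled to [-1, 1].
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))
x_t, noise = add_noise(x0, 500, alpha_bar, rng)
```

At small `t` the result stays close to the original image, while at the last step almost no signal remains, which is why sampling can start from pure noise and denoise step by step.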
