Quantification of DNA Nanoballs Using Image Processing Techniques

University essay from Uppsala universitet/Avdelningen Vi3

Abstract: In gene editing, identifying the number of edited and unedited nucleic acids is important for the development of new therapies and drugs. Countagen is developing a technology for accelerating genomic research; its product, GeneAbacus, is a consumable reagent kit for the quantification of nucleic acids that can be used by CRISPR gene editing researchers. The DNA analyzed with the reagent kit is first extracted in an assay and then targeted with tailored padlock probes. The target region is amplified via rolling circle amplification (RCA), and the products collapse into fluorescent DNA nanoballs, which can be analyzed with a fluorescence microscope. Each fluorescent dot in the microscope image corresponds to a single recognition event, which makes it possible to quantify the edited and unedited nucleic acids. The purpose of this project was to count the number of DNA nanoballs in images from a fluorescence microscope, with a focus on deep learning. To do this, the images were first preprocessed to enhance the image quality and then cropped into small patches, which were manually annotated at the image level. The mean value from three annotators was used as the label, and the labelled images were used to train a ResNet with a regression-based approach. PyTorch and the fastai library were used for training, and the applied method was transfer learning. The network was trained in two stages: first, the newly added layers were trained for feature extraction, and then the pre-trained base model was unfrozen and trained for fine-tuning. To find the positions of the nanoballs in the images, Class Activation Maps (CAMs) and Gradient-weighted Class Activation Maps (Grad-CAMs) were created, and their local maxima were calculated to produce statistics. The best-performing model was a ResNet34 trained with a batch size of 32 and Huber loss as the loss function.
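To make the labelling and loss choice concrete, the following is a minimal sketch in plain Python rather than the PyTorch/fastai code used in the project. The annotator counts, the predictions and the threshold `delta=1.0` are illustrative assumptions; the abstract does not state which threshold was used.

```python
from statistics import mean


def patch_label(annotator_counts):
    """Regression label for one patch: the mean of the annotators' counts."""
    return mean(annotator_counts)


def huber_loss(pred, target, delta=1.0):
    """Huber loss for a single prediction.

    Quadratic for errors up to `delta`, linear beyond it, so a few badly
    labelled patches influence training less than with squared error.
    `delta=1.0` is an assumed value, not taken from the thesis.
    """
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


# Hypothetical counts from three annotators for one patch.
label = patch_label([4, 5, 6])

# Hypothetical model outputs: one close to the label, one far from it.
loss_small = huber_loss(5.5, label)  # quadratic regime
loss_large = huber_loss(9.0, label)  # linear regime
```

With these made-up numbers the small error falls in the quadratic regime and the large error in the linear regime, which is the property that makes Huber loss a plausible choice when manual labels are noisy.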
Model inference showed that the deep learning model counted the nanoballs within the same interval as the observers in 40 of 50 test images. The created CAMs and Grad-CAMs had too low a resolution to find the coordinates of the detected nanoballs. During this project, the nanoballs were only counted in small patches, although the goal was to find nanoballs in a large image. The project was limited by time, and the step in which the nanoball counts from the individual patches were to be summed was not performed. However, the study showed that it is possible to implement and train a deep learning model to count nanoballs in small patches. It also showed that the activation maps had too low a resolution for the positions of the nanoballs to be found by looking for local maxima. A comparison of 300 and 450 patches showed that the number of patches used as training samples did not greatly impact the model’s performance. Manual annotation of the nanoballs was difficult because the nanoballs move while the images are taken, which blurs them in some patches. The manual annotation should therefore probably be performed by experts to obtain correct labels for training. Further investigation is needed to improve the model and to find the positions of the nanoballs.
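The local-maxima step applied to the activation maps can be sketched as follows. This is an illustrative plain-Python version under stated assumptions, not the procedure used in the thesis: the function name `local_maxima`, the 8-neighbour rule, the threshold of 0.5 and the toy 4x4 map are all hypothetical.

```python
def local_maxima(grid, threshold=0.5):
    """Return (row, col) of cells that are at least `threshold` and strictly
    greater than all of their (up to 8) neighbours."""
    rows, cols = len(grid), len(grid[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = grid[r][c]
            if value < threshold:
                continue
            neighbours = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Toy activation map with two clear peaks (made-up values).
toy_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.7],
    [0.0, 0.1, 0.1, 0.2],
]
peaks = local_maxima(toy_map)
```

The sketch also hints at why coarse maps are a problem. The last convolutional stage of a standard ResNet is downsampled by a factor of 32 relative to the input, so each cell of the map covers a large block of pixels, and two nanoballs that fall within the same or adjacent cells cannot be separated as distinct peaks.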
