Instance Segmentation on depth images using Swin Transformer for improved accuracy on indoor images

University essay from Linköpings universitet/Artificiell intelligens och integrerade datorsystem

Abstract: The Simultaneous Localisation And Mapping (SLAM) problem is an open fundamental problem in autonomous mobile robotics. One of the latest most researched techniques used to enhance the SLAM methods is instance segmentation. In this thesis, we implement an instance segmentation system using Swin Transformer combined with two of the state of the art methods of instance segmentation namely Cascade Mask RCNN and Mask RCNN. Instance segmentation is a technique that simultaneously solves the problem of object detection and semantic segmentation. We show that depth information enhances the average precision (AP) by approximately 7%. We also show that the Swin Transformer backbone model can work well with depth images. Our results also show that Cascade Mask RCNN outperforms Mask RCNN. However, the results are to be considered due to the small size of the NYU-depth v2 dataset. Most of the instance segmentation researches use the COCO dataset which has a hundred times more images than the NYU-depth v2 dataset but it does not have the depth information of the image.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)