Transformer Based Object Detection and Semantic Segmentation for Autonomous Driving

University essay from Linköpings universitet/Datorseende (Computer Vision)

Abstract: The development of autonomous driving systems has been one of the most popular research areas of the 21st century. A key component of such systems is the ability to perceive and comprehend the physical world, and two techniques that address this are object detection and semantic segmentation. During the last decade, CNN-based models have dominated these tasks. However, in 2021, transformer-based networks outperformed the existing CNN approaches, indicating a paradigm shift in the domain. This thesis explores the use of a vision transformer, specifically a Swin Transformer, in an object detection and semantic segmentation framework, and compares it to a classical CNN on road scenes. In addition, since real-time execution is crucial for autonomous driving systems, the possibility of reducing the number of parameters of the transformer-based network is investigated. The results appear to favor the Swin Transformer over the convolutional network for both object detection and semantic segmentation. Furthermore, the analysis indicates that the computational complexity can be reduced while retaining performance.
