Semantic segmentation of off-road scenery on embedded hardware using transfer learning

University essay from KTH/Mekatronik

Abstract: Real-time semantic scene understanding is a challenging computer vision task for autonomous vehicles. Relatively little research has addressed forestry and off-road scene understanding, as the industry focuses on urban and on-road applications. Studies have shown that deep convolutional neural network architectures, with parameters pre-trained on large datasets, can be re-trained and customized on smaller off-road datasets through transfer learning and still yield state-of-the-art classification performance. This master's thesis extends such existing off-road semantic segmentation studies. It focuses on identifying and visualizing the general trade-offs between classification performance, classification time, and the number of classes available to the network. The results showed that classification performance declined with every class added to the network. Misclassification occurred mainly in class boundary areas, which grew as more classes were added. The number of classes, however, did not affect the network's classification time. Furthermore, the trade-off between classification time and classification performance was nonlinear. Classification performance improved with more network layers and a higher data type resolution, but the additional layers increased the number of calculations and the higher data type resolution required longer calculation times. Using a 16-bit data type instead of an 8-bit one raised the network's classification performance by 0.5%, yet its classification time worsened considerably, with roughly 20 fewer camera frames segmented per second. Tests also showed that a 101-layer network performed slightly worse than a 50-layer network, underlining the nonlinearity of the trade-off between classification time and classification performance. Moreover, the class constellations strongly affected the network's classification performance and continuity: it was essential that a class's contents and objects were visually similar and shared the same features. Mixing visually ambiguous objects into the same class could drop inference performance by almost 30%. Several directions for future work remain, including writing new, customized source code for the ResNet50 network; a customized and pruned network could improve both the application's classification performance and classification speed. Further, procuring a task-specific forestry dataset and transferring weights pre-trained for autonomous navigation, rather than for generic object segmentation, could lead to even better classification performance.
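
The transfer-learning setup the abstract describes, re-using a backbone pre-trained on a large generic dataset and re-training it on a small off-road dataset with its own class set, can be sketched as follows. This is a minimal illustration in PyTorch/torchvision, assuming a DeepLabV3 head on a ResNet-50 backbone and a hypothetical four-class off-road label set; it is not the thesis's actual implementation.

    import torch
    import torchvision

    # Hypothetical off-road class set (e.g. trail, vegetation, sky, obstacle).
    NUM_CLASSES = 4

    # Load a segmentation network whose ResNet-50 backbone is pre-trained on a
    # large generic dataset, then swap the final classifier layer so it predicts
    # the smaller off-road class set instead.
    model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
    model.classifier[4] = torch.nn.Conv2d(256, NUM_CLASSES, kernel_size=1)

    # Freeze the pre-trained backbone so only the new head is fitted to the
    # small off-road dataset (the core idea of transfer learning).
    for param in model.backbone.parameters():
        param.requires_grad = False

    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )
    criterion = torch.nn.CrossEntropyLoss()

    # One hypothetical training step: `images` is a batch of RGB frames
    # (N, 3, H, W); `masks` holds per-pixel class indices (N, H, W).
    def train_step(images, masks):
        model.train()
        optimizer.zero_grad()
        logits = model(images)["out"]   # (N, NUM_CLASSES, H, W)
        loss = criterion(logits, masks)
        loss.backward()
        optimizer.step()
        return loss.item()

For the data-type comparison discussed above, the same model could be cast to 16-bit floats with model.half() before deployment; an 8-bit path would typically go through a dedicated quantization toolchain on the embedded target.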
