The V-SLAM Hurdler: A Faster V-SLAM System using Online Semantic Dynamic-and-Hardness-aware Approximation

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Visual Simultaneous Localization And Mapping (V-SLAM) and object detection algorithms are two critical prerequisites for modern XR applications. V-SLAM allows XR devices to geometrically map the environment and simultaneously localize themselves within it. Furthermore, object detectors based on Deep Neural Networks (DNNs) can be used to semantically understand what the features in the environment represent. However, both of these algorithms are computationally expensive, which makes it challenging for them to achieve good real-time performance on device. In this thesis, we first present TensorRT Quantized YOLOv4 (TRTQ-YOLOv4), a faster implementation of the YOLOv4 architecture [1] using FP16 reduced precision and INT8 quantization powered by the NVIDIA TensorRT [2] framework. Second, we propose the V-SLAM Hurdler: A Faster V-SLAM System using Online Dynamic-and-Hardness-aware Approximation. The proposed system integrates the base RGB-D V-SLAM system ORB-SLAM3 [3] with the INT8 TRTQ-YOLOv4 object detector, a novel Entropy-based Degree-of-Difficulty Estimator, an Online Hardness-aware Approximation Controller and a Dynamic Object Eraser. It applies online dynamic-and-hardness-aware approximation to the base V-SLAM system during runtime while increasing its robustness in dynamic scenes. We first evaluate the proposed object detector on a public object detection dataset. The proposed FP16 precision TRTQ-YOLOv4 is 2× faster than the full-precision model without loss of accuracy, while the INT8 quantized TRTQ-YOLOv4 is almost 3× faster than the full-precision one with a loss of only 0.024 in mAP@50:5:95. Second, we evaluate our proposed V-SLAM system on a public RGB-D SLAM dataset. In static scenes, the proposed system speeds up the base V-SLAM system by 21.2% on average with only a 0.7% loss of accuracy. In dynamic scenes, the proposed system not only accelerates the base system by 23.5% but also improves its accuracy by 89.3%, making it as robust as in static scenes.
Lastly, a comparison against state-of-the-art SLAM systems designed for dynamic environments shows that our system outperforms most of the compared methods in highly dynamic scenes.
