Improving embedded deep learning object detection by integrating infrared camera

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: George Punter; [2019]


Abstract: Deep learning is the current state of the art for computer vision applications. FPGAs have the potential to fill this niche: they offer lower development costs and faster development cycles than ASICs, with a smaller size and power footprint than GPUs. Recent developments, including HLS and related frameworks, have made deep learning on FPGAs increasingly accessible. However, neural networks deployed onto FPGAs suffer reduced accuracy compared with their software counterparts. This thesis explores whether integrating an additional camera, namely longwave infrared (IR), into an embedded computer vision system is a viable option to improve inference accuracy in critical vision tasks. The work is split into three stages.

First, we explore image registration methods between RGB and infrared images to find one suitable for embedded implementation, and conclude that for a static camera setup, manually assigning point matches to obtain a warping homography is the best route. Incrementally optimising this estimate, or using phase congruency features combined with a feature matching algorithm, are both promising avenues to pursue further.

Second, we implement this perspective warping function on an FPGA using the Vivado HLS workflow, concluding that, whilst not without limitations, developing computer vision functions in HLS is considerably faster than implementing them in HDL. We note that the open-source PYNQ framework by Xilinx is convenient for edge data processing, allowing drop-in access to hardware-accelerated functions from Python, which opens up FPGA-accelerated data processing to less hardware-centric developers and data scientists.

Finally, we analyse whether the additional IR data can improve the object detection accuracy of a pre-trained RGB network by calculating accuracy metrics, with and without image augmentation, across a dataset of 7,777 annotated image pairs. We conclude that detection accuracy, especially for pedestrians and at night, can be significantly improved without requiring any network retraining. We demonstrate that, in terms of implementation overhead, integrating an IR camera is a viable approach to improving the accuracy of deep learning vision systems. Future work should explore other methods of integrating the IR data, such as enhancing predictions by utilising hot-point information within bounding boxes, applying transfer learning principles with a dataset of augmented images, or improving the image registration and fusion stages.
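As an illustration of the registration approach described in the first stage, the sketch below shows how a warping homography could be estimated from manually assigned point matches using OpenCV. The point coordinates, file names, and the choice of cv2.findHomography are assumptions for illustration only, not the thesis's exact implementation.

import cv2
import numpy as np

# Manually assigned correspondences between the IR and RGB views.
# The coordinates below are placeholders; for a static camera setup
# they would be picked once from a calibration image pair.
ir_pts = np.array([[60, 40], [260, 48], [250, 200], [64, 195]],
                  dtype=np.float32)
rgb_pts = np.array([[120, 80], [510, 95], [495, 400], [130, 385]],
                   dtype=np.float32)

# Estimate the 3x3 homography mapping IR coordinates onto the RGB frame.
# method=0 performs a plain least-squares fit over all point pairs.
H, _ = cv2.findHomography(ir_pts, rgb_pts, method=0)

ir = cv2.imread("ir_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
rgb = cv2.imread("rgb_frame.png")                       # hypothetical file

# Warp the IR image into the RGB camera's perspective for registration.
registered = cv2.warpPerspective(ir, H, (rgb.shape[1], rgb.shape[0]))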
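To make the PYNQ claim concrete, here is a minimal sketch of calling a hardware-accelerated function from Python, assuming the HLS warp core is exported as a bitstream and fed by an AXI DMA. The overlay name "warp.bit", the IP name "axi_dma_0", and the frame size are hypothetical and depend on the actual block design.

import numpy as np
from pynq import Overlay, allocate

# Load the bitstream containing the HLS perspective-warp core.
# "warp.bit" and "axi_dma_0" are hypothetical names from a block design.
overlay = Overlay("warp.bit")
dma = overlay.axi_dma_0

# Physically contiguous buffers that the DMA engine can address.
in_buf = allocate(shape=(480, 640), dtype=np.uint8)
out_buf = allocate(shape=(480, 640), dtype=np.uint8)

in_buf[:] = 0  # placeholder: copy a real IR frame here

# Stream the frame through the accelerator and block until done.
dma.sendchannel.transfer(in_buf)
dma.recvchannel.transfer(out_buf)
dma.sendchannel.wait()
dma.recvchannel.wait()

warped = np.array(out_buf)  # result back as an ordinary NumPy array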
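The abstract does not spell out which accuracy metrics were calculated. A common choice for object detection, sketched below under that assumption, is to count a detection as a true positive when its intersection-over-union (IoU) with an unmatched ground-truth box reaches 0.5, computed identically for the plain and IR-augmented images.

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def true_positives(detections, ground_truth, threshold=0.5):
    """Count detections matching an unused ground-truth box at IoU >= threshold."""
    matched = set()
    tp = 0
    for det in detections:
        for i, gt in enumerate(ground_truth):
            if i not in matched and iou(det, gt) >= threshold:
                matched.add(i)
                tp += 1
                break
    return tp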
