Real-time Small Object Detection using Deep Neural Networks

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Malcolm Tivelius; [2021]

Abstract: Object detection is a research area within computer vision that consists of both localising and classifying objects in images. Its applications in society are many, ranging from facial recognition to self-driving cars. Some of these use cases require the detection of objects in motion and are therefore considered a separate category of object detection, commonly referred to as real-time object detection. The goal of this thesis is to shed further light on real-time object detection by investigating how effective successful object detection techniques are when applied to small objects. The community describes the detection of small objects as a difficult problem. It is also an area that has not been extensively researched, so the results could be used by the research community at large and/or in real-life applications. This thesis is a comparative study of two deep learning techniques within real-time object detection, namely RetinaNet and YOLOv3. The objects used are small characters and digits engraved onto ball bearings. The ball bearings were photographed while traveling on a production line, and a collection of such images constitutes the dataset used in this study. The goal is to classify as many characters and digits as possible on each bearing, with as low an inference time as possible. The two deep learning models were implemented and then evaluated on their performance, measured in terms of precision and average inference time. The evaluation was performed on labeled bearings not previously seen by the two models. The results show that RetinaNet vastly outperforms YOLOv3 in terms of mAP@50 when it comes to real-time detection of small objects. However, in terms of average inference time, YOLOv3 was twice as fast as RetinaNet.
In conclusion, YOLOv3 struggles with smaller objects, whereas RetinaNet excels in this area. It can also be concluded, from previous research, that further improvement in mAP and average inference time is most likely limited by the hardware used during training. Verifying this could be a potential further investigation building on this thesis.
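
The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's evaluation code: the box format (x1, y1, x2, y2), the greedy matching order, and the function names `iou`, `count_true_positives` and `average_inference_time` are all assumptions made for illustration. It shows only the matching rule that mAP@50 is commonly built on (a detection counts as correct when its class matches and its intersection over union with an unmatched ground-truth box is at least 0.5), plus a simple per-image timing average; a full mAP computation additionally averages precision over recall levels and classes.

```python
import time


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(predictions, ground_truth, threshold=0.5):
    """Count detections that match a ground-truth box at the IoU threshold.

    predictions: list of (label, score, box); ground_truth: list of (label, box).
    Detections are visited in descending score order, and each ground-truth
    box can be matched at most once (an assumed, commonly used convention).
    """
    matched = set()
    true_positives = 0
    for label, _score, box in sorted(predictions, key=lambda p: -p[1]):
        best_index, best_iou = None, threshold
        for j, (gt_label, gt_box) in enumerate(ground_truth):
            if j in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_index, best_iou = j, overlap
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    return true_positives


def average_inference_time(detect, images):
    """Mean wall-clock seconds per image for a detector callable `detect`."""
    start = time.perf_counter()
    for image in images:
        detect(image)
    return (time.perf_counter() - start) / len(images)
```

Here `detect` stands in for either model's inference call. Timing this way includes any pre- and post-processing performed inside `detect`, which may differ from how the thesis measured inference time.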
