Detection of Humans in Video Streams Using Convolutional Neural Networks

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Abstract: This thesis focuses on human detection in video streams using Convolutional Neural Networks (CNNs). In recent years, CNNs have become a common method for a variety of computer vision problems, and detection in images is one popular application. The performance of CNNs on the detection problem has improved rapidly in both accuracy and speed. In this thesis, we focus on a specific sub-domain of detection: human detection. The problem is made more challenging because the data are extracted from video streams captured by a head-mounted camera and therefore include difficult viewpoints and strong motion blur. Considering both accuracy and speed, we choose two models with typical structures, You Only Look Once (YOLO) and Single Shot MultiBox Detector (SSD), to examine how robustly the models perform on the human domain under motion blur, and how the differences between their structures may influence the results. Several experiments are carried out. With a better-designed structure, SSD outperforms YOLO in various aspects. This is further confirmed when we fine-tune YOLO and SSD300 on the human data in the Pascal VOC 2012 trainval dataset, which shows the efficiency of SSD when trained on fewer classes. As for the motion blur problem, the experiments show that SSD300 has a good ability to learn blurred patterns. The structure of SSD300 is further tested with regard to the design of its default boxes and its performance at different scales and locations. The results show that the SSD model has superior performance for online detection in video streams, but that a more customized structure has the potential to achieve even better results.
