Enhancing Object Detection in Infrared Videos through Temporal and Spatial Information

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Object detection is a prominent area of research within computer vision. While object detection based on infrared videos holds great practical significance, the majority of mainstream methods are primarily designed for visible datasets. This thesis investigates the enhancement of object detection accuracy on infrared datasets by leveraging temporal and spatial information. The Memory Enhanced Global-Local Aggregation (MEGA) framework is chosen as a baseline due to its capability to incorporate both forms of information. Based on the initial visualization result from the infrared dataset, CAMEL, the noisy characteristic of the infrared dataset is further explored. Through comprehensive experiments, the impact of temporal and spatial information is examined, revealing that spatial information holds a detrimental effect, while temporal information could be used to improve model performance. Moreover, an innovative Dual Frame Average Aggregation (DFAA) framework is introduced to address challenges related to object overlapping and appearance changes. This framework processes two global frames in parallel and in an organized manner, showing an improvement from the original configuration.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)