A Bidirectional Approach Applied on Deeper and Wider Siamese Network

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Tobias Arnehall Johansson; [2023]

Abstract: Object tracking and object detection are two components of computer vision whose precision and speed have improved substantially over the last decade. This is mainly because deep learning has been incorporated into the algorithms, but also because new techniques and insights in the area are released frequently. Among the popular models for object tracking is the Siamese Region Proposal Network (RPN), which can track any single object in a video sequence. Over the years, several modified versions of the Siamese RPN have been developed, and for this thesis one of these variations was chosen as the base model. To enhance the performance of the base model, a bidirectional extension was implemented. The extension tracks the object in both the forward and backward directions while periodically updating the template frame. The intuition was that this would reduce the impact on tracking performance of frames in which the object is occluded, blurred, or has changed its appearance. Additionally, Tobii AB, the company involved in this thesis, wanted the base model converted from a single-object tracker to a multi-object tracker. The impact of the bidirectional extension was evaluated using datasets from the VOT-16 and VOT-17 challenges. Although the original VOT metrics were not used, the results indicate a notable improvement in accuracy and robustness due to the bidirectional extension. However, inference time was negatively affected, because the extension requires an additional model to function. For the multi-object tracking conversion, the results demonstrated successful functionality: each object reached the same tracking score as when it was tracked alone. As with the bidirectional extension, each additional object required an extra model, which increased the overall inference time. Therefore, future research will need to investigate how to improve the model’s efficiency and minimize the trade-off between precision and speed, particularly when several objects are tracked in the video sequence.
