AATrackT: A deep learning network using attentions for tracking fast-moving and tiny objects : (A)ttention (A)ugmented - (Track)ing on (T)iny objects
Abstract: Recent advances in deep learning have made it possible to visually track objects through a video sequence. Moreover, the introduction of transformers to computer vision has led to new state-of-the-art performance in visual tracking. However, most of these studies use attention to correlate the distinguishing features of the target object and the candidate objects in order to localise the object throughout the video sequence. This approach is not adequate for tracking tiny objects. In addition, conventional trackers are often not applicable to extremely small objects or to objects that move fast. The purpose of this study is therefore to improve current methods for tracking tiny, fast-moving objects with the help of attention. A deep neural network, named AATrackT, is built to address this gap by treating tracking as a visual image segmentation problem. The proposed method uses data extracted from broadcast videos of tennis. To capture the global context of images, attention augmented convolutions are used as a substitute for the conventional convolution operation. Contrary to what the authors assumed, the experiment indicated that using attention augmented convolutions did not improve tracking performance. Our findings showed that the main reason is that the spatial resolution of the activation maps, 72x128, is too large for the attention weights to converge.
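To give a sense of the scale behind the 72x128 figure, the following minimal sketch counts the attention weights involved. It assumes that self-attention is applied over every spatial position of the activation map, so that each position attends to every other position. The helper name `attention_map_size` and the 9x16 comparison map are hypothetical and chosen only for illustration; this is not the authors' code or analysis.

```python
def attention_map_size(height, width):
    """Return (positions, weights_per_head) for self-attention over an
    H x W activation map, assuming every position attends to every other
    position (an assumption made for this illustration)."""
    positions = height * width
    weights_per_head = positions * positions
    return positions, weights_per_head


# Activation map size reported in the abstract.
positions, weights = attention_map_size(72, 128)
print(positions)  # 9216
print(weights)    # 84934656

# Hypothetical smaller map (72x128 downsampled by 8), for comparison only.
small_positions, small_weights = attention_map_size(9, 16)
print(small_positions)  # 144
print(small_weights)    # 20736
```

Under this assumption, a 72x128 map yields roughly 85 million attention weights per head, compared with about 21 thousand for the hypothetical 9x16 map. The count only shows how quickly the attention map grows with spatial resolution; it does not by itself explain the convergence behaviour reported in the study.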