Fine-Grained Motion Representation For Template-Free Visual Tracking

Kai Shuang, Yuheng Huang, Yue Sun, Zhun Cai, Hao Guo; The IEEE Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 671-680

Abstract


The object tracking task requires tracking an arbitrary target across consecutive video frames. Recently, several attempts have been made to develop template-free models to attain generality. However, the current template-free paradigm estimates only the displacement to approximate the motion of the object. Displacement alone is insufficient to represent complex bounding box transformations, including scaling and rotation. We argue that this coarse-grained representation of object motion limits the performance of current template-free approaches. In this paper, we explore finer-grained motion estimation to improve the accuracy of the template-free model. With respect to the image space, our method estimates the transformation for each pixel in the image. With respect to the motion representation, we represent the motion as a transformation parameterized by displacement, scaling, and rotation. By applying differential vector operators to the optical flow, our approach estimates the displacement, scaling, and rotation of each pixel within a unified theory. To the best of our knowledge, ours is the first work to model displacement, scaling, and rotation in a unified theory with optical flow. To further improve localization accuracy, we develop an appearance branch that introduces appearance information into our model. Furthermore, to suppress training samples on which optical flow estimation fails, we propose a novel loss function, Limited L1. Experiments show that our model, FGTrack, achieves state-of-the-art performance on both the NFS and VOT2017 datasets.
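The abstract states that differential vector operators applied to the optical flow yield per-pixel displacement, scaling, and rotation, but it does not give the formulas. The sketch below shows one standard way to do this, under the assumption that the flow is locally a similarity transform: if u = t_x + a·x − b·y and v = t_y + b·x + a·y with a = s·cos θ − 1 and b = s·sin θ, then the divergence of the flow is 2a and its curl is 2b, so the scale s and rotation θ follow directly. The function name `flow_to_similarity` and this exact decomposition are illustrative assumptions, not FGTrack's published formulation.

```python
import numpy as np


def flow_to_similarity(flow):
    """Hypothetical helper: per-pixel displacement, scale, and rotation.

    flow: array of shape (H, W, 2) holding (u, v) = (dx, dy) per pixel,
    with x along columns and y along rows (image coordinates).
    Assumes the flow is locally a similarity transform; this is an
    illustrative decomposition, not necessarily the one used by FGTrack.
    """
    u = flow[..., 0]
    v = flow[..., 1]
    # np.gradient returns d/d(axis 0 = y) first, then d/d(axis 1 = x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    div = du_dx + dv_dy    # divergence = 2 * (s*cos(theta) - 1)
    curl = dv_dx - du_dy   # z-component of curl = 2 * s*sin(theta)
    a = 1.0 + div / 2.0    # s*cos(theta)
    b = curl / 2.0         # s*sin(theta)
    scale = np.hypot(a, b)
    # Positive theta rotates +x toward +y (clockwise on screen, y down).
    rotation = np.arctan2(b, a)
    displacement = flow    # the flow itself is the per-pixel displacement
    return displacement, scale, rotation


# Usage: synthesize the flow of a known similarity transform and recover it.
H, W = 64, 64
ys, xs = np.mgrid[0:H, 0:W].astype(float)
s, th = 1.2, 0.1                  # ground-truth scale and rotation (radians)
cx, cy = W / 2, H / 2             # transform centre
tx, ty = 3.0, -2.0                # translation of the centre
xn = s * np.cos(th) * (xs - cx) - s * np.sin(th) * (ys - cy) + cx + tx
yn = s * np.sin(th) * (xs - cx) + s * np.cos(th) * (ys - cy) + cy + ty
flow = np.stack([xn - xs, yn - ys], axis=-1)

disp, scale, rot = flow_to_similarity(flow)
print(scale.mean(), rot.mean())
```

Because the synthetic flow is linear in x and y, finite differences recover s = 1.2 and θ = 0.1 at every pixel. On real, noisy flow the operators would normally be applied after smoothing or inside a learned network.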

Related Material


@InProceedings{Shuang_2020_WACV,
author = {Shuang, Kai and Huang, Yuheng and Sun, Yue and Cai, Zhun and Guo, Hao},
title = {Fine-Grained Motion Representation For Template-Free Visual Tracking},
booktitle = {The IEEE Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}
}