Visual Tracking With Fully Convolutional Networks

Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 3119-3127


We propose a new approach for general object tracking with fully convolutional neural networks. Instead of treating a convolutional neural network (CNN) as a black-box feature extractor, we conduct an in-depth study of the properties of CNN features pre-trained offline on the large-scale ImageNet classification task. These findings motivate the design of our tracking system. We find that convolutional layers at different levels characterize the target from different perspectives: a top layer encodes more semantic features and serves as a category detector, while a lower layer carries more discriminative information and can better separate the target from distractors with similar appearance. The two layers are used jointly during tracking via a switch mechanism. We also find that, for a given tracking target, only a subset of neurons is relevant. A feature map selection method is developed to remove noisy and irrelevant feature maps, which reduces computational redundancy and improves tracking accuracy. Extensive evaluation on the widely used tracking benchmark shows that the proposed tracker significantly outperforms the state of the art.
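To make the feature map selection idea concrete, here is a minimal sketch of one possible relevance criterion: score each convolutional feature map by how much of its activation energy falls inside the target region, then keep only the top-k maps. The function name, the energy-concentration score, and all parameters are illustrative assumptions for this sketch, not the paper's exact selection method.

```python
import numpy as np

def select_feature_maps(feature_maps, target_mask, k):
    """Keep the k feature maps whose activation energy is most
    concentrated on the target region (an illustrative criterion).

    feature_maps: (C, H, W) array of convolutional activations
    target_mask:  (H, W) binary mask marking the target location
    k:            number of maps to keep
    """
    energy = np.abs(feature_maps)                       # (C, H, W)
    inside = (energy * target_mask).sum(axis=(1, 2))    # energy on target
    total = energy.sum(axis=(1, 2)) + 1e-8              # total energy per map
    scores = inside / total                             # concentration in [0, 1]
    keep = np.argsort(scores)[::-1][:k]                 # indices of top-k maps
    return keep, feature_maps[keep]

# Toy example: map 0 fires only on the target, map 1 fires everywhere,
# map 2 fires only off-target, so map 0 should rank first.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
maps = np.zeros((3, 4, 4))
maps[0, 1:3, 1:3] = 1.0   # all energy inside the target
maps[1, :, :] = 1.0       # energy spread over the whole map
maps[2, 0, 0] = 1.0       # all energy outside the target
idx, selected = select_feature_maps(maps, mask, k=2)
```

Pruning maps this way before tracking removes channels that respond to background clutter, which is the computational and accuracy benefit the abstract describes.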

Related Material

@InProceedings{Wang_2015_ICCV,
    author = {Wang, Lijun and Ouyang, Wanli and Wang, Xiaogang and Lu, Huchuan},
    title = {Visual Tracking With Fully Convolutional Networks},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
    month = {December},
    year = {2015}
}