Attribute-Driven Feature Disentangling and Temporal Aggregation for Video Person Re-Identification

Yiru Zhao, Xu Shen, Zhongming Jin, Hongtao Lu, Xian-sheng Hua; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4913-4922

Abstract


Video-based person re-identification plays an important role in surveillance video analysis, extending image-based methods by learning features from multiple frames. Most existing methods fuse features by temporal average-pooling, without exploring the different frame weights caused by various viewpoints, poses, and occlusions. In this paper, we propose an attribute-driven method for feature disentangling and frame re-weighting. The features of single frames are disentangled into groups of sub-features, each corresponding to specific semantic attributes. The sub-features are re-weighted by the confidence of attribute recognition and then aggregated along the temporal dimension to form the final representation. With this strategy, the most informative regions of each frame are enhanced and contribute to a more discriminative sequence representation. Extensive ablation studies demonstrate the effectiveness of feature disentangling as well as temporal re-weighting. The experimental results on the iLIDS-VID, PRID-2011 and MARS datasets demonstrate that our proposed method outperforms existing state-of-the-art approaches.
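
The re-weighting and temporal aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function name `aggregate_sequence`, the nested-list layout of the inputs, and the choice to normalize each attribute group's confidences over time so that they sum to one. It is not the authors' implementation.

```python
def aggregate_sequence(sub_features, confidences, eps=1e-8):
    """Confidence-weighted temporal pooling (illustrative assumption).

    sub_features[t][g] is the sub-feature vector (list of floats) of
    attribute group g in frame t; confidences[t][g] is a non-negative
    attribute-recognition confidence for that group in that frame.
    Returns the concatenation of the pooled sub-features of all groups.
    """
    num_frames = len(sub_features)
    num_groups = len(sub_features[0])
    representation = []
    for g in range(num_groups):
        # Normalize this group's confidences over the temporal dimension.
        total = sum(confidences[t][g] for t in range(num_frames)) + eps
        dim = len(sub_features[0][g])
        pooled = [0.0] * dim
        for t in range(num_frames):
            weight = confidences[t][g] / total
            for d in range(dim):
                pooled[d] += weight * sub_features[t][g][d]
        representation.extend(pooled)
    return representation


# Two frames, two attribute groups, two dimensions per group (toy values).
features = [
    [[1.0, 0.0], [2.0, 2.0]],  # frame 0
    [[0.0, 1.0], [4.0, 0.0]],  # frame 1
]
# Group 0 is recognized confidently in frame 0, group 1 in frame 1.
scores = [
    [0.9, 0.2],
    [0.1, 0.8],
]
sequence_representation = aggregate_sequence(features, scores)
```

With equal confidences for every frame, this sketch reduces to the temporal average-pooling baseline mentioned in the abstract; unequal confidences shift each group's pooled sub-feature toward the frames where that attribute is recognized most reliably.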

Related Material


[bibtex]
@InProceedings{Zhao_2019_CVPR,
  author    = {Zhao, Yiru and Shen, Xu and Jin, Zhongming and Lu, Hongtao and Hua, Xian-sheng},
  title     = {Attribute-Driven Feature Disentangling and Temporal Aggregation for Video Person Re-Identification},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2019},
  pages     = {4913--4922}
}