3D Shape Temporal Aggregation for Video-Based Clothing-Change Person Re-identification

Ke Han, Yan Huang, Shaogang Gong, Yan Huang, Liang Wang, Tieniu Tan; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 2371-2387

Abstract


The 3D shape of the human body can provide both discriminative and clothing-independent information in video-based clothing-change person re-identification (Re-ID). However, existing Re-ID methods usually generate 3D body shapes without considering identity modelling, which severely weakens the discriminability of 3D human shapes. In addition, different video frames provide highly similar 3D shapes, but existing methods cannot capture the differences among 3D shapes over time. They are thus insensitive to the unique and discriminative 3D shape information of each frame, and they aggregate many redundant framewise shapes ineffectively into a videowise representation for Re-ID. To address these problems, we propose a 3D Shape Temporal Aggregation (3STA) model for video-based clothing-change Re-ID. To generate a discriminative 3D shape for each frame, we first introduce an identity-aware 3D shape generation module. It embeds identity information into the generation of 3D shapes through joint learning of shape estimation and identity recognition. Second, a difference-aware shape aggregation module is designed to measure inter-frame 3D human shape differences and automatically select the unique 3D shape information of each frame. This helps minimise redundancy and maximise complementarity in temporal shape aggregation. We further construct a Video-based Clothing-Change Re-ID (VCCR) dataset to address the lack of publicly available datasets for video-based clothing-change Re-ID. Extensive experiments on the VCCR dataset demonstrate the effectiveness of the proposed 3STA model. The dataset is available at https://vhank.github.io/vccr.github.io.

Related Material


[pdf] [code]
[bibtex]
@InProceedings{Han_2022_ACCV,
    author    = {Han, Ke and Huang, Yan and Gong, Shaogang and Huang, Yan and Wang, Liang and Tan, Tieniu},
    title     = {3D Shape Temporal Aggregation for Video-Based Clothing-Change Person Re-identification},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {2371-2387}
}