Weakly Supervised Text-Based Person Re-Identification

Shizhen Zhao, Changxin Gao, Yuanjie Shao, Wei-Shi Zheng, Nong Sang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 11395-11404

Abstract


Conventional text-based person re-identification methods rely heavily on identity annotations. However, this labeling process is costly and time-consuming. In this paper, we consider a more practical setting called weakly supervised text-based person re-identification, where only text-image pairs are available and no identity annotations are required during the training phase. To this end, we propose a Cross-Modal Mutual Training (CMMT) framework. Specifically, to alleviate intra-class variations, a clustering method is utilized to generate pseudo labels for both visual and textual instances. To further refine the clustering results, CMMT provides a Mutual Pseudo Label Refinement module, which leverages the clustering results in one modality to refine those in the other modality, constrained by the text-image pairwise relationship. Meanwhile, CMMT introduces a Text-IoU Guided Cross-Modal Projection Matching loss to resolve the cross-modal matching ambiguity problem. A Text-IoU Guided Hard Sample Mining method is also proposed for learning discriminative textual-visual joint embeddings. We conduct extensive experiments to demonstrate the effectiveness of the proposed CMMT, and the results show that CMMT performs favorably against existing text-based person re-identification methods.
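As a rough illustration only (the abstract does not specify how Text-IoU is computed, and this is not the authors' implementation), one plausible reading is a token-set overlap between two person descriptions, which can then soften the one-hot targets of a cross-modal matching loss or flag likely same-identity pairs for hard sample mining. The whitespace tokenization and all function names below are assumptions made for the sketch.

```python
# Minimal sketch (assumed, not the authors' code): token-set Text-IoU between
# captions, used to build a soft target matrix for cross-modal matching in a batch.

def text_iou(caption_a: str, caption_b: str) -> float:
    """IoU of the token sets of two person descriptions (assumed definition)."""
    a, b = set(caption_a.lower().split()), set(caption_b.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def soft_target_matrix(captions: list[str]) -> list[list[float]]:
    """Pairwise Text-IoU for a batch of captions; rows could be normalized and
    used to soften the one-hot targets of a projection matching loss, or
    thresholded to pick hard samples that describe visually similar people."""
    n = len(captions)
    return [[text_iou(captions[i], captions[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    batch = [
        "a woman in a red coat carrying a black backpack",
        "the woman wears a red coat and a black backpack",
        "a man in a blue shirt riding a bicycle",
    ]
    for row in soft_target_matrix(batch):
        print(["%.2f" % v for v in row])
```

In this reading, caption pairs with high Text-IoU likely describe the same or very similar identities, so treating them as strict negatives would be ambiguous; weighting the matching loss by the IoU matrix is one way such ambiguity could be handled.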

Related Material


[pdf]
[bibtex]
@InProceedings{Zhao_2021_ICCV,
    author    = {Zhao, Shizhen and Gao, Changxin and Shao, Yuanjie and Zheng, Wei-Shi and Sang, Nong},
    title     = {Weakly Supervised Text-Based Person Re-Identification},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11395-11404}
}