LapsCore: Language-Guided Person Search via Color Reasoning

Yushuang Wu, Zizheng Yan, Xiaoguang Han, Guanbin Li, Changqing Zou, Shuguang Cui; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 1624-1633


The key point of language-guided person search is to construct the cross-modal association between visual and textual input. Existing methods focus on designing multimodal attention mechanisms and novel cross-modal loss functions to learn this association implicitly. We propose a representation learning method for language-guided person search based on color reasoning (LapsCore), which explicitly builds a fine-grained cross-modal association in both directions. Specifically, a pair of dual sub-tasks, image colorization and text completion, is designed. In the former task, rich textual information is used to colorize gray images, while the latter requires the model to understand the image and fill in the missing color words in the captions. Together, the two sub-tasks enable the model to learn correct alignments between text phrases and image regions, so that rich multimodal representations can be learned. Extensive experiments on multiple datasets demonstrate the effectiveness and superiority of the proposed method.
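To make the two dual sub-tasks concrete, the following is a minimal sketch of how their training inputs could be constructed: a grayscale image for the colorization task, and a caption with color words masked out for the text-completion task. The color-word list, mask token, and helper names are illustrative assumptions, not the paper's actual vocabulary or implementation.

```python
# Sketch of input construction for LapsCore's dual sub-tasks.
# COLOR_WORDS and MASK_TOKEN are hypothetical choices for illustration.

COLOR_WORDS = {"red", "blue", "green", "black", "white", "gray",
               "yellow", "pink", "purple", "brown", "orange"}
MASK_TOKEN = "[color]"

def to_gray(pixel_rgb):
    """Luminance conversion (ITU-R BT.601 weights): produces the gray
    pixel that the colorization sub-task must re-colorize, guided by
    the color information in the caption."""
    r, g, b = pixel_rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def mask_color_words(caption):
    """Replace color words with a placeholder token; the text-completion
    sub-task asks the model to restore them by looking at the image.
    Returns the masked caption and the list of masked-out color words."""
    tokens = caption.lower().split()
    masked = [MASK_TOKEN if t.strip(".,") in COLOR_WORDS else t
              for t in tokens]
    targets = [t.strip(".,") for t in tokens if t.strip(".,") in COLOR_WORDS]
    return " ".join(masked), targets
```

For example, `mask_color_words("A woman in a red coat and black shoes.")` yields the masked caption `"a woman in a [color] coat and [color] shoes."` together with the targets `["red", "black"]`; predicting those targets from the image is exactly the alignment signal the text-completion sub-task exploits.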

Related Material

[pdf] [supp]
@InProceedings{Wu_2021_ICCV,
    author    = {Wu, Yushuang and Yan, Zizheng and Han, Xiaoguang and Li, Guanbin and Zou, Changqing and Cui, Shuguang},
    title     = {LapsCore: Language-Guided Person Search via Color Reasoning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {1624-1633}
}