CrossViT-ReID: Cross-Attention Vision Transformer for Occluded Cloth-Changing Person Re-Identification

Nguyen, Vuong D.; Mantini, Pranav; Shah, Shishir K.

Vuong D. Nguyen, Pranav Mantini, Shishir K. Shah; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 3982-3999

Abstract

Real-world Person Re-Identification (Re-ID) presents severe challenges such as occlusions and clothing changes, which cause traditional Re-ID methods to fail. Existing occluded Re-ID methods struggle with cloth-changing scenarios, while current cloth-changing Re-ID methods do not explicitly address occlusions. To this end, we propose CrossViT-ReID, the first framework for the challenging yet practical Occluded Cloth-Changing Person Re-ID task. We perform occlusion synthesis to expose the model to real-world occlusion variations, and capture the cloth-invariant body shape modality from silhouettes. The key to the success of CrossViT-ReID lies in our novel cross-modality collaborative training strategy, which adaptively mines the complementary relationship between appearance and shape under occlusions, clothing changes, or poor lighting conditions. Specifically, we devise two identical ViT-based branches. One branch takes holistic appearance and occluded shape as input, aiming to focus on appearance when shape is noisy. The other branch takes occluded appearance and holistic shape as input, aiming to attend to shape when appearance is partly unobservable. Cross-attention fusion then lets the two branches exchange beneficial information and complement each other. Once trained, our framework amplifies the most informative cues when facing ambiguity caused by in-the-wild Re-ID challenges, thus significantly enhancing Re-ID accuracy. Extensive experiments demonstrate the superiority of CrossViT-ReID on both cloth-changing Re-ID and occluded Re-ID datasets.
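
The two-branch input pairing and the cross-attention exchange described in the abstract can be illustrated with a small dependency-free sketch. It is not the authors' implementation: the ViT encoders are omitted, and the function names (`occlude`, `cross_attention`, `fuse`), the zero-masking occlusion, the single-head attention without learned projections, and the residual fusion are all assumptions made for illustration.

```python
import math
import random


def occlude(tokens, start, length):
    """Zero out a contiguous run of tokens (assumed stand-in for occlusion synthesis)."""
    return [[0.0] * len(t) if start <= i < start + length else list(t)
            for i, t in enumerate(tokens)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head scaled dot-product attention; queries attend to context tokens.

    Learned query/key/value projections are left out to keep the sketch short.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def fuse(branch_a, branch_b):
    """Each branch adds, as a residual, what it attends to in the other branch (assumed)."""
    a_from_b = cross_attention(branch_a, branch_b)
    b_from_a = cross_attention(branch_b, branch_a)
    new_a = [[x + y for x, y in zip(t, u)] for t, u in zip(branch_a, a_from_b)]
    new_b = [[x + y for x, y in zip(t, u)] for t, u in zip(branch_b, b_from_a)]
    return new_a, new_b


# Toy token sequences standing in for appearance (RGB) and shape (silhouette) patches.
rng = random.Random(0)
num_tokens, dim = 4, 3
appearance = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_tokens)]
shape = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_tokens)]

# Branch 1: holistic appearance + occluded shape.
branch_1 = appearance + occlude(shape, 1, 2)
# Branch 2: occluded appearance + holistic shape.
branch_2 = occlude(appearance, 1, 2) + shape

fused_1, fused_2 = fuse(branch_1, branch_2)
```

In this toy setup each branch sees one clean modality and one degraded modality, so the cross-attention step is the only path through which a branch can recover information about the tokens that were masked in its own input.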

Related Material

[bibtex]

@InProceedings{Nguyen_2024_ACCV,
    author    = {Nguyen, Vuong D. and Mantini, Pranav and Shah, Shishir K.},
    title     = {CrossViT-ReID: Cross-Attention Vision Transformer for Occluded Cloth-Changing Person Re-Identification},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {3982-3999}
}