DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification
Abstract
Current multi-modal object re-identification approaches based on large-scale pre-trained backbones (i.e., ViT) have displayed remarkable progress and achieved excellent performance. However, these methods usually adopt the standard full fine-tuning paradigm, which requires optimizing a considerable number of backbone parameters and incurs extensive computational and storage costs. In this work, we propose an efficient prompt-tuning framework tailored for multi-modal object re-identification, dubbed DMPT, which freezes the main backbone and optimizes only a small number of newly added decoupled modality-aware parameters. Specifically, we explicitly decouple the visual prompts into modality-specific prompts, which leverage prior modality knowledge from a powerful text encoder, and modality-independent semantic prompts, which extract semantic information from multi-modal inputs such as visible, near-infrared, and thermal-infrared images. Built upon the extracted features, we further design a Prompt Inverse Bind (PromptIBind) strategy that employs bind prompts as a medium to connect the semantic prompt tokens of different modalities and facilitates the exchange of complementary multi-modal information, boosting the final re-identification results. Experimental results on multiple common benchmarks demonstrate that our DMPT achieves results competitive with existing state-of-the-art methods while fine-tuning only 6.5% of the backbone parameters.
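To make the paradigm concrete, below is a minimal PyTorch sketch of prompt tuning with a frozen backbone in the spirit of DMPT: per-modality prompts, shared modality-independent semantic prompts, and bind prompts through which semantic tokens exchange cross-modal information. The module names, prompt counts, embedding dimension, the toy two-layer backbone, and the attention-based binding step are all illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch of decoupled modality-aware prompt tuning, assuming a
# ViT-style token interface. Everything here (names, sizes, the tiny frozen
# backbone) is an illustrative stand-in, not the authors' implementation.
import torch
import torch.nn as nn

MODALITIES = ("visible", "near_infrared", "thermal_infrared")

class DecoupledPromptTuner(nn.Module):
    def __init__(self, embed_dim=256, n_modality_prompts=4,
                 n_semantic_prompts=4, n_bind_prompts=4):
        super().__init__()
        # Frozen backbone stand-in (a real system would load a pre-trained ViT).
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        for p in self.backbone.parameters():
            p.requires_grad = False  # only the prompt parameters are optimized

        # Modality-specific prompts: one learnable set per input modality.
        self.modality_prompts = nn.ParameterDict({
            m: nn.Parameter(torch.randn(n_modality_prompts, embed_dim) * 0.02)
            for m in MODALITIES
        })
        # Modality-independent semantic prompts, shared across modalities.
        self.semantic_prompts = nn.Parameter(
            torch.randn(n_semantic_prompts, embed_dim) * 0.02)
        # Bind prompts: a shared medium through which per-modality semantic
        # tokens exchange information (a rough analogue of PromptIBind).
        self.bind_prompts = nn.Parameter(
            torch.randn(n_bind_prompts, embed_dim) * 0.02)
        self.bind_attn = nn.MultiheadAttention(embed_dim, num_heads=8,
                                               batch_first=True)

    def forward(self, patch_tokens: dict) -> torch.Tensor:
        """patch_tokens maps modality name -> (B, num_patches, embed_dim)."""
        semantic_out = []
        for m, tokens in patch_tokens.items():
            B = tokens.size(0)
            mod = self.modality_prompts[m].expand(B, -1, -1)
            sem = self.semantic_prompts.expand(B, -1, -1)
            x = torch.cat([mod, sem, tokens], dim=1)
            x = self.backbone(x)  # frozen; gradients reach only the prompts
            n_mod = mod.size(1)
            semantic_out.append(x[:, n_mod:n_mod + sem.size(1)])

        # Cross-modal exchange: each modality's semantic tokens attend to the
        # shared bind prompts, which act as a common interchange space.
        bind = self.bind_prompts.expand(semantic_out[0].size(0), -1, -1)
        fused = [self.bind_attn(s, bind, bind)[0] + s for s in semantic_out]
        # Pool the fused semantic tokens into one re-identification embedding.
        return torch.stack(fused, dim=0).mean(dim=(0, 2))

if __name__ == "__main__":
    model = DecoupledPromptTuner()
    batch = {m: torch.randn(2, 196, 256) for m in MODALITIES}
    print(model(batch).shape)  # torch.Size([2, 256])
```

Because the backbone is frozen, an optimizer in this sketch would be built over `filter(lambda p: p.requires_grad, model.parameters())`, which covers only the prompt and binding parameters, mirroring the small trainable-parameter budget the paper reports.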
Related Material
[pdf]
[bibtex]
@InProceedings{Lin_2025_WACV,
  author    = {Lin, Minghui and Wang, Shu and Wang, Xiang and Tang, Jianhua and Fu, Longbin and Zuo, Zhengrong and Sang, Nong},
  title     = {DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {2103-2112}
}