DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification
Abstract
Current multi-modal object re-identification approaches based on large-scale pre-trained backbones (i.e., ViT) have displayed remarkable progress and achieved excellent performance. However, these methods usually adopt the standard full fine-tuning paradigm, which requires optimizing a considerable number of backbone parameters and incurs extensive computational and storage costs. In this work, we propose an efficient prompt-tuning framework tailored for multi-modal object re-identification, dubbed DMPT, which freezes the main backbone and optimizes only a small number of newly added decoupled modality-aware parameters. Specifically, we explicitly decouple the visual prompts into modality-specific prompts, which leverage prior modality knowledge from a powerful text encoder, and modality-independent semantic prompts, which extract semantic information from multi-modal inputs such as visible, near-infrared, and thermal-infrared images. Built upon the extracted features, we further design a Prompt Inverse Bind (PromptIBind) strategy that employs bind prompts as a medium to connect the semantic prompt tokens of different modalities and facilitates the exchange of complementary multi-modal information, boosting the final re-identification results. Experimental results on multiple common benchmarks demonstrate that our DMPT achieves results competitive with existing state-of-the-art methods while fine-tuning only 6.5% of the backbone parameters.
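To make the paradigm concrete, below is a minimal PyTorch sketch of prompt tuning with a frozen backbone in the spirit of DMPT: per-modality prompts, shared modality-independent semantic prompts, and bind prompts through which semantic tokens exchange cross-modal information. The module names, prompt counts, embedding dimension, the toy two-layer backbone, and the attention-based binding step are all illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch of decoupled modality-aware prompt tuning, assuming a
# ViT-style token interface. Everything here (names, sizes, the tiny frozen
# backbone) is an illustrative stand-in, not the authors' implementation.
import torch
import torch.nn as nn

MODALITIES = ("visible", "near_infrared", "thermal_infrared")

class DecoupledPromptTuner(nn.Module):
    def __init__(self, embed_dim=256, n_modality_prompts=4,
                 n_semantic_prompts=4, n_bind_prompts=4):
        super().__init__()
        # Frozen backbone stand-in (a real system would load a pre-trained ViT).
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        for p in self.backbone.parameters():
            p.requires_grad = False  # only the prompt parameters are optimized

        # Modality-specific prompts: one learnable set per input modality.
        self.modality_prompts = nn.ParameterDict({
            m: nn.Parameter(torch.randn(n_modality_prompts, embed_dim) * 0.02)
            for m in MODALITIES
        })
        # Modality-independent semantic prompts, shared across modalities.
        self.semantic_prompts = nn.Parameter(
            torch.randn(n_semantic_prompts, embed_dim) * 0.02)
        # Bind prompts: a shared medium through which per-modality semantic
        # tokens exchange information (a rough analogue of PromptIBind).
        self.bind_prompts = nn.Parameter(
            torch.randn(n_bind_prompts, embed_dim) * 0.02)
        self.bind_attn = nn.MultiheadAttention(embed_dim, num_heads=8,
                                               batch_first=True)

    def forward(self, patch_tokens: dict) -> torch.Tensor:
        """patch_tokens maps modality name -> (B, num_patches, embed_dim)."""
        semantic_out = []
        for m, tokens in patch_tokens.items():
            B = tokens.size(0)
            mod = self.modality_prompts[m].expand(B, -1, -1)
            sem = self.semantic_prompts.expand(B, -1, -1)
            x = torch.cat([mod, sem, tokens], dim=1)
            x = self.backbone(x)  # frozen; gradients reach only the prompts
            n_mod = mod.size(1)
            semantic_out.append(x[:, n_mod:n_mod + sem.size(1)])

        # Cross-modal exchange: each modality's semantic tokens attend to the
        # shared bind prompts, which act as a common interchange space.
        bind = self.bind_prompts.expand(semantic_out[0].size(0), -1, -1)
        fused = [self.bind_attn(s, bind, bind)[0] + s for s in semantic_out]
        # Pool the fused semantic tokens into one re-identification embedding.
        return torch.stack(fused, dim=0).mean(dim=(0, 2))

if __name__ == "__main__":
    model = DecoupledPromptTuner()
    batch = {m: torch.randn(2, 196, 256) for m in MODALITIES}
    print(model(batch).shape)  # torch.Size([2, 256])
```

Because the backbone is frozen, an optimizer in this sketch would be built over `filter(lambda p: p.requires_grad, model.parameters())`, which covers only the prompt and binding parameters, mirroring the small trainable-parameter budget the paper reports.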
Related Material
[pdf]
[bibtex]
@InProceedings{Lin_2025_WACV,
  author    = {Lin, Minghui and Wang, Shu and Wang, Xiang and Tang, Jianhua and Fu, Longbin and Zuo, Zhengrong and Sang, Nong},
  title     = {DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {2103-2112}
}