Seeing Both Sides: Towards Bidirectional Semantic Alignment for Open-Vocabulary Camouflaged Object Segmentation

Zhang, Guohui; Sun, Fuming; Zhao, Yu; Kong, Yuqiu; Sun, Jing; Wang, Fasheng

Guohui Zhang, Fuming Sun, Yu Zhao, Yuqiu Kong, Jing Sun, Fasheng Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 27655-27664

Abstract

Open-Vocabulary Camouflaged Object Segmentation (OVCOS) aims to precisely segment camouflaged objects from unseen categories under textual guidance. However, existing methods often employ a unidirectional interaction strategy, where textual prompts guide the matching of visual features. Such a design neglects the bidirectional interaction between visual and language modalities, making the model vulnerable to the semantic gap between image-level textual semantics and pixel-level segmentation cues, which in turn leads to severe semantic confusion in complex camouflaged scenarios. To address this challenge, we propose BaCLIP, a novel bidirectional semantic alignment framework for OVCOS. At its core lies the Mutual Refinement and Enhancement Module (MREM), which establishes bidirectional cross-attention between visual and textual features, enabling mutual semantic calibration to resolve ambiguity and strengthen cross-modal alignment. Moreover, we introduce an Adaptive Prompt that transforms refined textual embeddings into semantic-aware prompts for the Segment Anything Model (SAM), enabling direct textual guidance and improving mask precision. Experimental results on the OVCamo benchmark demonstrate that BaCLIP consistently achieves state-of-the-art performance with a compact architecture, effectively mitigating semantic confusion and advancing the understanding of cross-modal camouflage perception. Our code is released at https://github.com/okmaybach/BaCLIP-CVPR2026.
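
The general idea of bidirectional cross-attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' MREM implementation (see the released code for that); it assumes single-head scaled dot-product attention with no learned projections, and the names `cross_attention` and `bidirectional_refine`, the token counts, and the feature dimension are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query row attends over all context rows (no learned weights)."""
    dim = len(queries[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / scale for c in context]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of context rows, one output value per feature.
        outputs.append(
            [sum(wj * c[k] for wj, c in zip(w, context)) for k in range(dim)]
        )
    return outputs, weights


def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual update)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_refine(visual, text):
    """Hypothetical helper: refine both modalities from the same inputs."""
    vis_update, vis_to_txt = cross_attention(visual, text)   # vision queries text
    txt_update, txt_to_vis = cross_attention(text, visual)   # text queries vision
    return add(visual, vis_update), add(text, txt_update), vis_to_txt, txt_to_vis


# Assumed toy sizes: 16 visual tokens, 3 text tokens, 8 features each.
random.seed(0)
visual = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
text = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]

refined_visual, refined_text, vis_to_txt, txt_to_vis = bidirectional_refine(visual, text)
```

In this sketch both directions read from the original, unrefined features, so neither update depends on the other. A unidirectional design would keep only one of the two `cross_attention` calls.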

Related Material

@InProceedings{Zhang_2026_CVPR,
    author    = {Zhang, Guohui and Sun, Fuming and Zhao, Yu and Kong, Yuqiu and Sun, Jing and Wang, Fasheng},
    title     = {Seeing Both Sides: Towards Bidirectional Semantic Alignment for Open-Vocabulary Camouflaged Object Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {27655-27664}
}