CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

Zhuoyan Luo, Yinghao Wu, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 22685-22694

Abstract


The newly proposed Generalized Referring Expression Segmentation (GRES) amplifies the formulation of classic RES by involving complex multiple/non-target scenarios. Recent approaches address GRES by directly extending the well-adopted RES frameworks with object-existence identification. However, these approaches tend to encode multi-granularity object information into a single representation, which makes it difficult to precisely represent comprehensive objects of different granularity. Moreover, the simple binary object-existence identification across all referent scenarios fails to specify their inherent differences, incurring ambiguity in object understanding. To tackle the above issues, we propose a **Co**unting-Aware **H**ierarchical **D**ecoding framework (CoHD) for GRES. By decoupling the intricate referring semantics into different granularity with a visual-linguistic hierarchy, and dynamic aggregating it with intra- and inter-selection, CoHD boosts multi-granularity comprehension with the reciprocal benefit of the hierarchical nature. Furthermore, we incorporate the counting ability by embodying multiple/single/non-target scenarios into count- and category-level supervision, facilitating comprehensive object perception. Experimental results on gRefCOCO, Ref-ZOM, R-RefCOCO, and RefCOCO benchmarks demonstrate the effectiveness and rationality of CoHD which outperforms state-of-the-art GRES methods by a remarkable margin. Code is available.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Luo_2025_ICCV, author = {Luo, Zhuoyan and Wu, Yinghao and Cheng, Tianheng and Liu, Yong and Xiao, Yicheng and Wang, Hongfa and Zhang, Xiao-Ping and Yang, Yujiu}, title = {CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2025}, pages = {22685-22694} }