Intermediate Connectors and Geometric Priors for Language-Guided Affordance Segmentation on Unseen Object Categories

Yicong Li, Yiyang Chen, Zhenyuan Ma, Junbin Xiao, Xiang Wang, Angela Yao; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 22836-22845

Abstract


Language-guided Affordance Segmentation (LASO) aims to identify actionable object regions based on text instructions. At the core of its practicality is learning generalizable affordance knowledge that captures functional regions across diverse objects. However, current LASO solutions struggle to extend learned affordances to object categories that are not encountered during training. Scrutinizing these designs, we identify limited generalizability on unseen categories, stemming from (1) underutilized generalizable patterns in the intermediate layers of both 3D and text backbones, which impedes the formation of robust affordance knowledge, and (2) the inability to handle substantial variability in affordance regions across object categories due to a lack of structural knowledge of the target region. To address this, we introduce a GeneraLized frAmework on uNseen CategoriEs (GLANCE), incorporating two key components: a cross-modal connector that links intermediate stages of the text and 3D backbones to enrich pointwise embeddings with affordance concepts, and a VLM-guided query generator that provides affordance priors by extracting a few 3D key points based on the intra-view reliability and cross-view consistency of their multi-view segmentation masks. Extensive experiments on two benchmark datasets demonstrate that GLANCE outperforms state-of-the-art methods (SoTAs), with notable improvements in generalization to unseen categories. Our code is available at https://anonymous.4open.science/r/GLANCE.

Related Material


@InProceedings{Li_2025_ICCV,
    author    = {Li, Yicong and Chen, Yiyang and Ma, Zhenyuan and Xiao, Junbin and Wang, Xiang and Yao, Angela},
    title     = {Intermediate Connectors and Geometric Priors for Language-Guided Affordance Segmentation on Unseen Object Categories},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {22836-22845}
}