DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception

Wen, Youpeng; Zhu, Yi; Zhan, Zhihao; Ren, Pengzhen; Han, Jianhua; Xu, Hang; Zhao, Shen; Liang, Xiaodan

Youpeng Wen, Yi Zhu, Zhihao Zhan, Pengzhen Ren, Jianhua Han, Hang Xu, Shen Zhao, Xiaodan Liang; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 3320-3329

Abstract

Actionable part perception for robotic object manipulation requires perceiving parts across open-world object categories in 3D space, which is challenging because the appearance of the same part varies greatly across objects. It is frequently observed that, despite large intra-class differences in appearance, parts share common interactive functions across different objects, i.e., common affordance. Based on this observation, we propose DisCo, a novel technique that Discovers Common affordance information from powerful large models to guide actionable part perception across open-world objects. Specifically, we first use a large language model to identify the object names that each part potentially belongs to, and a text-to-image generative model to generate example images of the queried objects, constructing image-text paired data that capture the visual and semantic information of common affordance. Our model then encodes this common affordance information by learning to pair the object-part images with their text descriptions. Next, the 2D pixel features are distilled into 3D space, so the 3D point features are enriched not only with semantic information about open-set objects but also with common affordance information, which is highly generalizable. Finally, a segmentation head and a pose regression network predict more accurate part segmentation and pose estimation results, improving the success rate of robotic object manipulation. Extensive experiments show that our method outperforms existing methods on part instance and semantic segmentation of unseen object categories by significant margins of 4.8% mAP, 5.4% AP50, and 3.9% mIoU.
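The abstract names two training signals (pairing object-part images with text descriptions, then distilling 2D pixel features into 3D point features) but does not give their loss functions. The sketch below shows one common way to realize each, under stated assumptions: a CLIP-style symmetric InfoNCE loss for the image-text pairing, and a cosine feature-distillation loss in which each 3D point is projected into the image with a pinhole camera matrix `K` and matched to the feature of its nearest pixel. All function names, the temperature value, and the nearest-pixel lookup are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def image_text_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Assumed symmetric InfoNCE: row i of img_emb is the positive pair of row i
    of txt_emb; every other row in the batch acts as a negative."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature  # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return float(0.5 * (loss_i2t + loss_t2i))


def project_points(points, K):
    """Pinhole projection of camera-frame points (N, 3) to pixel coords (N, 2)."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def distill_2d_to_3d_loss(point_feats, pixel_feats, points, K):
    """Assumed cosine distillation: each 3D point feature (student) should match
    the 2D feature (teacher) of the pixel it projects to. Uses nearest-pixel
    lookup and clamps coordinates to the image, so points should be in view."""
    H, W, _ = pixel_feats.shape
    uv = np.rint(project_points(points, K)).astype(int)
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    targets = pixel_feats[v, u]  # (N, C) teacher features, indexed [row, col]
    cos = np.sum(l2_normalize(point_feats) * l2_normalize(targets), axis=1)
    return float(np.mean(1.0 - cos))
```

With perfectly matched image and text embeddings the contrastive loss approaches zero, and it grows when the pairs are shuffled; likewise the distillation loss is near zero when the point features equal their projected pixel features and near 2 when they point in the opposite direction. In a real model the embeddings and features would come from learned encoders and the losses would be minimized by gradient descent; NumPy is used here only to keep the example self-contained.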


BibTeX

@InProceedings{Wen_2025_WACV,
    author    = {Wen, Youpeng and Zhu, Yi and Zhan, Zhihao and Ren, Pengzhen and Han, Jianhua and Xu, Hang and Zhao, Shen and Liang, Xiaodan},
    title     = {DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {3320-3329}
}