Strategies to Leverage Foundational Model Knowledge in Object Affordance Grounding

Arushi Rai, Kyle Buettner, Adriana Kovashka; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 1714-1723


An important task for intelligent systems is affordance grounding, where the goal is to locate regions on an object where an action can be performed. Past weakly supervised approaches learn from human-object interaction (HOI) by transferring grounding knowledge from exocentric to egocentric views of an object. The use of HOI priors is inherently noisy and thus provides a limited source of supervision. To address this challenge, we identify that recent foundational models (i.e., VLMs and LLMs) can serve as auxiliary sources of knowledge for such frameworks due to their vast world knowledge. In this work, we propose strategies to extract and leverage foundational model knowledge related to attributes and object parts to enhance an HOI-based affordance grounding framework. In particular, we propose to combine HOI and foundational model priors through (1) a spatial consistency loss and (2) heatmap aggregation. Our strategies yield improvements in mKLD and mNSS, and our insights suggest future directions for improving affordance grounding capabilities.
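The abstract reports mKLD and mNSS without defining them. As a rough illustration only, the sketch below computes the two underlying quantities as they are commonly defined for heatmap evaluation: KL divergence (KLD) between a ground-truth and a predicted heatmap, and Normalized Scanpath Saliency (NSS) against a binary ground-truth mask. The function names, the epsilon value, and the use of a binary mask for NSS are assumptions made for this example, not the authors' evaluation code; the "m" prefix presumably denotes an average over evaluation items, which is likewise an assumption.

```python
import math

EPS = 1e-12  # assumed small constant to avoid log(0) and division by zero


def _normalize(heatmap):
    """Flatten a 2D heatmap (list of rows) and rescale it to sum to 1."""
    flat = [float(v) for row in heatmap for v in row]
    total = sum(flat)
    if total <= 0:
        raise ValueError("heatmap must contain positive mass")
    return [v / total for v in flat]


def kld(pred, gt):
    """KL divergence of the ground-truth distribution from the prediction.

    Lower is better; it is approximately 0 when the two heatmaps match.
    """
    p = _normalize(pred)
    q = _normalize(gt)
    return sum(g * math.log(EPS + g / (pv + EPS)) for pv, g in zip(p, q))


def nss(pred, gt_mask):
    """Mean z-scored prediction value over the ground-truth locations.

    Higher is better. `gt_mask` is assumed to be a binary mask.
    """
    flat = [float(v) for row in pred for v in row]
    mask = [bool(v) for row in gt_mask for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    if std == 0:
        return 0.0  # a constant prediction carries no location information
    scores = [(v - mean) / std for v, m in zip(flat, mask) if m]
    if not scores:
        raise ValueError("gt_mask must mark at least one location")
    return sum(scores) / len(scores)


# Toy 2x2 example: the prediction peaks at the single ground-truth cell.
pred = [[0.0, 1.0], [2.0, 1.0]]
gt = [[0, 0], [1, 0]]
print("KLD:", kld(pred, gt))
print("NSS:", nss(pred, gt))
```

Under these definitions, a better grounding result shows a lower KLD and a higher NSS.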

Related Material

@InProceedings{Rai_2024_CVPR,
    author    = {Rai, Arushi and Buettner, Kyle and Kovashka, Adriana},
    title     = {Strategies to Leverage Foundational Model Knowledge in Object Affordance Grounding},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1714-1723}
}