Incorporating Visual Grounding in GCN for Zero-Shot Learning of Human Object Interaction Actions

Chinmaya Devaraj, Cornelia Fermüller, Yiannis Aloimonos; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 5008-5017

Abstract


GCN-based zero-shot learning approaches commonly use fixed input graphs that represent external knowledge, usually derived from language. However, such input graphs fail to capture the nuances of the visual domain. We introduce a method to visually ground the external knowledge graph. We demonstrate the method on a novel concept of grouping actions according to a shared notion, and show that it achieves superior performance in zero-shot action recognition on two challenging human manipulation action datasets: EPIC Kitchens and Charades. We further show that visually grounding the knowledge graph improves the performance of GCNs when an adversarial attack corrupts the input graph.
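For readers unfamiliar with the setting, the following is a minimal sketch of the generic graph-convolution step that GCN-based zero-shot approaches build on, in which class nodes exchange information over a fixed input graph. It is not the authors' visual grounding method; the toy graph, the random stand-in embeddings, the weight matrix, and the function names `normalize_adjacency` and `gcn_layer` are all assumptions made for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I (self-loops added), the usual GCN preprocessing."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Hypothetical toy knowledge graph over 3 action classes (chain 0 - 1 - 2).
# Class 2 could stand for an unseen class that is linked to a seen class.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))   # stand-in for language-derived class embeddings
weights = rng.normal(size=(4, 2))    # stand-in for learned layer weights

adj_norm = normalize_adjacency(adj)
class_outputs = gcn_layer(adj_norm, features, weights)  # one output row per class
```

Because the graph is fixed and comes from language alone, any edge that is wrong or visually irrelevant propagates directly into every class output, which is the limitation the abstract points to.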

Related Material


@InProceedings{Devaraj_2023_CVPR,
    author    = {Devaraj, Chinmaya and Ferm\"uller, Cornelia and Aloimonos, Yiannis},
    title     = {Incorporating Visual Grounding in GCN for Zero-Shot Learning of Human Object Interaction Actions},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {5008-5017}
}