Navigating the Unseen: Zero-shot Scene Graph Generation via Capsule-Based Equivariant Features

Wenhuan Huang, Yi JI, Guiqian Zhu, Li Ying, Chunping Liu; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 29448-29457

Abstract


In scene graph generation (SGG), accurately predicting unseen triples is essential for effectiveness in downstream vision-language tasks. We hypothesize that the predicates of unseen triples can be viewed as transformations of seen predicates in feature space, and that the essence of the zero-shot task is to bridge the gap caused by this transformation. Traditional models have difficulty addressing this challenge, which we attribute to their inability to model predicates equivariantly. To overcome this limitation, we introduce a novel framework based on capsule networks (CAPSGG). We propose a Three-Stream Pipeline that generates modality-specific representations for predicates while building low-level predicate capsules for these modalities. These capsules are then aggregated into high-level predicate capsules using a Routing Capsule Layer. In addition, we introduce GroupLoss, which aggregates capsules with the same predicate label into groups. This replaces the global loss with an intra-group loss, effectively balancing the learning of invariant and equivariant predicate features while mitigating the impact of the severe long-tail distribution of predicate categories. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods, with zero-shot metrics improving by up to 132.26% over T-CAR [21] on the SGCls task. Our code will be available upon publication.
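To make the capsule aggregation step concrete, the following is a minimal NumPy sketch of generic routing-by-agreement, the standard way capsule networks combine low-level capsules into high-level ones. It is an illustration under stated assumptions, not the Routing Capsule Layer of CAPSGG: the function names `squash` and `route`, the three-iteration default, and the toy shapes (three low-level capsules, five predicate classes, eight-dimensional capsules) are all hypothetical and do not come from the paper.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-8):
    """Shrink each vector's norm into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def route(u_hat, num_iters=3):
    """Generic routing-by-agreement (illustrative, not the paper's layer).

    u_hat: predictions made by each low-level capsule for each high-level
           capsule, shape (n_low, n_high, dim).
    Returns the high-level capsules, shape (n_high, dim).
    """
    n_low, n_high, _ = u_hat.shape
    b = np.zeros((n_low, n_high))  # routing logits, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax over high-level capsules.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        # Weighted sum of predictions, then squash to get capsule outputs.
        s = np.einsum("ij,ijd->jd", c, u_hat)
        v = squash(s)
        # Raise logits where a prediction agrees with the output.
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v


# Hypothetical toy sizes: 3 low-level capsules, 5 predicate classes, dim 8.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(3, 5, 8))
v = route(u_hat)
```

In this formulation the length of each output vector can be read as a class score, while its orientation carries the pose-like information that changes with the input, which is the property capsule-based approaches rely on for equivariance.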

Related Material


[bibtex]
@InProceedings{Huang_2025_CVPR,
    author    = {Huang, Wenhuan and JI, Yi and Zhu, Guiqian and Ying, Li and Liu, Chunping},
    title     = {Navigating the Unseen: Zero-shot Scene Graph Generation via Capsule-Based Equivariant Features},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {29448-29457}
}