FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing

Anant Khandelwal; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2516-2526

Abstract


Dynamic scene graph generation (SGG) from videos requires not only a comprehensive understanding of objects across scenes but also a method to capture the temporal motions and interactions between objects. Moreover, the long-tailed distribution of visual relationships is a crucial bottleneck for most dynamic SGG methods, since many of them focus on capturing spatio-temporal context with complex architectures, which leads to biased scene graphs. To address these challenges, we propose FloCoDe: Flow-aware Temporal Consistency and Correlation Debiasing with uncertainty attenuation for unbiased dynamic scene graphs. FloCoDe employs flow-based feature warping to detect temporally consistent objects across frames. To address the long-tail issue of visual relationships, we propose correlation debiasing and a label-correlation-based loss to learn unbiased relation representations for long-tailed classes. Specifically, we incorporate label correlations through a contrastive loss that captures commonly co-occurring relations, which aids in learning robust representations for long-tailed classes. Further, we adopt an uncertainty-attenuation-based classifier framework to handle noisy annotations in the SGG data. Extensive experimental evaluation shows performance gains of up to 4.1%, demonstrating the superiority of FloCoDe in generating more unbiased scene graphs.
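
To make two of the components named in the abstract concrete, below is a minimal PyTorch sketch. It is not the paper's implementation: the function names, tensor shapes, the choice of backward warping via grid_sample, and the Kendall-and-Gal-style sampled softmax for uncertainty attenuation are all assumptions used purely for illustration of flow-based feature warping and an uncertainty-attenuated classification loss.

```python
import math
import torch
import torch.nn.functional as F


def warp_features(feat_prev, flow):
    """Warp previous-frame features into the current frame with a dense
    optical-flow field (backward warping via grid_sample).

    feat_prev: (B, C, H, W) feature map of the previous frame
    flow:      (B, 2, H, W) flow from current to previous frame, in pixels
    """
    B, C, H, W = feat_prev.shape
    # Base sampling grid of pixel coordinates (x, y).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=feat_prev.device, dtype=feat_prev.dtype),
        torch.arange(W, device=feat_prev.device, dtype=feat_prev.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).expand(B, -1, -1, -1)
    # Displace the grid by the flow and normalise to [-1, 1] for grid_sample.
    coords = grid + flow
    coords_x = 2.0 * coords[:, 0] / max(W - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(H - 1, 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feat_prev, sample_grid, align_corners=True)


def uncertainty_attenuated_ce(logit_mean, logit_log_var, target, num_samples=10):
    """Uncertainty-attenuated cross-entropy: the classifier predicts a mean
    and a log-variance per logit, logits are perturbed with Gaussian noise,
    and the softmax likelihood is averaged over samples so that predictions
    on noisy annotations can be attenuated by a larger predicted variance.

    logit_mean, logit_log_var: (N, num_classes); target: (N,) class indices
    """
    std = torch.exp(0.5 * logit_log_var)
    log_probs = []
    for _ in range(num_samples):
        noisy_logits = logit_mean + std * torch.randn_like(logit_mean)
        log_probs.append(F.log_softmax(noisy_logits, dim=-1))
    # logsumexp over samples minus log T gives the log of the mean likelihood.
    log_prob = torch.logsumexp(torch.stack(log_probs, dim=0), dim=0) - math.log(num_samples)
    return F.nll_loss(log_prob, target)
```

In this sketch, warped features from adjacent frames could be compared with current-frame features to keep object representations temporally consistent, while the sampled loss lets the model inflate its predicted variance on ambiguous or mislabeled relations instead of overfitting to them.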

Related Material


[bibtex]
@InProceedings{Khandelwal_2024_CVPR,
    author    = {Khandelwal, Anant},
    title     = {FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2516-2526}
}