OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

Guan Wang, Zhimin Li, Qingchao Chen, Yang Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 27938-27947

Abstract


Dynamic Scene Graph Generation (DSGG) focuses on identifying visual relationships within the spatial-temporal domain of videos. Conventional approaches often employ multi-stage pipelines which typically consist of object detection temporal association and multi-relation classification. However these methods exhibit inherent limitations due to the separation of multiple stages and independent optimization of these sub-problems may yield sub-optimal solutions. To remedy these limitations we propose a one-stage end-to-end framework termed OED which streamlines the DSGG pipeline. This framework reformulates the task as a set prediction problem and leverages pair-wise features to represent each subject-object pair within the scene graph. Moreover another challenge of DSGG is capturing temporal dependencies we introduce a Progressively Refined Module (PRM) for aggregating temporal context without the constraints of additional trackers or handcrafted trajectories enabling end-to-end optimization of the network. Extensive experiments conducted on the Action Genome benchmark demonstrate the effectiveness of our design. The code and models are available at https://github.com/guanw-pku/OED.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Wang_2024_CVPR, author = {Wang, Guan and Li, Zhimin and Chen, Qingchao and Liu, Yang}, title = {OED: Towards One-stage End-to-End Dynamic Scene Graph Generation}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {27938-27947} }