Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
Abstract
Egocentrically comprehending the causes and effects of car accidents is crucial for the safety of self-driving cars, and synthesizing causal-entity reflected accident videos enables testing this capability without the unaffordable cost of real accidents. However, incorporating causal relations as seen in real-world videos into synthetic videos remains challenging. This work argues that precisely identifying the accident participants and capturing their related behaviors are of critical importance. To this end, we propose Causal-VidSyn, a novel diffusion model for synthesizing egocentric traffic accident videos. To enable causal entity grounding in video diffusion, Causal-VidSyn leverages cause descriptions and driver fixations to identify the accident participants and behaviors, facilitated by accident reason answering and gaze-conditioned selection modules. To support Causal-VidSyn, we further construct Drive-Gaze, the largest driver gaze dataset (with 1.54M frames of fixations) in driving accident scenarios. Extensive experiments show that Causal-VidSyn surpasses state-of-the-art video diffusion models in frame quality and causal sensitivity across various tasks, including accident video editing, normal-to-accident video diffusion, and text-to-video generation.
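The abstract names a gaze-conditioned selection module that uses driver fixations to focus the model on accident participants. As a rough illustration only, below is a minimal sketch of how a fixation heatmap could softly gate spatial diffusion features; the class name, tensor shapes, and gating scheme are assumptions for exposition and do not reflect the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeConditionedSelection(nn.Module):
    # Hypothetical module: softly re-weights diffusion features with a
    # driver-fixation heatmap so gaze-attended regions (likely accident
    # participants) contribute more to denoising.
    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 conv fuses the gaze prior with per-location feature content.
        self.fuse = nn.Conv2d(channels + 1, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, gaze: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) intermediate diffusion features
        # gaze:  (B, 1, h, w) fixation heatmap, e.g. drawn from Drive-Gaze
        gaze = F.interpolate(gaze, size=feats.shape[-2:], mode="bilinear",
                             align_corners=False)
        gate = torch.sigmoid(self.fuse(torch.cat([feats, gaze], dim=1)))
        # Residual soft selection: emphasize gazed regions without zeroing others.
        return feats + feats * gate

# Dummy usage: gate 64-channel features with a coarser gaze map.
module = GazeConditionedSelection(channels=64)
out = module(torch.randn(2, 64, 32, 32), torch.rand(2, 1, 240, 320))
print(out.shape)  # torch.Size([2, 64, 32, 32])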
Related Material
[pdf] [supp] [arXiv] [bibtex]
@InProceedings{Li_2025_ICCV,
  author    = {Li, Lei-Lei and Fang, Jianwu and Xiao, Junbin and Pang, Shanmin and Yu, Hongkai and Lv, Chen and Xue, Jianru and Chua, Tat-Seng},
  title     = {Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {11208-11218}
}