Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 22030-22040

Abstract


We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions. We annotate over 2.23 million object boxes and 58,650 pairs of video-based accident reasons covering 58 accident categories. MM-AU supports various accident understanding tasks, particularly multimodal video diffusion for understanding accident cause-effect chains for safe driving. With MM-AU, we present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video diffusion via an Object-Centric Video Diffusion (OAVD) method, which is driven by an abductive CLIP model. This model uses a contrastive interaction loss to learn the pairwise co-occurrence of normal, near-accident, and accident frames with the corresponding text descriptions, such as accident reasons, prevention advice, and accident categories. OAVD enforces the learning of object regions while keeping the background content of the original frames fixed during video generation, in order to find the dominant objects for specific accidents. Extensive experiments verify the abductive ability of AdVersa-SD and the superiority of OAVD over state-of-the-art diffusion models. Additionally, we provide careful benchmark evaluations for object detection and accident reason answering, since AdVersa-SD relies on precise object and accident reason information.
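The abstract does not spell out the exact form of the contrastive interaction loss. As a rough illustration only, the sketch below assumes it behaves like a standard CLIP-style symmetric InfoNCE objective over matched (frame, text) pairs, for example normal, near-accident, and accident frames paired with their descriptions. It does not model whatever "interaction" term the paper adds. The function name `clip_contrastive_loss`, the temperature of 0.07 (a common CLIP default), and the toy embeddings are hypothetical; in practice the embeddings would come from trained image and text encoders.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def _log_softmax_at(row: Sequence[float], idx: int) -> float:
    """Numerically stable log-softmax of `row`, evaluated at position `idx`."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_z


def clip_contrastive_loss(frame_emb: Sequence[Vector],
                          text_emb: Sequence[Vector],
                          temperature: float = 0.07) -> float:
    """Hypothetical CLIP-style symmetric InfoNCE loss over N matched pairs.

    Pair i (frame i, text i) is the positive for row i; every other text
    (or frame) in the batch acts as a negative.
    """
    if len(frame_emb) != len(text_emb) or not frame_emb:
        raise ValueError("need the same non-zero number of frames and texts")
    f = [_normalize(v) for v in frame_emb]
    t = [_normalize(v) for v in text_emb]
    n = len(f)
    # logits[i][j] = cosine(frame_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(f[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Frame-to-text direction: each frame should pick its own description.
    frame_to_text = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-frame direction: each description should pick its own frame.
    text_to_frame = -sum(
        _log_softmax_at([logits[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (frame_to_text + text_to_frame)


if __name__ == "__main__":
    # Toy embeddings standing in for normal / near-accident / accident frames.
    frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    shuffled = [aligned[1], aligned[2], aligned[0]]
    print(clip_contrastive_loss(frames, aligned)
          < clip_contrastive_loss(frames, shuffled))  # prints True
```

Correctly paired frames and descriptions yield a lower loss than mismatched ones, which is the property a co-occurrence objective of this kind relies on. Averaging both directions keeps the loss symmetric, so neither modality is favored.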

Related Material


@InProceedings{Fang_2024_CVPR,
    author    = {Fang, Jianwu and Li, Lei-lei and Zhou, Junfei and Xiao, Junbin and Yu, Hongkai and Lv, Chen and Xue, Jianru and Chua, Tat-Seng},
    title     = {Abductive Ego-View Accident Video Understanding for Safe Driving Perception},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {22030-22040}
}