Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Du, Hang; Zhang, Sicheng; Xie, Binzhu; Nan, Guoshun; Zhang, Jiayang; Xu, Junrui; Liu, Hangyu; Leng, Sicong; Liu, Jiangming; Fan, Hehe; Huang, Dajiu; Feng, Jing; Chen, Linli; Zhang, Can; Li, Xuhuan; Zhang, Hao; Chen, Jianhang; Cui, Qimei; Tao, Xiaofeng

Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 18793-18803

Abstract

Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on anomaly detection and localization our focus is on more practicality prompting us to raise the following crucial questions: "what anomaly occurred?" "why did it happen?" and "how severe is this abnormal event?". In pursuit of these answers we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA). Specifically each instance of the proposed benchmark involves three sets of human annotations to indicate the "what" "why" and "how" of an anomaly including 1) anomaly type start and end times and event descriptions 2) natural language explanations for the cause of an anomaly and 3) free text reflecting the effect of the abnormality. In addition we also introduce MMEval a novel evaluation metric designed to better align with human preferences for CUVA facilitating the measurement of existing LLMs in comprehending the underlying cause and corresponding effect of video anomalies. Finally we propose a novel prompt-based method that can serve as a baseline approach for the challenging CUVA. We conduct extensive experiments to show the superiority of our evaluation metric and the prompt-based approach.

Related Material

[pdf] [supp] [arXiv]

[bibtex]

@InProceedings{Du_2024_CVPR, author = {Du, Hang and Zhang, Sicheng and Xie, Binzhu and Nan, Guoshun and Zhang, Jiayang and Xu, Junrui and Liu, Hangyu and Leng, Sicong and Liu, Jiangming and Fan, Hehe and Huang, Dajiu and Feng, Jing and Chen, Linli and Zhang, Can and Li, Xuhuan and Zhang, Hao and Chen, Jianhang and Cui, Qimei and Tao, Xiaofeng}, title = {Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {18793-18803} }