Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection

Jash Dalvi, Ali Dabouei, Gunjan Dhanuka, Min Xu; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 5439-5448

Abstract


Video anomaly detection aims to develop automated models capable of identifying abnormal events in surveillance videos. The benchmark setup for this task is extremely challenging due to: i) the limited size of the training sets, ii) weak supervision provided only in terms of video-level labels, and iii) the intrinsic class imbalance induced by the scarcity of abnormal events. In this work, we show that distilling knowledge from aggregated representations of multiple backbones into a single-backbone Student model achieves state-of-the-art performance. In particular, we develop a bi-level distillation approach along with a novel disentangled cross-attention-based feature aggregation network. Our proposed approach, DAKD (Distilling Aggregated Knowledge with Disentangled Attention), demonstrates superior performance compared to existing methods across multiple benchmark datasets. Notably, we achieve significant improvements of 1.36%, 0.78%, and 7.02% on the UCF-Crime, ShanghaiTech, and XD-Violence datasets, respectively.
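The core idea sketched in the abstract, distilling aggregated multi-backbone (Teacher) representations into a single-backbone Student trained with weak video-level labels, can be illustrated as follows. This is a minimal, hypothetical sketch only: the simple mean aggregation, the MSE feature-distillation loss, the top-k MIL pooling, and all names and dimensions are illustrative assumptions, not the authors' actual DAKD architecture (which uses a disentangled cross-attention aggregation network and bi-level distillation).

```python
import torch
import torch.nn as nn

class Student(nn.Module):
    """Single-backbone student (hypothetical dimensions, not the paper's)."""
    def __init__(self, in_dim=2048, feat_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, 1)  # per-snippet anomaly score

    def forward(self, x):
        feat = self.encoder(x)                       # (B, T, feat_dim)
        return feat, torch.sigmoid(self.head(feat))  # features + scores

def distillation_step(student, teacher_feats, clip_feats, video_label):
    # teacher_feats: list of (B, T, D) tensors from frozen backbones.
    # Aggregated here by a simple mean purely for illustration; the paper
    # instead learns a disentangled cross-attention aggregation.
    target = torch.stack(teacher_feats).mean(dim=0)
    feat, scores = student(clip_feats)
    feat_loss = nn.functional.mse_loss(feat, target)  # feature-level distillation
    # Weak supervision: pool snippet scores into one video-level score
    # (top-3 MIL pooling is a common choice, assumed here).
    video_score = scores.squeeze(-1).topk(3, dim=1).values.mean(dim=1)
    cls_loss = nn.functional.binary_cross_entropy(video_score, video_label)
    return feat_loss + cls_loss
```

With batch size 2 and 16 snippets per video, `distillation_step(Student(), [torch.randn(2, 16, 512) for _ in range(3)], torch.randn(2, 16, 2048), torch.tensor([0.0, 1.0]))` returns a scalar loss combining both supervision signals.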

Related Material


BibTeX:
@InProceedings{Dalvi_2025_WACV,
  author    = {Dalvi, Jash and Dabouei, Ali and Dhanuka, Gunjan and Xu, Min},
  title     = {Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {5439-5448}
}