Dance With Self-Attention: A New Look of Conditional Random Fields on Anomaly Detection in Videos

Didik Purwanto, Yie-Tarng Chen, Wen-Hsien Fang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 173-183

Abstract


This paper proposes a novel weakly supervised approach for anomaly detection in videos. It begins with a relation-aware feature extractor that captures multi-scale convolutional neural network (CNN) features from a video. Self-attention is then integrated with conditional random fields (CRFs), the core of the network, to exploit both the ability of self-attention to capture short-range correlations among the features and the ability of CRFs to learn the inter-dependencies of these features. Such a framework can learn not only the spatio-temporal interactions among the actors, which are important for detecting complex movements, but also their short- and long-term dependencies across frames. In addition, to handle both local and non-local relationships among the features, a new variant of self-attention is developed that considers a set of cliques with different temporal localities. Moreover, a contrastive multi-instance learning scheme is employed to widen the gap between normal and abnormal instances, resulting in more accurate discrimination of anomalies. Experiments show that the new method outperforms state-of-the-art methods on the widely used UCF-Crime and ShanghaiTech datasets.
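The abstract does not give the exact formulation of the clique-based self-attention or the contrastive multi-instance loss, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes each "clique" is a temporal window of half-width `r` around each video snippet (with `None` meaning a global, non-local clique), averages the masked self-attention outputs over several such windows, and substitutes a generic top-k hinge ranking loss between an abnormal bag and a normal bag for the paper's contrastive scheme. The names `clique_self_attention`, `contrastive_mil_loss`, `radii`, `margin`, and `k` are hypothetical; random projections stand in for learned weights, and the CRF component is not modeled.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def clique_self_attention(feats, radii=(1, 4, None), seed=0):
    """Hypothetical multi-locality self-attention.

    feats: (T, D) array of per-snippet features.
    radii: half-widths of the temporal cliques; None = global clique.
    Returns the mean of the per-clique attention outputs, shape (T, D).
    """
    T, D = feats.shape
    rng = np.random.default_rng(seed)
    # Assumption: random projections stand in for learned Q/K/V weights.
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    q, k, v = feats @ Wq, feats @ Wk, feats @ Wv
    scores = q @ k.T / np.sqrt(D)  # (T, T) scaled dot-product scores
    idx = np.arange(T)
    dist = np.abs(idx[:, None] - idx[None, :])  # temporal distance between snippets
    outputs = []
    for r in radii:
        # Keep only pairs within r snippets of each other (the diagonal is always kept).
        mask = np.ones((T, T), dtype=bool) if r is None else dist <= r
        attn = softmax(np.where(mask, scores, -np.inf), axis=-1)
        outputs.append(attn @ v)
    return np.mean(outputs, axis=0)

def contrastive_mil_loss(abn_scores, nor_scores, margin=1.0, k=3):
    """Hypothetical hinge loss: push the mean of the top-k abnormal-bag scores
    above the mean of the top-k normal-bag scores by at least `margin`."""
    top_abn = np.mean(np.sort(abn_scores)[-k:])
    top_nor = np.mean(np.sort(nor_scores)[-k:])
    return float(max(0.0, margin - top_abn + top_nor))

# Usage: 32 snippets with 16-dimensional features.
feats = np.random.default_rng(1).standard_normal((32, 16))
out = clique_self_attention(feats)
print(out.shape)  # (32, 16)
loss = contrastive_mil_loss(np.array([0.9, 0.8, 0.7, 0.1]),
                            np.array([0.2, 0.1, 0.3, 0.0]))
print(loss)
```

Averaging the outputs of several windowed attentions is just one simple way to mix local and non-local context; a learned weighting per clique would be an equally plausible design.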

Related Material

@InProceedings{Purwanto_2021_ICCV,
    author    = {Purwanto, Didik and Chen, Yie-Tarng and Fang, Wen-Hsien},
    title     = {Dance With Self-Attention: A New Look of Conditional Random Fields on Anomaly Detection in Videos},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {173-183}
}