DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

Thanh-Dat Truong, Quoc-Huy Bui, Chi Nhan Duong, Han-Seok Seo, Son Lam Phung, Xin Li, Khoa Luu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 20030-20040

Abstract


Human action recognition has recently become one ofthe popular research topics in the computer vision community. Various 3D-CNN based methods have been presented to tackle both the spatial and temporal dimensions in thetask of video action recognition with competitive results.However, these methods have suffered some fundamentallimitations such as lack of robustness and generalization,e.g., how does the temporal ordering of video frames af-fect the recognition results? This work presents a novelend-to-end Transformer-based Directed Attention (Direc-Former) framework1for robust action recognition. The method takes a simple but novel perspective of Transformer-based approach to understand the right order of sequence actions. Therefore, the contributions of this work are three-fold. Firstly, we introduce the problem of ordered temporal learning issues to the action recognition problem. Secondly, a new Directed Attention mechanism is introduced to understand and provide attentions to human actions in the right order. Thirdly, we introduce the conditional dependency in action sequence modeling that includes orders and classes. The proposed approach consistently achieves the state-of-the-art (SOTA) results compared with the recent action recognition methods [4, 15, 62, 64], on three standard large-scale benchmarks, i.e. Jester, Kinetics-400 and Something-Something-V2.

Related Material


[pdf] [arXiv]
[bibtex]
@InProceedings{Truong_2022_CVPR, author = {Truong, Thanh-Dat and Bui, Quoc-Huy and Duong, Chi Nhan and Seo, Han-Seok and Phung, Son Lam and Li, Xin and Luu, Khoa}, title = {DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {20030-20040} }