Augmented Self-Mask Attention Transformer for Naturalistic Driving Action Recognition

Tiantian Zhang, Qingtian Wang, Xiaodong Dong, Wenqing Yu, Hao Sun, Xuyang Zhou, Aigong Zhen, Shun Cui, Dong Wu, Zhongjiang He; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7108-7114


Naturalistic driving action recognition and computer vision techniques now provide crucial solutions for identifying and eliminating distracted driving behavior. Existing methods often extract features through fixed-size sliding windows and predict an action's start and end times. However, the information in a fixed-size window may be incomplete or redundant, and the connections between different windows are insufficient. To alleviate this problem, we propose a novel Augmented Self-Mask Attention (AMA) architecture that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order. We employ an ensemble technique and use weighted boundaries fusion to combine and refine predicted action boundaries with high confidence scores. On the test dataset of AI City Challenge 2024 Track 3, we achieved significant results compared with other teams: the proposed model ranks first on the public leaderboard of the challenge. Codes are available at
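The abstract does not spell out how the weighted boundaries fusion works, so the following is only a minimal sketch of one plausible reading: predictions of the same class from several ensemble members are grouped by temporal overlap, and each group's start and end times are averaged with confidence scores as weights. The names `Segment`, `temporal_iou`, and `fuse_boundaries`, as well as the `iou_thr` and `min_score` parameters, are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One predicted action interval (hypothetical container, not from the paper)."""
    label: int
    start: float  # seconds
    end: float    # seconds
    score: float  # confidence score


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    # When the intervals overlap, the covering span equals their union;
    # when they do not, inter is 0 and the result is 0 anyway.
    span = max(a.end, b.end) - min(a.start, b.start)
    return inter / span if span > 0 else 0.0


def fuse_boundaries(segments, iou_thr=0.5, min_score=0.0):
    """Assumed fusion rule: confidence-weighted average of clustered boundaries."""
    # Keep strictly positive-scoring predictions, highest confidence first.
    kept = sorted(
        (s for s in segments if s.score > min_score),
        key=lambda s: s.score,
        reverse=True,
    )

    # Greedy clustering: a segment joins the first cluster whose top-scoring
    # member has the same label and sufficient temporal overlap.
    clusters = []
    for seg in kept:
        for cluster in clusters:
            anchor = cluster[0]
            if anchor.label == seg.label and temporal_iou(anchor, seg) >= iou_thr:
                cluster.append(seg)
                break
        else:
            clusters.append([seg])

    # Fuse each cluster into a single segment.
    fused = []
    for cluster in clusters:
        total = sum(s.score for s in cluster)
        fused.append(
            Segment(
                label=cluster[0].label,
                start=sum(s.score * s.start for s in cluster) / total,
                end=sum(s.score * s.end for s in cluster) / total,
                score=total / len(cluster),  # mean confidence of the cluster
            )
        )
    return sorted(fused, key=lambda s: s.start)
```

Under these assumptions, two overlapping predictions of the same class collapse into one interval whose boundaries lean toward the more confident prediction, while non-overlapping or differently labeled predictions pass through unchanged.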

Related Material

@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Tiantian and Wang, Qingtian and Dong, Xiaodong and Yu, Wenqing and Sun, Hao and Zhou, Xuyang and Zhen, Aigong and Cui, Shun and Wu, Dong and He, Zhongjiang},
    title     = {Augmented Self-Mask Attention Transformer for Naturalistic Driving Action Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {7108-7114}
}