Drone-HAT: Hybrid Attention Transformer for Complex Action Recognition in Drone Surveillance Videos

Mustaqeem Khan, Jamil Ahmad, Abdulmotaleb El Saddik, Wail Gueaieb, Giulia De Masi, Fakhri Karray; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 4713-4722

Abstract


Ultra-high-resolution aerial videos are becoming increasingly popular for enhancing surveillance capabilities in sparsely populated areas. However, automatically analyzing human activities in these videos, i.e., answering "who is doing what?", is necessary to realize their surveillance potential. Atomic visual action detection has successfully recognized such activities in movie data, but adapting it to ultra-high-resolution aerial videos is challenging because the target persons appear relatively tiny from overhead views and are sparsely located. Additionally, existing atomic visual action detection methods are based on single-label actions, whereas people can perform multiple actions simultaneously, so a multi-label approach is more appropriate. To address these problems, we propose a multi-label action detection and recognition framework that uses a hybrid attention vision transformer (HAT) to recognize recurrent actions more efficiently. Additionally, a multi-scale, multi-granularity module inside the action recognition transformer block extracts relevant features without redundancy. Using the Okutama Dataset, we demonstrate that our method outperforms existing state-of-the-art methodologies for interpreting human activity in aerial videos.
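To make the two key ideas in the abstract concrete, the sketch below illustrates (a) a hybrid attention block that fuses spatial self-attention with channel attention and (b) a multi-label action head that uses an independent sigmoid per action class instead of a single softmax. This is a minimal illustration only, not the authors' implementation: the module names, dimensions, additive fusion scheme, and pooling choice are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code): a hybrid attention block mixing
# spatial self-attention with channel attention, plus a multi-label head.
# All module names, dimensions, and the fusion scheme are assumptions.

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed design)."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.GELU(),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x):                  # x: (B, N, C) token features
        weights = self.fc(x.mean(dim=1))   # (B, C) global token average
        return x * weights.unsqueeze(1)    # reweight every token's channels


class HybridAttentionBlock(nn.Module):
    """Spatial self-attention and channel attention fused additively."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.spatial = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.channel = ChannelAttention(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.norm1(x)
        spatial_out, _ = self.spatial(h, h, h)  # token-to-token attention
        x = x + spatial_out + self.channel(h)   # hybrid fusion (assumed: additive)
        return x + self.mlp(self.norm2(x))


class MultiLabelActionHead(nn.Module):
    """Independent sigmoid per action so one person can carry several labels."""
    def __init__(self, dim, num_actions):
        super().__init__()
        self.classifier = nn.Linear(dim, num_actions)

    def forward(self, x):                         # x: (B, N, C)
        logits = self.classifier(x.mean(dim=1))   # pool tokens, then classify
        return torch.sigmoid(logits)              # per-class probabilities


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 256)             # dummy (batch, tokens, dim) features
    block = HybridAttentionBlock(dim=256)
    head = MultiLabelActionHead(dim=256, num_actions=12)
    probs = head(block(tokens))
    print(probs.shape)                            # torch.Size([2, 12])
```

In a multi-label setup such as this, training would typically use a per-class binary cross-entropy loss (e.g., `nn.BCELoss` on the sigmoid outputs, or `nn.BCEWithLogitsLoss` on raw logits), so that each detected person can be assigned several simultaneous action labels.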

Related Material


@InProceedings{Khan_2024_CVPR,
    author    = {Khan, Mustaqeem and Ahmad, Jamil and El Saddik, Abdulmotaleb and Gueaieb, Wail and De Masi, Giulia and Karray, Fakhri},
    title     = {Drone-HAT: Hybrid Attention Transformer for Complex Action Recognition in Drone Surveillance Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {4713-4722}
}