Video Interaction Recognition using an Attention Augmented Relational Network and Skeleton Data

Farzaneh Askari, Cyril Yared, Rohit Ramaprasad, Devin Garg, Anjun Hu, James J. Clark; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 3225-3234

Abstract


Recognizing interactions in multi-person videos, known as Video Interaction Recognition (VIR), is crucial for understanding video content. The human skeleton pose (skeleton for short) is a popular choice of main feature for VIR, given its success on the task at hand. While many studies have made progress using complex architectures such as Graph Neural Networks (GNNs) and Transformers to capture interactions in videos, studies such as [33], which apply simple, easy-to-train, and adaptive architectures such as the Relation Network (RN) [37], yield competitive results. Inspired by this trend, we propose the Attention Augmented Relational Network (AARN), a straightforward yet effective model that uses skeleton data to recognize interactions in videos. AARN outperforms other RN-based models and remains competitive against larger, more intricate models. We evaluate our approach on a challenging real-world Hockey Penalty Dataset (HPD), in which videos depict complex interactions between players in a non-laboratory recording setup, in addition to popular benchmark datasets, demonstrating strong performance. Lastly, we show the impact of skeleton quality on classification accuracy and the difficulty off-the-shelf pose estimators have in extracting precise skeletons from the challenging HPD dataset.
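To illustrate the general idea behind the paper's building blocks, the sketch below shows relational reasoning in the spirit of the Relation Network [37], with a softmax attention weighting over pairwise features. All shapes, layer choices, and names (`TinyRelationNetwork`, `g`/`f` as single random linear maps) are illustrative assumptions for exposition only, not the authors' AARN architecture.

```python
# Toy relational reasoning over skeleton keypoints: score every ordered pair
# of joints with a relation function g, attention-weight the pair features,
# sum them, then classify with f. Purely illustrative; NOT the AARN model.
import numpy as np

class TinyRelationNetwork:
    def __init__(self, joint_dim=2, hidden=8, num_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        self.Wg = rng.normal(size=(2 * joint_dim, hidden))  # pairwise relation map (g)
        self.wa = rng.normal(size=hidden)                   # attention scoring vector
        self.Wf = rng.normal(size=(hidden, num_classes))    # classifier head (f)

    def forward(self, joints):
        # joints: (J, joint_dim) array of 2-D keypoints from one frame
        J = joints.shape[0]
        pairs = [np.concatenate([joints[i], joints[j]])
                 for i in range(J) for j in range(J) if i != j]
        h = np.maximum(np.stack(pairs) @ self.Wg, 0.0)      # g over all ordered pairs
        scores = h @ self.wa
        att = np.exp(scores - scores.max())
        att /= att.sum()                                    # softmax attention over pairs
        pooled = (att[:, None] * h).sum(axis=0)             # attention-weighted aggregation
        logits = pooled @ self.Wf                           # f: per-class scores
        return logits

rn = TinyRelationNetwork()
joints = np.array([[0.1, 0.2], [0.4, 0.1], [0.3, 0.8]])     # 3 dummy joints
logits = rn.forward(joints)
print(logits.shape)  # (3,): one logit per interaction class
```

The key property this sketch shares with RN-style models is that the network reasons over all pairs of entities with a shared function and then aggregates, which keeps the architecture simple and permutation-aware; the attention weighting lets informative pairs dominate the aggregation.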

Related Material


@InProceedings{Askari_2024_CVPR,
  author    = {Askari, Farzaneh and Yared, Cyril and Ramaprasad, Rohit and Garg, Devin and Hu, Anjun and Clark, James J.},
  title     = {Video Interaction Recognition using an Attention Augmented Relational Network and Skeleton Data},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2024},
  pages     = {3225-3234}
}