Self-Supervised Video Interaction Classification Using Image Representation of Skeleton Data

Farzaneh Askari, Ruixi Jiang, Zhiwei Li, Jiatong Niu, Yuyan Shi, James J. Clark; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 5229-5238

Abstract


Recognizing interactions in sports broadcast videos is an application of Interaction Recognition from Videos (IRV) that poses many challenges, because the interactions are complex and often recorded from a suboptimal viewpoint. Annotating large-scale, sports-specific datasets is expensive and time-consuming. In this study, we therefore demonstrate the effectiveness of Self-Supervised Learning (SSL) methods for building useful representations from human skeleton pose data (pose for short) without requiring costly annotations for a large-scale dataset. Given the numerous well-established image-based SSL methods, we show how to adapt them to sequences of pose through data transformation and a series of pose-based augmentations. We specifically adapt Relational Reasoning SSL (Relational-SSL for short) [27] and achieve 68.18 ± 0% and 76.62 ± 2.7% under the linear evaluation and finetuning protocols, respectively, on the downstream task of IRV from sports broadcast videos. Lastly, we run ablation studies on different components of the method, including the effect of using estimated pose (versus ground truth) on the performance of the downstream task.
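
The idea of turning a pose sequence into an image, and of augmenting the pose before that conversion, can be illustrated with a minimal sketch. The abstract does not specify the encoding or the augmentations, so everything below is an assumption made for illustration: the layout (joints as rows, frames as columns, normalized x and y as the first two color channels), the horizontal flip as an example augmentation, and the hypothetical function names `pose_sequence_to_image` and `flip_horizontal`.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float]        # (x, y) in pixel coordinates
Frame = Sequence[Joint]            # one pose: a list of joints
Pixel = Tuple[int, int, int]       # 8-bit (R, G, B)


def flip_horizontal(frames: Sequence[Frame], frame_width: float) -> List[List[Joint]]:
    """Hypothetical pose augmentation: mirror every joint about the vertical axis."""
    return [[(frame_width - x, y) for (x, y) in frame] for frame in frames]


def pose_sequence_to_image(
    frames: Sequence[Frame], frame_width: float, frame_height: float
) -> List[List[Pixel]]:
    """Encode T frames of J joints as a J x T image (assumed layout).

    Row j, column t holds joint j at frame t. The x coordinate is scaled
    into the red channel and y into the green channel; blue is unused.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_joints = len(frames[0])
    image: List[List[Pixel]] = []
    for j in range(num_joints):
        row: List[Pixel] = []
        for frame in frames:
            x, y = frame[j]
            # Clamp to [0, 1] so joints estimated outside the frame stay valid.
            r = round(255 * min(max(x / frame_width, 0.0), 1.0))
            g = round(255 * min(max(y / frame_height, 0.0), 1.0))
            row.append((r, g, 0))
        image.append(row)
    return image


# Two frames, two joints each, in a 200 x 100 pixel video.
frames = [[(0, 0), (40, 20)], [(200, 100), (0, 0)]]
image = pose_sequence_to_image(frames, 200, 100)
flipped = flip_horizontal(frames, 200)
```

Under this assumed layout, the resulting array can be passed to any image-based SSL pipeline, and augmentations such as the flip are applied to the pose coordinates rather than to the pixels.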

Related Material


[bibtex]
@InProceedings{Askari_2023_CVPR,
    author    = {Askari, Farzaneh and Jiang, Ruixi and Li, Zhiwei and Niu, Jiatong and Shi, Yuyan and Clark, James J.},
    title     = {Self-Supervised Video Interaction Classification Using Image Representation of Skeleton Data},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {5229-5238}
}