Towards Universal Soccer Video Understanding

Rao, Jiayuan; Wu, Haoning; Jiang, Hao; Zhang, Ya; Wang, Yanfeng; Xie, Weidi

Jiayuan Rao, Haoning Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 8384-8394

Abstract

As a globally celebrated sport, soccer has attracted widespread interest from fans around the world. This paper aims to develop a comprehensive multi-modal framework for soccer video understanding. Specifically, we make the following contributions: (i) we introduce **SoccerReplay-1988**, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches, together with an automated annotation pipeline; (ii) we present **MatchVision**, the first visual-language foundation model in the soccer domain, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks; (iii) we conduct extensive experiments and ablation studies on action classification, commentary generation, and multi-view foul recognition, and demonstrate state-of-the-art performance on all of them, substantially outperforming existing models and confirming the superiority of our proposed data and model. We believe this work will offer a standard paradigm for sports understanding research. The code and model will be made publicly available for reproduction.

Related Material

@InProceedings{Rao_2025_CVPR,
    author    = {Rao, Jiayuan and Wu, Haoning and Jiang, Hao and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
    title     = {Towards Universal Soccer Video Understanding},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {8384-8394}
}