YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset

Donglai Wei, Siddhant Kharbanda, Sarthak Arora, Roshan Roy, Nishant Jain, Akash Palrecha, Tanav Shah, Shray Mathur, Ritik Mathur, Abhijay Kemkar, Anirudh Chakravarthy, Zudi Lin, Won-Dong Jang, Yansong Tang, Song Bai, James Tompkin, Philip H.S. Torr, Hanspeter Pfister; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 21044-21053

Abstract


Many video understanding tasks require analyzing multi-shot videos, but existing datasets for video object segmentation (VOS) only consider single-shot videos. To address this challenge, we collected a new dataset, YouMVOS, of 200 popular YouTube videos spanning ten genres, where each video is on average five minutes long and contains 75 shots. We selected recurring actors and annotated 431K segmentation masks at six frames per second, exceeding previous datasets in average video duration, object variation, and narrative structure complexity. We incorporated good practices of model architecture design, memory management, and multi-shot tracking into an existing video segmentation method to build competitive baseline methods. Through error analysis, we found that these baselines still fail to cope with cross-shot appearance variation on our YouMVOS dataset. Thus, our dataset poses new challenges in multi-shot segmentation towards better video analysis. Data, code, and pre-trained models are available at https://donglaiw.github.io/proj/youMVOS

Related Material


@InProceedings{Wei_2022_CVPR,
    author    = {Wei, Donglai and Kharbanda, Siddhant and Arora, Sarthak and Roy, Roshan and Jain, Nishant and Palrecha, Akash and Shah, Tanav and Mathur, Shray and Mathur, Ritik and Kemkar, Abhijay and Chakravarthy, Anirudh and Lin, Zudi and Jang, Won-Dong and Tang, Yansong and Bai, Song and Tompkin, James and Torr, Philip H.S. and Pfister, Hanspeter},
    title     = {YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {21044-21053}
}