UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection

Hyolim Kang, Jinwoo Kim, Taehyun Kim, Seon Joo Kim; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 20073-20082

Abstract


Generic Event Boundary Detection (GEBD) is a recently proposed video understanding task that aims to find semantic boundaries of events at one level deeper granularity. By bridging the gap between natural human perception and video understanding, it has various potential applications, including interpretable and semantically valid video parsing. Because the task is still at an early stage of development, existing GEBD solvers are simple extensions of methods for related video understanding tasks and disregard GEBD's distinctive characteristics. In this paper, we propose a novel framework for unsupervised and supervised GEBD that uses the Temporal Self-similarity Matrix (TSM) as the video representation. The new Recursive TSM Parsing (RTP) algorithm exploits local diagonal patterns in the TSM to detect boundaries, and it is combined with the Boundary Contrastive (BoCo) loss to train our encoder to generate more informative TSMs. Our framework can be applied to both unsupervised and supervised settings, and in both it achieves state-of-the-art performance by a large margin on the GEBD benchmark. Notably, our unsupervised method outperforms the previous state-of-the-art supervised model, which indicates its efficacy.
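To make the idea of a TSM and its "local diagonal patterns" concrete, the following is a minimal illustrative sketch, not the paper's RTP algorithm or BoCo loss. It assumes per-frame features are already available as a NumPy array and uses cosine similarity. The function names `temporal_self_similarity` and `boundary_scores`, the window size `k`, and the scoring rule are all hypothetical choices made for this example.

```python
import numpy as np


def temporal_self_similarity(features):
    """Return the (T, T) cosine-similarity matrix of a (T, D) feature array."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-8)  # guard against zero vectors
    return unit @ unit.T


def boundary_scores(tsm, k=2):
    """Hypothetical local diagonal score (an assumption, not the paper's RTP).

    For each candidate index t, compare the k frames before t with the
    k frames from t onward. The score is high when each side is
    self-similar but the two sides are dissimilar to each other.
    """
    num_frames = tsm.shape[0]
    scores = np.zeros(num_frames)
    for t in range(k, num_frames - k + 1):
        past = tsm[t - k:t, t - k:t].mean()    # block before t on the diagonal
        future = tsm[t:t + k, t:t + k].mean()  # block from t on the diagonal
        cross = tsm[t - k:t, t:t + k].mean()   # off-diagonal block across t
        scores[t] = 0.5 * (past + future) - cross
    return scores


# Toy "video": four frames of one event followed by four frames of another.
event_a = np.array([1.0, 0.0, 0.0])
event_b = np.array([0.0, 1.0, 0.0])
features = np.vstack([np.tile(event_a, (4, 1)), np.tile(event_b, (4, 1))])

tsm = temporal_self_similarity(features)
scores = boundary_scores(tsm, k=2)
boundary = int(np.argmax(scores))  # index of the first frame of the second event
```

In this toy case the TSM is block-diagonal, so the score peaks at the frame where the second event starts. Real video features are noisier, which is the motivation the abstract gives for training the encoder to produce more informative TSMs.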

Related Material


@InProceedings{Kang_2022_CVPR,
    author    = {Kang, Hyolim and Kim, Jinwoo and Kim, Taehyun and Kim, Seon Joo},
    title     = {UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {20073-20082}
}