BaSSL: Boundary-aware Self-Supervised Learning for Video Scene Segmentation

Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 4027-4043

Abstract


Self-supervised learning has drawn attention for its effectiveness in learning in-domain representations with no ground-truth annotations; in particular, it has been shown that properly designed pretext tasks bring significant performance gains for downstream tasks. Inspired by this, we tackle video scene segmentation, the task of temporally localizing scene boundaries in a long video, with a self-supervised learning framework where we mainly focus on designing effective pretext tasks. In our framework, given a long video, we adopt a sliding window scheme; from the sequence of shots in each window, we discover the moment with the maximum semantic transition and leverage it as a pseudo-boundary to facilitate pre-training. Specifically, we introduce three novel boundary-aware pretext tasks: 1) Shot-Scene Matching (SSM), 2) Contextual Group Matching (CGM), and 3) Pseudo-boundary Prediction (PP); SSM and CGM guide the model to maximize intra-scene similarity and inter-scene discrimination by capturing contextual relations between shots, while PP encourages the model to identify transitional moments. We perform an extensive analysis to validate the effectiveness of our method and achieve new state-of-the-art performance on the MovieNet-SSeg benchmark. The code is available at https://github.com/kakaobrain/bassl
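
As a rough illustration of the pseudo-boundary discovery described above, the sketch below splits a sliding window of shot embeddings at the point where the two resulting groups are most dissimilar (cosine distance between group means). The function name, the scoring rule, and the use of NumPy are illustrative assumptions, not the authors' exact procedure; the released code at the URL above implements the actual method.

    # Hypothetical sketch: pick the split point inside a window of shot embeddings
    # that maximizes the semantic transition between the two resulting groups.
    # Names and scoring rule are assumptions for illustration only.
    import numpy as np

    def find_pseudo_boundary(shot_embeddings: np.ndarray) -> int:
        """Return the shot index after which the window splits into two groups
        whose mean embeddings are maximally dissimilar (cosine distance)."""
        n = len(shot_embeddings)
        best_idx, best_score = 0, -np.inf
        for i in range(1, n):                       # candidate split points
            left = shot_embeddings[:i].mean(axis=0)
            right = shot_embeddings[i:].mean(axis=0)
            cos = np.dot(left, right) / (np.linalg.norm(left) * np.linalg.norm(right) + 1e-8)
            score = 1.0 - cos                       # larger = stronger semantic transition
            if score > best_score:
                best_idx, best_score = i - 1, score
        return best_idx                             # pseudo-boundary shot index

    # Example: a window of 8 shots with 128-d embeddings
    window = np.random.randn(8, 128).astype(np.float32)
    boundary_shot = find_pseudo_boundary(window)
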

Related Material


[bibtex]
@InProceedings{Mun_2022_ACCV,
    author    = {Mun, Jonghwan and Shin, Minchul and Han, Gunsoo and Lee, Sangho and Ha, Seongsu and Lee, Joonseok and Kim, Eun-Sol},
    title     = {BaSSL: Boundary-aware Self-Supervised Learning for Video Scene Segmentation},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {4027-4043}
}