Moment Detection in Long Tutorial Videos

Ioana Croitoru, Simion-Vlad Bogolin, Samuel Albanie, Yang Liu, Zhaowen Wang, Seunghyun Yoon, Franck Dernoncourt, Hailin Jin, Trung Bui; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 2594-2604

Abstract

Tutorial videos play an increasingly important role in professional development and self-directed education. For users to realise the full benefits of this medium, tutorial videos must be efficiently searchable. In this work, we focus on the task of moment detection, in which the goal is to localise the temporal window where a given event occurs within a given tutorial video. Prior work on moment detection has focused primarily on short videos (typically shorter than three minutes). However, many tutorial videos are substantially longer (often lasting hours), which presents significant challenges for existing moment detection approaches. To study this problem, we introduce the first dataset of untrimmed, long-form tutorial videos for moment detection: the Behance Moment Detection (BMD) dataset. BMD videos have an average duration of over one hour and are characterised by slowly evolving visual content and wide-ranging dialogue. To address the challenges of this dataset, we propose a new framework, LongMoment-DETR, and demonstrate that it outperforms strong baselines. Additionally, we introduce a variant of the dataset that contains YouTube Chapter annotations and show that the features produced by our framework can be used to improve performance on chapter detection. Code and data can be found at https://github.com/ioanacroi/longmoment-detr.
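Moment detection, as framed above, asks a model to output a (start, end) window for a query. As an illustration only: a common way to score such windows in the moment-retrieval literature is temporal intersection-over-union (IoU) between the predicted and ground-truth intervals, summarised as Recall@1 at an IoU threshold. The abstract does not state which metrics are used on BMD; the helper names `temporal_iou` and `recall_at_1` and the 0.5 threshold below are hypothetical choices for this sketch.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) window inside a video (assumed representation).
Moment = Tuple[float, float]


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection-over-union of two time intervals, in seconds."""
    p_start, p_end = pred
    g_start, g_end = gt
    # Overlap is zero when the windows are disjoint.
    inter = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Moment], gts: List[Moment], threshold: float = 0.5) -> float:
    """Hypothetical helper: fraction of queries whose top-1 prediction
    reaches IoU >= threshold with the ground-truth window.
    Assumes preds[i] and gts[i] belong to the same query."""
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    gt = (1200.0, 1260.0)      # a one-minute moment inside an hour-long video
    good = (1210.0, 1270.0)    # overlaps 50 s of the 70 s union
    miss = (1300.0, 1360.0)    # no overlap
    print(temporal_iou(good, gt))
    print(recall_at_1([good, miss], [gt, gt]))
```

In long videos such as those in BMD, a one-minute target window is a tiny fraction of the timeline, so a prediction that is off by even a minute can drop to zero IoU; this is one way to see why short-video methods may struggle at this scale.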

Related Material

BibTeX

@InProceedings{Croitoru_2023_ICCV,
    author    = {Croitoru, Ioana and Bogolin, Simion-Vlad and Albanie, Samuel and Liu, Yang and Wang, Zhaowen and Yoon, Seunghyun and Dernoncourt, Franck and Jin, Hailin and Bui, Trung},
    title     = {Moment Detection in Long Tutorial Videos},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {2594-2604}
}