Background-Aware Moment Detection for Video Moment Retrieval

Minjoon Jung, Youwon Jang, Seongho Choi, Joochan Kim, Jin-Hwa Kim, Byoung-Tak Zhang; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 8575-8585

Abstract


Video moment retrieval (VMR) identifies a specific moment in an untrimmed video for a given natural language query. This task is prone to suffering from the weak alignment problem innate in video datasets. Due to ambiguity, a query does not fully cover the relevant details of the corresponding moment, or the moment may contain misaligned and irrelevant frames, potentially limiting further performance gains. To tackle this problem, we propose a background-aware moment detection transformer (BM-DETR). Our model adopts a contrastive approach, carefully utilizing the negative queries matched to other moments in the video. Specifically, our model learns to predict the target moment from the joint probability of each frame given the positive query and the complement of negative queries. This leads to effective use of the surrounding background, improving moment sensitivity and enhancing overall alignment in videos. Extensive experiments on four benchmarks demonstrate the effectiveness of our approach. Our code is available at https://github.com/minjoong507/BM-DETR.
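The abstract describes scoring each frame by the joint probability of matching the positive query and not matching any negative query. The following is a minimal sketch of that scoring rule, not the BM-DETR implementation. It assumes the following: frame-query probabilities come from a sigmoid over temperature-scaled cosine similarity, negative queries are treated as independent, and the span is read off with a simple threshold. The embeddings, `temperature`, `threshold`, and all helper names are hypothetical illustrations.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def background_aware_frame_scores(
    frames: Sequence[Vector],
    pos_query: Vector,
    neg_queries: Sequence[Vector],
    temperature: float = 0.1,  # hypothetical scaling, not from the paper
) -> List[float]:
    """Per-frame p(f | q+) * prod_j (1 - p(f | q_j-)), assuming independence."""
    scores = []
    for f in frames:
        p_pos = sigmoid(cosine(f, pos_query) / temperature)
        # Complement of each negative query: the frame should NOT match it.
        p_not_neg = 1.0
        for q in neg_queries:
            p_not_neg *= 1.0 - sigmoid(cosine(f, q) / temperature)
        scores.append(p_pos * p_not_neg)
    return scores


def predict_span(scores: Sequence[float], threshold: float) -> Optional[Tuple[int, int]]:
    """Hypothetical post-processing: longest contiguous run with score >= threshold."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    # A trailing -inf sentinel closes any run still open at the end.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if best is None or (i - 1 - start) > (best[1] - best[0]):
                best = (start, i - 1)
            start = None
    return best


if __name__ == "__main__":
    pos = [1.0, 0.0, 0.0]    # positive query embedding (toy)
    negs = [[0.0, 1.0, 0.0]]  # query matched to another moment (toy)
    frames = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.2],
              [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    scores = background_aware_frame_scores(frames, pos, negs)
    print([round(s, 4) for s in scores])
    print(predict_span(scores, threshold=0.4))
```

In this toy setup, frames aligned with the negative query are pushed close to zero, while an unrelated frame keeps an intermediate score (0.25). This is the intuition behind using the background: other queries' moments actively suppress frames, rather than the model relying on the positive query alone.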

Related Material

BibTeX
@InProceedings{Jung_2025_WACV,
    author    = {Jung, Minjoon and Jang, Youwon and Choi, Seongho and Kim, Joochan and Kim, Jin-Hwa and Zhang, Byoung-Tak},
    title     = {Background-Aware Moment Detection for Video Moment Retrieval},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {8575-8585}
}