It's Just Another Day: Unique Video Captioning by Discriminative Prompting

Toby Perrett, Tengda Han, Dima Damen, Andrew Zisserman; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 232-249

Abstract


Long videos contain many repeating actions, events and shots. These repetitions are frequently given identical captions, which makes it difficult to retrieve the exact desired clip using a text search. In this paper, we formulate the problem of unique captioning: given multiple clips with the same caption, we generate a new caption for each clip that uniquely identifies it. We propose Captioning by Discriminative Prompting (CDP), which predicts a property that can separate identically captioned clips, and use it to generate unique captions. We introduce two benchmarks for unique captioning, based on egocentric footage and timeloop movies, where repeating actions are common. We demonstrate that captions generated by CDP improve text-to-video R@1 by 15% for egocentric videos and 10% for timeloop movies.
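To make the reported metric concrete, the following is a minimal sketch of how text-to-video R@1 is commonly computed from a caption-to-clip similarity matrix. It is not taken from the paper: the function name `recall_at_1`, the toy score matrices, and the tie-breaking rule (first clip wins) are all assumptions made for illustration. It shows why identical captions hurt retrieval: when every caption scores every clip equally, only one clip in the group can be ranked first.

```python
def recall_at_1(similarity):
    """Fraction of text queries whose highest-scoring clip is the matching one.

    similarity[i][j] is the score between caption i and clip j. The
    ground-truth clip for caption i is assumed to be clip i.
    """
    if not similarity:
        raise ValueError("similarity matrix must be non-empty")
    hits = 0
    for i, row in enumerate(similarity):
        # max() returns the first maximal index, so ties go to the earliest clip.
        best = max(range(len(row)), key=lambda j: row[j])
        if best == i:
            hits += 1
    return hits / len(similarity)


# Toy case: three clips sharing one caption, so every row of scores is identical
# and only caption 0 retrieves its own clip.
identical = [[0.9, 0.9, 0.9],
             [0.9, 0.9, 0.9],
             [0.9, 0.9, 0.9]]

# Hypothetical scores after each clip receives a caption that singles it out.
unique = [[0.9, 0.2, 0.1],
          [0.3, 0.8, 0.2],
          [0.1, 0.4, 0.7]]

r1_identical = recall_at_1(identical)
r1_unique = recall_at_1(unique)
```

In this toy setting `r1_identical` is one third and `r1_unique` is 1.0; the numbers are illustrative only and say nothing about the benchmarks or the gains reported above.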

Related Material


[bibtex]
@InProceedings{Perrett_2024_ACCV,
    author    = {Perrett, Toby and Han, Tengda and Damen, Dima and Zisserman, Andrew},
    title     = {It's Just Another Day: Unique Video Captioning by Discriminative Prompting},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {232-249}
}