Composed Video Retrieval via Enriched Context and Discriminative Embeddings

Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 26896-26906

Abstract


Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases. Existing works predominantly rely on visual queries combined with modification text to distinguish relevant videos. However, such a strategy struggles to fully preserve the rich query-specific context in retrieved target videos and only represents the target video using visual embedding. We introduce a novel CoVR framework that leverages detailed language descriptions to explicitly encode query-specific contextual information and learns discriminative embeddings of vision only, text only, and vision-text for better alignment to accurately retrieve matched target videos. Our proposed framework can be flexibly employed for both composed video (CoVR) and image (CoIR) retrieval tasks. Experiments on three datasets show that our approach obtains state-of-the-art performance for both CoVR and zero-shot CoIR tasks, achieving gains as high as around 7% in terms of recall@K=1 score. Our code and detailed language descriptions for the WebViD-CoVR dataset are available at https://github.com/OmkarThawakar/composed-video-retrieval.
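
The recall@K metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `recall_at_k`, its arguments, and the use of cosine similarity for ranking are assumptions made for illustration. It counts a query as a hit when its ground-truth target appears among the K most similar candidates.

```python
import numpy as np


def recall_at_k(query_emb, target_emb, gt_index, k=1):
    """Fraction of queries whose ground-truth target ranks in the top k.

    Hypothetical helper for illustration only:
    query_emb  -- (num_queries, dim) composed-query embeddings
    target_emb -- (num_targets, dim) candidate target embeddings
    gt_index   -- for each query, the row index of its correct target
    """
    query_emb = np.asarray(query_emb, dtype=float)
    target_emb = np.asarray(target_emb, dtype=float)

    # L2-normalize so the dot product equals cosine similarity (an assumption).
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)

    sims = q @ t.T  # shape: (num_queries, num_targets)

    # Indices of the k most similar targets for every query.
    topk = np.argsort(-sims, axis=1)[:, :k]

    # A query is a hit if its ground-truth index is among its top-k targets.
    hits = (topk == np.asarray(gt_index)[:, None]).any(axis=1)
    return float(hits.mean())
```

With K=1, a query counts only if the single best-ranked candidate is the correct target, which makes it the strictest setting of the metric.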

Related Material


[bibtex]
@InProceedings{Thawakar_2024_CVPR,
    author    = {Thawakar, Omkar and Naseer, Muzammal and Anwer, Rao Muhammad and Khan, Salman and Felsberg, Michael and Shah, Mubarak and Khan, Fahad Shahbaz},
    title     = {Composed Video Retrieval via Enriched Context and Discriminative Embeddings},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {26896-26906}
}