Vote-in-Context: VLMs as Explainable Zero-Shot Rank Fusers

Mohamed Eltahir, Ali Habibullah, Lama Ayash, Tanveer Hussain, Naeemullah Khan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026, pp. 6496-6505

Abstract


In the retrieval domain, fusing candidates from heterogeneous retrievers (R) is challenging, especially for multi-modal data such as videos. Typical training-free fusion methods lack content-awareness, relying only on rank or score signals. We introduce Vote-in-Context (ViC), a generalized, training-free framework that rethinks list-wise reranking and fusion as a zero-shot reasoning task for a Vision-Language Model (VLM). ViC serializes content evidence and retriever metadata into the VLM's prompt, allowing it to adaptively weigh retriever consensus against visual-linguistic content. This generalized framework naturally operates as a content-aware rank fuser (R>1) and as a single-list reranker (R=1). We demonstrate ViC's potential in video retrieval, where we serialize video content into the VLM via our efficient S-Grid representation. Across video retrieval benchmarks, ViC sets a new zero-shot SOTA, outperforming strong fusion baselines (R>1) and boosting individual retrievers (R=1), which demonstrates its effectiveness in handling complex visual and temporal signals alongside text. ViC achieves gains of up to +40 Recall@1 over the previous SOTA across all benchmarks, establishing it as a simple, reproducible, and highly effective recipe for turning modern VLMs into powerful zero-shot rerankers and fusers capable of yielding grounded natural-language rationales for their top-K decisions. Code will be made public.
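To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) Reciprocal Rank Fusion as one well-known example of a training-free fuser that uses rank signals only, and (b) a hypothetical way to serialize candidates together with per-retriever rank metadata into a text prompt. The function names, the prompt layout, and the use of short text captions as content evidence are all assumptions made for illustration; the paper passes video content to the VLM through its S-Grid representation, which is not reproduced here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Rank-only baseline: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def serialize_candidates(query, ranked_lists, captions):
    """Hypothetical prompt layout (an assumption, not the paper's format):
    one line per candidate with its rank under each retriever and a caption."""
    ranks = defaultdict(dict)
    for r_idx, ranking in enumerate(ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):
            ranks[doc_id][r_idx] = rank
    lines = [f"Query: {query}", "Candidates:"]
    for doc_id in sorted(ranks):
        votes = ", ".join(
            f"R{r_idx + 1}=#{rank}" for r_idx, rank in sorted(ranks[doc_id].items())
        )
        caption = captions.get(doc_id, "(no description)")
        lines.append(f"- {doc_id} [{votes}]: {caption}")
    lines.append("Rank the candidates from most to least relevant and explain briefly.")
    return "\n".join(lines)


# Two retrievers (R=2) returning overlapping candidate lists.
lists = [["a", "b", "c"], ["b", "c", "d"]]
captions = {"a": "a dog runs on a beach", "b": "a man cooks pasta"}

fused = reciprocal_rank_fusion(lists)
prompt = serialize_candidates("someone cooking", lists, captions)
print(fused)
print(prompt)
```

The rank-only fuser never inspects what the candidates contain, whereas the serialized prompt exposes both the retriever votes and the content to a downstream model. With a single list (R=1), the same serialization reduces to a plain list-wise reranking prompt.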

Related Material


[bibtex]
@InProceedings{Eltahir_2026_CVPR,
    author    = {Eltahir, Mohamed and Habibullah, Ali and Ayash, Lama and Hussain, Tanveer and Khan, Naeemullah},
    title     = {Vote-in-Context: VLMs as Explainable Zero-Shot Rank Fusers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {6496-6505}
}