Binary Verification for Zero-Shot Vision

Rongbin Hu, Jeffrey Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026, pp. 11498-11506

Abstract


We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs. It comprises two steps: (i) quantization, which turns the open-ended query into a multiple-choice question (MCQ) with a small, explicit list of unambiguous candidates; and (ii) binarization, which asks one True/False question per candidate and resolves deterministically: if exactly one is True, select it; otherwise, revert to an MCQ over the remaining plausible candidates. We evaluate the workflow on referring expression grounding (REC), spatial reasoning (Spatial-Map, Spatial-Grid, Spatial-Maze), and BLINK-Jigsaw. Relative to answering open-ended queries directly, quantization to MCQ yields large gains, and True/False binarization provides a consistent additional boost. Across all tasks, the same workflow produces significant improvements, indicating generality. We further integrate the proposed REC workflow into a real-world video processing and editing system, and present the system architecture and end-to-end pipeline in the paper. We formalize how open-ended vision queries can be quantized to MCQs and further binarized into True/False verifications, yielding a hardness ladder and a simple explanation for why Boolean resolution improves accuracy. Together, these components define a unified inference-time workflow that offers a practical, drop-in path to stronger zero-shot vision with today's VLMs.
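
The resolution rule described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function name `binary_verify` and the callables `ask_true_false` and `ask_mcq` are hypothetical stand-ins for calls to a VLM. The abstract does not define "remaining plausible candidates"; the sketch assumes these are the candidates judged True, or the full candidate list when none is judged True.

```python
from typing import Callable, List, Sequence


def binary_verify(
    candidates: Sequence[str],
    ask_true_false: Callable[[str], bool],
    ask_mcq: Callable[[Sequence[str]], str],
) -> str:
    """Resolve a quantized query by per-candidate True/False verification.

    ask_true_false and ask_mcq are hypothetical wrappers around a VLM;
    they are assumptions of this sketch, not part of the paper.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    # Binarization: one True/False question per candidate.
    accepted: List[str] = [c for c in candidates if ask_true_false(c)]

    # Deterministic resolution: exactly one True means select it.
    if len(accepted) == 1:
        return accepted[0]

    # Otherwise revert to an MCQ. Assumption: the "remaining plausible"
    # set is the accepted candidates, or all of them if none was accepted.
    remaining = accepted if accepted else list(candidates)
    return ask_mcq(remaining)


if __name__ == "__main__":
    # Stub "model": accepts only the candidate "left of the chair".
    options = ["left of the chair", "right of the chair", "behind the chair"]
    answer = binary_verify(
        options,
        ask_true_false=lambda c: c == "left of the chair",
        ask_mcq=lambda cs: cs[0],
    )
    print(answer)
```

Passing the two query functions as parameters keeps the resolution logic independent of any particular model or prompt format, so the same routine could in principle serve each of the evaluated tasks once the candidates have been quantized.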

Related Material


@InProceedings{Hu_2026_CVPR,
    author    = {Hu, Rongbin and Liu, Jeffrey},
    title     = {Binary Verification for Zero-Shot Vision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2026},
    pages     = {11498-11506}
}