@InProceedings{Fayyazi_2026_CVPR,
    author    = {Fayyazi, Arya and Akrami, Haleh},
    title     = {Proof-of-Perception: Certified Tool-Using Multimodal Reasoning with Compositional Conformal Guarantees},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {5144-5153}
}
Proof-of-Perception: Certified Tool-Using Multimodal Reasoning with Compositional Conformal Guarantees
Abstract
We present Proof-of-Perception (PoP), a tool-using framework that casts multimodal reasoning as an executable graph with explicit reliability guarantees. Each perception or logic node outputs a conformal set $\Gamma^{(t)}_{\delta}(x)$, yielding calibrated, stepwise uncertainty; a lightweight controller uses these certificates to allocate compute under a budget, expanding with extra tool calls only when needed and stopping early otherwise. This grounds answers in verifiable evidence, reduces error compounding and hallucinations, and enables principled accuracy-compute trade-offs. Across document, chart, and multi-image QA benchmarks, PoP improves performance and reliability over strong chain-of-thought, ReAct-style, and program-of-thought baselines while using computation more efficiently. Code is available at https://github.com/AryaFayyazi/PoP.
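To illustrate the kind of certificate the abstract refers to, the following is a minimal sketch of standard split conformal prediction for a single node, paired with a toy budgeted stopping rule. It is not taken from the PoP code base: the function names (`conformal_threshold`, `conformal_set`, `run_node`), the nonconformity score `1 - p(y)`, the `max_set_size` stopping criterion, and the use of one shared threshold for all tools are assumptions made for illustration. The sketch also does not reproduce the compositional guarantee across graph nodes.

```python
import math


def conformal_threshold(cal_scores, delta):
    """Split-conformal threshold: the k-th smallest calibration score,
    with k = ceil((n + 1) * (1 - delta))."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - delta))
    if k > n:
        # Too few calibration points for this delta: the set must
        # contain every label to keep the coverage guarantee.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def conformal_set(probs, qhat):
    """Labels whose nonconformity score 1 - p(y) is at most qhat."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def run_node(tools, x, qhat, budget, max_set_size=1):
    """Hypothetical controller: call tools in order and stop as soon as
    the conformal set is small enough or the call budget is spent.
    Sharing one qhat across tools is a simplification; in practice each
    tool would need its own calibration."""
    pred_set, calls = None, 0
    for tool in tools:
        if calls >= budget:
            break
        pred_set = conformal_set(tool(x), qhat)
        calls += 1
        if len(pred_set) <= max_set_size:
            break  # certificate is tight enough: stop early
    return pred_set, calls
```

In this sketch a small conformal set plays the role of a confident certificate, so the controller spends an extra tool call only when the set is still ambiguous and budget remains.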

