Modeling the Visual Ambiguity of Human Sketches

Zhou, Yang; Ni, Ping; Wang, Jin; Jia, Senyun; Yan, Jingdan; Huang, Kaixiang; Lu, Guodong; Yang, Jingru; He, Shengfeng

Yang Zhou, Ping Ni, Jin Wang, Senyun Jia, Jingdan Yan, Kaixiang Huang, Guodong Lu, Jingru Yang, Shengfeng He; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 16876-16886

Abstract

Human sketches are a compact and expressive form of visual communication. Their sparse structural cues capture essential object structure, but they also introduce ambiguity: a single sketch can correspond to multiple plausible images, which makes cross-domain alignment uncertain and unstable. Such ambiguity fundamentally limits sketch-based vision tasks that rely on precise sketch-image correspondence. To address this challenge, we introduce AmbiScore, a metric that quantifies the ambiguity of sketch-image pairs, and use Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) as a testbed to reveal how ambiguous supervision leads to performance collapse in existing methods. We further propose DisAmb (Disentangling Ambiguity), a framework that explicitly models and mitigates ambiguity through two components: (1) Elastic Matching, which adaptively adjusts supervision strength using AmbiScore, and (2) Purified Matching, which employs ambiguity-agnostic masks to disentangle structure and appearance via shape jigsaw and texture swapping. DisAmb establishes new benchmarks under high ambiguity and provides a robust, transferable supervisory signal for downstream sketch-guided tasks.
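The abstract does not define AmbiScore or the Elastic Matching loss, so the following is only an illustrative sketch of the general idea of scaling supervision strength by a per-pair ambiguity score. It assumes a hypothetical ambiguity value in [0, 1] and uses a triplet loss whose margin shrinks as ambiguity grows. The function names, the linear scaling rule, and the `base_margin` value are assumptions made for illustration, not the paper's formulation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def elastic_triplet_loss(sketch, positive, negative, ambiguity, base_margin=0.3):
    """Triplet loss with a margin that is relaxed for ambiguous pairs.

    `ambiguity` is a hypothetical score in [0, 1] standing in for AmbiScore:
    0 means the sketch-image pair is unambiguous (full margin), and
    1 means it is maximally ambiguous (margin reduced to zero).
    The linear scaling below is an assumption, not the paper's rule.
    """
    if not 0.0 <= ambiguity <= 1.0:
        raise ValueError("ambiguity must lie in [0, 1]")
    margin = base_margin * (1.0 - ambiguity)
    d_pos = cosine_distance(sketch, positive)
    d_neg = cosine_distance(sketch, negative)
    # Standard hinge form: penalize when the positive is not closer
    # than the negative by at least the (ambiguity-scaled) margin.
    return max(0.0, d_pos - d_neg + margin)
```

Under these assumptions, a confidently matched pair is pushed apart from negatives by the full margin, while a highly ambiguous pair contributes a weaker constraint and is therefore less likely to dominate training with unreliable supervision.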

Related Material

@InProceedings{Zhou_2026_CVPR,
    author    = {Zhou, Yang and Ni, Ping and Wang, Jin and Jia, Senyun and Yan, Jingdan and Huang, Kaixiang and Lu, Guodong and Yang, Jingru and He, Shengfeng},
    title     = {Modeling the Visual Ambiguity of Human Sketches},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {16876-16886}
}