Free-Grained Hierarchical Visual Recognition

Seulki Park, Zilin Wang, Stella X. Yu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 32767-32776

Abstract


Hierarchical image recognition aims to predict labels across a semantic taxonomy, typically assuming fine-grained annotations for every image. However, real-world supervision may appear at any level: a distant bird may only be labeled as "Bird", while a clear image allows "Bald eagle". To reflect this reality, we introduce free-grained hierarchical recognition, where training labels can appear at any level of a taxonomy, requiring consistent predictions under partial and mixed supervision. We construct benchmark datasets with varying label granularity and show that existing hierarchical methods degrade significantly in this setting. To address this, we propose simple yet effective approaches that leverage 1) semantic guidance from vision-language models and 2) visual structure through semi-supervised learning. Finally, we study free-grained inference, where the model adaptively selects prediction depth, enabling reliable coarse predictions when fine-grained ones are uncertain. Together, our task, datasets, and methods provide a practical step toward hierarchical recognition in real-world scenarios.
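As a rough illustration of the two ideas in the abstract (labels given at any taxonomy level, and inference that stops at a reliable depth), the following minimal sketch may help. It is not the authors' method: the toy taxonomy `PARENT`, the helper names `path_to_root` and `free_grained_predict`, and the fixed confidence threshold are all assumptions introduced here for illustration only.

```python
# Hypothetical toy taxonomy (assumption): child -> parent, None marks the root level.
PARENT = {
    "Bird": None,
    "Eagle": "Bird",
    "Bald eagle": "Eagle",
}


def path_to_root(label):
    """Expand a label given at any level into its coarse-to-fine path.

    A coarse label such as "Eagle" supervises only the levels on its path;
    deeper levels stay unlabeled for that image.
    """
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path[::-1]


def free_grained_predict(level_predictions, threshold=0.7):
    """Keep predictions from coarse to fine while they remain confident.

    `level_predictions` is a list of (label, confidence) pairs ordered from
    coarse to fine, for example one top prediction per taxonomy level.
    The threshold rule is an illustrative assumption, not the paper's criterion.
    """
    kept = []
    for label, confidence in level_predictions:
        if confidence < threshold:
            break  # too uncertain: stop at the last reliable level
        if kept and PARENT.get(label) != kept[-1]:
            break  # inconsistent with the taxonomy: stop here as well
        kept.append(label)
    return kept


if __name__ == "__main__":
    # A clear image: every level is confident, so the full path is returned.
    print(free_grained_predict([("Bird", 0.99), ("Eagle", 0.95), ("Bald eagle", 0.90)]))
    # A distant bird: the finest level is uncertain, so prediction stops early.
    print(free_grained_predict([("Bird", 0.98), ("Eagle", 0.85), ("Bald eagle", 0.40)]))
    # A training label given only at the middle level.
    print(path_to_root("Eagle"))
```

In this sketch the distant-bird case yields a coarser but consistent prediction instead of a forced fine-grained guess, which is the behavior the abstract describes for free-grained inference.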

Related Material


@InProceedings{Park_2026_CVPR,
    author    = {Park, Seulki and Wang, Zilin and Yu, Stella X.},
    title     = {Free-Grained Hierarchical Visual Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {32767-32776}
}