Novel Class Discovery for Ultra-Fine-Grained Visual Categorization

Yu Liu, Yaqi Cai, Qi Jia, Binglin Qiu, Weimin Wang, Nan Pu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 17679-17688


Ultra-fine-grained visual categorization (Ultra-FGVC) aims at distinguishing highly similar sub-categories within fine-grained objects, such as different soybean cultivars. Compared to traditional fine-grained visual categorization, Ultra-FGVC encounters more hurdles due to the small inter-class and large intra-class variation. Given these challenges, relying on human annotation for Ultra-FGVC is impractical. To this end, our work introduces a novel task termed Ultra-Fine-Grained Novel Class Discovery (UFG-NCD), which leverages partially annotated data to identify new categories of unlabeled images for Ultra-FGVC. To tackle this problem, we devise a Region-Aligned Proxy Learning (RAPL) framework, which comprises a Channel-wise Region Alignment (CRA) module and a Semi-Supervised Proxy Learning (SemiPL) strategy. The CRA module is designed to extract and utilize discriminative features from local regions, facilitating knowledge transfer from labeled to unlabeled classes. Furthermore, SemiPL strengthens representation learning and knowledge transfer with proxy-guided supervised learning and proxy-guided contrastive learning. Such techniques leverage class distribution information in the embedding space, improving the mining of subtle differences between labeled and unlabeled ultra-fine-grained classes. Extensive experiments demonstrate that RAPL significantly outperforms baselines across various datasets, indicating its effectiveness in handling the challenges of UFG-NCD. Code is available at
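To make the idea of proxy-guided supervised learning concrete, here is a minimal sketch of a generic proxy-based softmax loss: each labeled class is represented by one proxy vector, and an embedding is pulled toward its own class proxy via cross-entropy over cosine similarities. This is an illustrative assumption, not the formulation used in RAPL; the function names, the toy 2-D proxies, and the `temperature` value are all hypothetical.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # Cosine similarity between two vectors.
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def proxy_softmax_loss(embedding, proxies, label, temperature=0.1):
    # Cross-entropy over temperature-scaled cosine similarities to all proxies.
    logits = [cosine(embedding, p) / temperature for p in proxies]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

# Hypothetical toy setup: three class proxies in a 2-D embedding space.
proxies = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
embedding = [0.9, 0.1]  # lies close to proxy 0

loss_correct = proxy_softmax_loss(embedding, proxies, label=0)
loss_wrong = proxy_softmax_loss(embedding, proxies, label=1)
print(loss_correct < loss_wrong)
```

The loss is small when the embedding sits near its own class proxy and large otherwise, which is the sense in which proxies carry class distribution information in the embedding space.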

Related Material

@InProceedings{Liu_2024_CVPR,
    author    = {Liu, Yu and Cai, Yaqi and Jia, Qi and Qiu, Binglin and Wang, Weimin and Pu, Nan},
    title     = {Novel Class Discovery for Ultra-Fine-Grained Visual Categorization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17679-17688}
}