Enhancing Visual Classification using Comparative Descriptors
Hankyeol Lee, Gawon Seo, Wonseok Choi, Geunyoung Jung, Kyungwoo Song, Jiyoung Jung; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 5274-5283
Abstract
The performance of vision-language models (VLMs) such as CLIP in visual classification tasks has been enhanced by leveraging semantic knowledge from large language models (LLMs), including GPT. Recent studies have shown that in zero-shot classification tasks, descriptors incorporating additional cues, high-level concepts, or even random characters often outperform those using only category names. In many classification tasks, while the top-1 accuracy may be relatively low, the top-5 accuracy is often significantly higher. This gap implies that most misclassifications occur among a few similar classes, highlighting the model's difficulty in distinguishing between classes with subtle differences. To address this challenge, we introduce a novel concept of comparative descriptors. These descriptors emphasize the unique features of a target class against its most similar classes, enhancing differentiation. By generating and integrating these comparative descriptors into the classification framework, we refine the semantic focus and improve classification accuracy. An additional filtering process ensures that these descriptors are closer to the image embeddings in the CLIP space, further enhancing performance. Our approach demonstrates improved accuracy and robustness in visual classification tasks by addressing the specific challenge of subtle inter-class differences. Code is available at https://github.com/hk1ee/Comparative-CLIP.
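As a rough illustration of the approach the abstract describes, the sketch below scores an image against per-class descriptor sets with CLIP and keeps, per class, only the descriptors whose embeddings lie closest to the image embedding before averaging. This is not the authors' implementation (see the linked repository for that); the model checkpoint, class names, descriptor strings, and the top-k filtering rule are all illustrative assumptions.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical comparative descriptors: each one contrasts the target class
# with its most confusable neighbor (husky vs. wolf), rather than describing
# the class in isolation.
DESCRIPTORS = {
    "husky": [
        "a husky, which often has blue or bi-colored eyes, unlike a wolf",
        "a husky, which has a tail curled over its back, unlike a wolf",
        "a husky, which is smaller and more compact than a wolf",
    ],
    "wolf": [
        "a wolf, which has a longer muzzle and larger head than a husky",
        "a wolf, which carries its bushy tail straight, unlike a husky",
        "a wolf, which has uniformly amber or yellow eyes, unlike a husky",
    ],
}

@torch.no_grad()
def classify(image: Image.Image, keep_k: int = 2) -> str:
    # Embed the image once and L2-normalize so dot products are cosine similarities.
    img = processor(images=image, return_tensors="pt")
    img_emb = model.get_image_features(**img)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

    scores = {}
    for cls, texts in DESCRIPTORS.items():
        txt = processor(text=texts, return_tensors="pt", padding=True)
        txt_emb = model.get_text_features(**txt)
        txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
        sims = (img_emb @ txt_emb.T).squeeze(0)  # one similarity per descriptor
        # Filtering step (illustrative top-k variant): keep only the descriptors
        # closest to the image embedding in CLIP space, then score the class by
        # their mean similarity.
        scores[cls] = sims.topk(min(keep_k, len(texts))).values.mean().item()
    return max(scores, key=scores.get)

print(classify(Image.open("dog.jpg")))  # hypothetical input image

In the paper itself, the comparative descriptors are generated by an LLM for each class against its most similar classes; the hard-coded dictionary above simply stands in for that generation step.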
Related Material

[pdf] [supp] [arXiv] [bibtex]
@InProceedings{Lee_2025_WACV,
    author    = {Lee, Hankyeol and Seo, Gawon and Choi, Wonseok and Jung, Geunyoung and Song, Kyungwoo and Jung, Jiyoung},
    title     = {Enhancing Visual Classification using Comparative Descriptors},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {5274-5283}
}