An Encoder-Agnostic Weakly Supervised Method for Describing Textures

Shangbo Mao, Deepu Rajan; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 8101-8110

Abstract


Recent advances in Large Language Models (LLMs) have enabled the semantic description of textures in natural language, aiming to capture them in richer detail. However, most methods either depend on supervised training with pairs of images and manually annotated visual attributes, which most texture datasets lack, or rely on Vision-Language Models (VLMs) such as CLIP. In this paper, we develop an encoder-agnostic Weakly supervised Texture Description Generator (WTDG) that employs a novel Scaled Ranked Kullback-Leibler divergence (SR-KL) loss between the image and text modalities. Within the SR-KL loss formulation, we leverage category information, which is always available as ground truth in all benchmark texture recognition datasets. We further extend the proposed WTDG to assist texture recognition by using its generated texture descriptions. We thus develop a multimodal framework, called Tex^2, that is adept at simultaneous texture description generation and recognition. Our approach exhibits promising performance in describing and recognizing textures on benchmark datasets.
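The abstract states that the SR-KL loss aligns the image and text modalities using category information as weak supervision, but it does not give the loss formulation. As a minimal sketch of the underlying idea, the following shows a plain KL-divergence alignment between the two modalities' distributions over texture categories; the function names and the temperature parameter are illustrative assumptions, and the paper's actual SR-KL loss adds scaling and ranking components not specified in the abstract.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution over categories."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same categories."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_alignment_loss(image_logits, text_logits, temperature=1.0):
    """Hypothetical illustration: align the text modality's category
    distribution with the image modality's by minimizing their KL
    divergence. This is NOT the paper's SR-KL loss, which additionally
    applies scaling and ranking to the divergence."""
    p_text = softmax(text_logits, temperature)
    p_image = softmax(image_logits, temperature)
    return kl_divergence(p_text, p_image)
```

When the two modalities produce identical category distributions the loss is zero, and it grows as they diverge, which is the basic signal a weakly supervised alignment objective exploits.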

Related Material


[bibtex]
@InProceedings{Mao_2025_WACV,
  author    = {Mao, Shangbo and Rajan, Deepu},
  title     = {An Encoder-Agnostic Weakly Supervised Method for Describing Textures},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {8101-8110}
}