On the Evaluation of Multimodal Large Language Models for Agricultural Image Classification across Diverse Tasks

Anindya Bijoy Das, Shibbir Ahmed, Shahnewaz Karim Sakib; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2026, pp. 482-490

Abstract


Recent advances in multimodal large language models (MLLMs) have enabled impressive capabilities in processing and understanding visual data. Originally built for general-purpose reasoning, these models are increasingly applied to visual tasks in domains such as medical imaging and agriculture. In this study, we evaluate their effectiveness on three tasks: classifying plant species from leaf images, detecting disease symptoms to support automated crop monitoring and early diagnosis, and performing taxonomy-free recognition across varied plant imagery such as fruits, flowers, and trees. We evaluate state-of-the-art vision-language models on multiple datasets containing images of healthy and diseased leaves as well as broader crop imagery, assessing classification accuracy, disease detection performance, and robustness across diverse plant morphologies. Our results show that few-shot prompting consistently improved classification accuracy over zero-shot prompting, although the open-source models trailed a proprietary black-box model in overall accuracy. These findings highlight both the promise of multimodal models for agricultural image understanding and the need for domain adaptation.
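The abstract contrasts zero-shot and few-shot prompting and reports classification accuracy. The sketch below shows one way such a comparison could be set up. It is not the authors' evaluation code. The message-part format (`{"type": ..., ...}` dicts), the helper names `build_prompt`, `normalize`, and `accuracy`, the prompt wording, and the example labels are all hypothetical and are not tied to any particular model API. Free-text answers are mapped onto the label set by exact match after trimming whitespace and a trailing period; any answer that does not match is counted as wrong.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One labeled demonstration for few-shot prompting (hypothetical structure)."""
    image_ref: str  # hypothetical path or URL to an image
    label: str


def build_prompt(labels, query_image, shots=()):
    """Build a list of message parts; an empty `shots` gives a zero-shot prompt.

    The dict-based part format is an illustrative assumption, not a vendor API.
    """
    parts = [{"type": "text",
              "text": "Classify the plant image. Answer with exactly one of: "
                      + ", ".join(labels) + "."}]
    # Each demonstration is an image followed by its gold answer.
    for ex in shots:
        parts.append({"type": "image", "ref": ex.image_ref})
        parts.append({"type": "text", "text": f"Answer: {ex.label}"})
    # The query image comes last, with an open answer slot.
    parts.append({"type": "image", "ref": query_image})
    parts.append({"type": "text", "text": "Answer:"})
    return parts


def normalize(answer, labels):
    """Map a free-text answer to a label by exact, case-insensitive match.

    Surrounding whitespace and periods are trimmed; returns None when no label matches.
    """
    cleaned = answer.strip().strip(".").strip().lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return None


def accuracy(answers, gold, labels):
    """Fraction of answers that normalize to the gold label."""
    if len(answers) != len(gold):
        raise ValueError("answers and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(normalize(a, labels) == g for a, g in zip(answers, gold))
    return correct / len(gold)
```

With toy labels `["healthy", "early blight", "late blight"]`, the answers `["Healthy.", " early blight ", "I am not sure"]` scored against gold labels `["healthy", "early blight", "late blight"]` give an accuracy of 2/3, because the unparseable third answer counts as an error. Strict matching keeps scores comparable across zero-shot and few-shot runs, at the cost of penalizing verbose but correct answers.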

BibTeX
@InProceedings{Das_2026_WACV,
    author    = {Das, Anindya Bijoy and Ahmed, Shibbir and Sakib, Shahnewaz Karim},
    title     = {On the Evaluation of Multimodal Large Language Models for Agricultural Image Classification across Diverse Tasks},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {March},
    year      = {2026},
    pages     = {482-490}
}