@InProceedings{Armandi_2025_WACV,
  author    = {Armandi, Vincenzo and Loretti, Andrea and Stacchio, Lorenzo and Cascarano, Pasquale and Marfia, Gustavo},
  title     = {Multi-Modal Large Language Model driven Augmented Reality Situated Visualization: the Case of Wine Recognition},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {February},
  year      = {2025},
  pages     = {1313-1322}
}
Multi-Modal Large Language Model driven Augmented Reality Situated Visualization: the Case of Wine Recognition
Abstract
Situated Visualizations (SV) and reality-based information retrieval systems enhanced by Mixed Reality (MR) and Augmented Reality (AR) enable the overlay of digital information onto real-world objects, providing context-aware content through computer vision. Despite their potential, these systems face significant challenges in scalability and adaptability, particularly in domains like wine recognition, where diverse label designs, frequent updates, and limited historical databases complicate automated analysis. SOLLAMA (SOmmeLier LlAMA) is a novel wine recognition framework designed to address the scalability and adaptability challenges of augmented reality systems in recognizing diverse wine labels. Leveraging Multimodal Large Language Models (MLLMs), SOLLAMA integrates visual and textual cues for accurate label interpretation, bypassing the need for extensive image datasets and traditional OCR methods. Built on the Augmented Wine Recognition (AWR) system, it replaces the OCR module with LLAMA 3.2 for advanced text recognition and contextual understanding. Key benefits include scalability across diverse label designs and simplified, server-free deployment. Experimental validation on a dataset of wine labels from Italy's Emilia-Romagna region highlights the system's effectiveness, demonstrating its potential to transform wine recognition in AR-based applications.