@InProceedings{Shah_2025_CVPR,
    author    = {Shah, Krish and Viswanath, Siddharth and Xi, Pengcheng and Wong, Alexander and Chen, Yuhao},
    title     = {FoodVideoQA: A Novel Baseline Framework for Dietary Monitoring},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {458-466}
}
FoodVideoQA: A Novel Baseline Framework for Dietary Monitoring
Abstract
Food intake monitoring is a crucial area of research in food computing due to its complexity and significant potential for improving health outcomes. While traditional 2D image-based dietary assessments provide basic information, video offers a more detailed understanding of both the quantity of food consumed and the manner in which it is eaten. However, current video-based dietary analysis remains limited to coarse metrics, such as counting bites. In this paper, we introduce FoodVideoQA, a novel approach that leverages Vision-Language Models (VLMs) to analyze food intake videos comprehensively. We discuss the inherent limitations of a VLM-based approach to this problem, demonstrating the need for further novel approaches in this field. This work paves the way for future research on more advanced multimodal food intake measurement and on eating behavior.