MIND-RAG: Multimodal Context-Aware and Intent-Aware Retrieval-Augmented Generation for Educational Publications

Abstract
Although multimodal Retrieval-Augmented Generation (RAG) systems have demonstrated wide applicability, they still suffer from limited image interpretability and weak retrieval performance when processing domain-specific documents. To address these challenges, we propose Multimodal INtent-Driven Retrieval-Augmented Generation (MIND-RAG), a novel framework tailored to educational scientific journals. MIND-RAG introduces two core innovations: (1) Context-aware image summarization, which extracts the relevant textual context surrounding each image and uses it as a prompt for a large multimodal model to generate a semantic summary, enabling subsequent text-only retrieval; and (2) Multimodal Intent-Aware Reranking, which jointly infers a user's intent from their latent modality needs (e.g., image, table, or text) and educational domain categories, and refines the ranking of retrieved results by aligning each document's thematic and modality-specific relevance with the inferred intent. Evaluated on the MEED-QA benchmark, which comprises educational journal entries spanning 10 years, MIND-RAG achieves 84.0% accuracy on complex Question Answering (QA) tasks and a 93.4% Mean Reciprocal Rank (MRR) for multimodal retrieval. These results demonstrate the effectiveness of MIND-RAG in real-world publication-based retrieval scenarios.
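For concreteness, the two components can be pictured as a small pipeline around an existing text retriever. The Python sketch below is illustrative only: the function names, the keyword-based intent heuristic, the placeholder domain category, and the reranking weights are hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str        # image summary, table text, or passage text
    modality: str    # "image", "table", or "text"
    topic: str       # educational domain category
    score: float     # base relevance score from the text retriever

def build_summary_prompt(surrounding_text: str) -> str:
    """Context-aware image summarization: the text surrounding the figure
    is injected into the prompt sent to a multimodal model, so the summary
    reflects the document context rather than pixel content alone."""
    return (
        "Surrounding context:\n"
        f"{surrounding_text}\n\n"
        "Using this context, write a short semantic summary of the figure."
    )

def classify_intent(query: str) -> tuple[str, str]:
    """Toy stand-in for the paper's intent model: guesses the desired
    modality from keywords and returns a placeholder domain category."""
    q = query.lower()
    if any(w in q for w in ("figure", "diagram", "picture", "image")):
        modality = "image"
    elif "table" in q:
        modality = "table"
    else:
        modality = "text"
    return modality, "general"

def intent_aware_rerank(query: str, candidates: list[Document],
                        alpha: float = 0.5, beta: float = 0.3) -> list[Document]:
    """Boost candidates whose modality and topic match the inferred intent.
    The additive weights alpha and beta are illustrative, not from the paper."""
    modality, topic = classify_intent(query)
    def adjusted(d: Document) -> float:
        return d.score + alpha * (d.modality == modality) + beta * (d.topic == topic)
    return sorted(candidates, key=adjusted, reverse=True)

if __name__ == "__main__":
    docs = [
        Document("passage on fractions pedagogy", "text", "general", 0.82),
        Document("summary of a bar chart of test scores", "image", "general", 0.78),
    ]
    top = intent_aware_rerank("show the figure with test scores", docs)
    print([d.modality for d in top])  # the image-typed result is promoted first
```

In this reading, image summaries produced with context-aware prompts enter the same text index as ordinary passages, and the intent-aware rerank step then adjusts the retriever's scores toward the modality and topic the query implies.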
Related Material

[pdf]

[bibtex]
@InProceedings{Yu_2025_ICCV,
    author    = {Yu, Jiayang and Xie, Yuxi and Zhang, Guixuan and Liu, Jie and Zeng, Zhi and Huang, Ying and Zhang, Shuwu},
    title     = {MIND-RAG: Multimodal Context-Aware and Intent-Aware Retrieval-Augmented Generation for Educational Publications},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {4216-4223}
}