Temporal Multimodal Memory Banks for Agentic Reasoning

Prasanth Yadla; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 6519-6526

Abstract


Current multimodal AI agents process inputs in isolation, lacking the ability to maintain and reason over episodic memories across extended time horizons. We introduce ChronoMem, a novel architecture that enables agents to store, consolidate, and retrieve multimodal experiences for enhanced temporal reasoning. ChronoMem employs hierarchical memory encoding, cross-modal temporal indexing, and experience-guided retrieval to capture temporal dependencies across visual, audio, and textual modalities. Through comprehensive evaluation on three adapted public benchmark tasks spanning navigation (AVLMaps), human interaction (MTPChat), and scientific assistance (RVTALL), we demonstrate that ChronoMem achieves consistent but modest improvements over memory-augmented baselines: a 3.3% average improvement in long-term task performance compared to extended-context methods and 3.7% better memory retrieval accuracy compared to systems without persistent memory. Our work addresses fundamental limitations in current agentic systems and provides a principled approach for incorporating episodic memory into multimodal AI agents.
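The abstract names three components (hierarchical memory encoding, cross-modal temporal indexing, experience-guided retrieval) without implementation detail. As a rough illustration of the general idea, the sketch below implements a minimal episodic memory bank that stores timestamped, modality-tagged embeddings and retrieves them by similarity discounted with a recency term. All names (`MemoryEntry`, `MemoryBank`, `decay_rate`) and the scoring scheme are assumptions for illustration, not ChronoMem's actual method.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    timestamp: float        # when the experience occurred
    modality: str           # e.g. "visual", "audio", or "text"
    embedding: list[float]  # precomputed feature vector (hypothetical)
    payload: str            # raw experience description


def cosine(a: list[float], b: list[float]) -> float:
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class MemoryBank:
    """Toy episodic store: append experiences, retrieve by
    similarity weighted by exponential temporal decay."""

    def __init__(self, decay_rate: float = 0.01):
        self.entries: list[MemoryEntry] = []
        self.decay_rate = decay_rate  # assumed recency weighting

    def store(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def retrieve(self, query: list[float], now: float, k: int = 3) -> list[MemoryEntry]:
        # Score each memory: similarity discounted by its age.
        scored = [
            (cosine(query, e.embedding)
             * math.exp(-self.decay_rate * (now - e.timestamp)), e)
            for e in self.entries
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]
```

A real system would replace the linear scan with an approximate-nearest-neighbor index and add a consolidation step that merges or prunes old entries; this sketch only shows the store/retrieve interface the abstract implies.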

Related Material


[bibtex]
@InProceedings{Yadla_2025_ICCV,
  author    = {Yadla, Prasanth},
  title     = {Temporal Multimodal Memory Banks for Agentic Reasoning},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {6519-6526}
}