MCL for MLLMs: Benchmarking Forgetting in Task-Incremental Multimodal Learning

Zichao Li; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 2760-2766

Abstract


This paper introduces a novel framework for Multimodal Continual Learning (MCL) in Multimodal Large Language Models (MLLMs), addressing catastrophic forgetting across vision and language modalities. We propose: (1) a modality-aware memory replay mechanism with separate vision/text buffers and dynamic sample scoring, (2) a theoretical bound on forgetting that accounts for cross-modal interference, and (3) a unified evaluation protocol for task-incremental MCL. Experiments on ScienceQA and MMBench demonstrate that our method reduces forgetting by 38% compared to baselines, with particular improvements on complex reasoning tasks (55.2% vs. 49.8% accuracy). Theoretical and empirical analyses show that our approach maintains 3x better modality balance (vision-text gap: 3.8% vs. 12.8%) while scaling efficiently to long task sequences (backward transfer (BWT) slope: -0.16 vs. -0.28).
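
The sketch below illustrates the two mechanisms the abstract names: a replay memory with separate per-modality buffers and dynamic sample scoring, and the backward transfer (BWT) metric behind the reported slopes. It is a minimal sketch under stated assumptions, not the paper's implementation: the class name ModalityAwareReplayBuffer, the score-ordered eviction rule, and the helper backward_transfer are illustrative choices, since the abstract does not specify the exact scoring function or buffer-management policy.

# Minimal sketch, assuming fixed-capacity per-modality buffers and a
# generic "value" score per sample (e.g., loss-based); the paper's exact
# scoring and eviction rules are not given in the abstract.
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class ModalityAwareReplayBuffer:
    capacity_per_modality: int = 256
    # One min-heap of (score, tiebreak, sample) per modality, so the
    # lowest-scored sample is always the eviction candidate.
    _buffers: Dict[str, List[Tuple[float, int, Any]]] = field(default_factory=dict)
    _counter: itertools.count = field(default_factory=itertools.count)

    def add(self, modality: str, sample: Any, score: float) -> None:
        """Insert a sample; evict the lowest-scored one when full."""
        buf = self._buffers.setdefault(modality, [])
        entry = (score, next(self._counter), sample)
        if len(buf) < self.capacity_per_modality:
            heapq.heappush(buf, entry)
        elif score > buf[0][0]:  # replace only if more valuable than the worst kept
            heapq.heapreplace(buf, entry)

    def sample(self, modality: str, k: int) -> List[Any]:
        """Return up to k highest-scored samples from one modality for replay."""
        buf = self._buffers.get(modality, [])
        return [s for _, _, s in heapq.nlargest(k, buf)]

def backward_transfer(acc: List[List[float]]) -> float:
    """Standard BWT: mean change from just-after-training accuracy
    acc[i][i] to final accuracy acc[T-1][i], over the first T-1 tasks.
    Negative values indicate forgetting."""
    T = len(acc)
    return sum(acc[T - 1][i] - acc[i][i] for i in range(T - 1)) / (T - 1)

if __name__ == "__main__":
    buf = ModalityAwareReplayBuffer(capacity_per_modality=2)
    buf.add("vision", "img_batch_0", score=0.9)
    buf.add("text", "txt_batch_0", score=0.4)
    buf.add("vision", "img_batch_1", score=0.1)
    buf.add("vision", "img_batch_2", score=0.7)  # evicts img_batch_1
    print(buf.sample("vision", k=2))  # ['img_batch_0', 'img_batch_2']
    # acc[t][i]: accuracy on task i after training on task t.
    print(backward_transfer([[0.80, 0.0], [0.70, 0.75]]))  # -0.10

Keeping the buffers separate per modality is what lets a method enforce the vision-text balance the abstract reports; a single shared buffer ranked by one score can drift toward whichever modality currently yields higher scores.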

Related Material


[pdf]
[bibtex]
@InProceedings{Li_2025_ICCV,
    author    = {Li, Zichao},
    title     = {MCL for MLLMs: Benchmarking Forgetting in Task-Incremental Multimodal Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {2760-2766}
}