ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning

Lee, Jongseo; Bae, Kyungho; Min, Kyle; Park, Gyeong-Moon; Choi, Jinwoo

Jongseo Lee, Kyungho Bae, Kyle Min, Gyeong-Moon Park, Jinwoo Choi; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 17546-17556

Abstract

In this work, we tackle the problem of video class-incremental learning (VCIL). Many existing VCIL methods mitigate catastrophic forgetting by rehearsal training with a few temporally dense samples stored in episodic memory, which is memory-inefficient. Alternatively, some methods store temporally sparse samples, sacrificing essential temporal information and thereby resulting in inferior performance. To address this trade-off between memory efficiency and performance, we propose EpiSodic and SEmaNTIc memory integrAtion for video class-incremental Learning (ESSENTIAL). ESSENTIAL consists of episodic memory for storing temporally sparse features and semantic memory for storing general knowledge represented by learnable prompts. We introduce a novel memory retrieval (MR) module that integrates episodic memory and semantic prompts through cross-attention, enabling the retrieval of temporally dense features from temporally sparse features. We rigorously validate ESSENTIAL on diverse datasets: UCF-101, HMDB51, and Something-Something-V2 from the TCD benchmark; and UCF-101, ActivityNet, and Kinetics-400 from the vCLIMB benchmark. Remarkably, with significantly reduced memory, ESSENTIAL achieves favorable performance on the benchmarks.
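
The memory retrieval idea described in the abstract can be illustrated with a minimal cross-attention sketch. This is not the authors' implementation; it is one possible reading under stated assumptions. It assumes the semantic prompts act as queries (one per dense time step) and the temporally sparse episodic features act as keys and values. The function name `retrieve_dense`, the projection matrices `w_q`, `w_k`, `w_v`, and all dimensions are hypothetical, and random arrays stand in for learned parameters and stored features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def retrieve_dense(sparse_feats, prompts, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper, not from the paper).

    sparse_feats: (T_sparse, D) features stored in episodic memory.
    prompts:      (T_dense, D) semantic prompts (assumed to be the queries).
    Returns (T_dense, D) retrieved features and the attention weights.
    """
    q = prompts @ w_q        # queries come from the prompts (assumption)
    k = sparse_feats @ w_k   # keys come from the sparse features
    v = sparse_feats @ w_v   # values come from the sparse features
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (T_dense, T_sparse)
    return attn @ v, attn

# Illustrative sizes only; none of these values are taken from the paper.
rng = np.random.default_rng(0)
T_sparse, T_dense, D = 2, 8, 16
sparse_feats = rng.normal(size=(T_sparse, D))
prompts = rng.normal(size=(T_dense, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

dense_feats, attn = retrieve_dense(sparse_feats, prompts, w_q, w_k, w_v)
print(dense_feats.shape)  # (8, 16)
```

In this sketch only `T_sparse` feature vectors per video would need to be stored, while the output has `T_dense` time steps; how ESSENTIAL actually trains the prompts and structures the MR module is specified in the paper, not here.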

Related Material

[bibtex]

@InProceedings{Lee_2025_ICCV,
    author    = {Lee, Jongseo and Bae, Kyungho and Min, Kyle and Park, Gyeong-Moon and Choi, Jinwoo},
    title     = {ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {17546-17556}
}