CRAM: Large Scale Video Continual Learning with Bootstrapped Compression

Mall, Shivani; Henriques, Joao F.

Shivani Mall, Joao F. Henriques; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 15045-15055

Abstract

Continual learning (CL) promises to allow neural networks to learn from continuous streams of inputs, instead of IID (independent and identically distributed) sampling, which requires random access to a full dataset. This would allow for much smaller storage requirements and self-sufficiency of deployed systems that cope with natural distribution shifts, similarly to biological learning.We focus on video CL employing a rehearsal-based approach, which reinforces past samples from a memory buffer. We posit that part of the reason why practical video CL is challenging is the high memory requirements of video, further exacerbated by long-videos and continual streams, which are at odds with the common rehearsal-buffer size constraints. To address this, we propose to use compressed vision, i.e. store video codes (embeddings) instead of raw inputs, and train a video classifier by IID sampling from this rolling buffer. Training a video compressor online (so not depending on any pre-trained networks) means that it is also subject to catastrophic forgetting. We propose a scheme to deal with this forgetting by refreshing video codes, which requires careful decompression with a previous version of the network and recompression with a new one. We name our method Continually Refreshed Amodal Memory (CRAM). We expand current video CL benchmarks to large-scale settings, namely EpicKitchens-100 and Kinetics-700, with thousands of relatively long videos, and demonstrate empirically that our video CL method outperforms prior art with a significantly reduced memory footprint.

Related Material

[pdf] [arXiv]

[bibtex]

@InProceedings{Mall_2025_ICCV, author = {Mall, Shivani and Henriques, Joao F.}, title = {CRAM: Large Scale Video Continual Learning with Bootstrapped Compression}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2025}, pages = {15045-15055} }