ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression
Abstract
In Learned Video Compression (LVC), improving inter prediction, such as enhancing temporal context mining and mitigating accumulated errors, is crucial for boosting rate-distortion performance. Existing LVCs mainly focus on mining local temporal motion while neglecting non-local correlations among frames. Moreover, current contextual video compression models use only a single reference frame, which is insufficient for handling complex motion. To address these issues, we propose leveraging non-local correlations across multiple frames to enhance the temporal priors, which significantly boosts rate-distortion performance. To mitigate error accumulation, we introduce a partial cascaded fine-tuning strategy that supports fine-tuning on full-length sequences under constrained computational resources. This method reduces the train-test mismatch in sequence lengths and significantly decreases accumulated errors. Building on these techniques, we present the video compression scheme ECVC. Experiments demonstrate that ECVC achieves state-of-the-art performance, reducing bit-rate by 10.5% and 11.5% more than the previous SOTA method DCVC-FM over VTM-13.2 under the low delay B (LDB) configuration with intra periods (IP) of 32 and -1, respectively. Our code is available at https://github.com/JiangWeibeta/ECVC.
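The abstract names two techniques at a high level. Below is a minimal sketch of the first, aggregating non-local context from multiple decoded frames; it assumes a cross-attention formulation, and all names (MultiFrameNonLocal, cur, refs) are hypothetical illustrations, not the actual ECVC modules.

```python
# Hedged sketch: non-local lookup from the current frame's features into
# features of several previously decoded frames. "Non-local" here means
# every spatial position may attend to every position in every reference.
# Module and variable names are hypothetical, not from the ECVC codebase.
import torch
import torch.nn as nn

class MultiFrameNonLocal(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cur, refs):
        # cur:  (B, C, H, W) features of the frame being coded
        # refs: list of (B, C, H, W) features from decoded frames
        B, C, H, W = cur.shape
        q = cur.flatten(2).transpose(1, 2)                    # (B, HW, C)
        kv = torch.cat([r.flatten(2).transpose(1, 2) for r in refs], dim=1)
        ctx, _ = self.attn(self.norm(q), kv, kv)              # non-local attention
        out = (q + ctx).transpose(1, 2).reshape(B, C, H, W)   # residual fuse
        return out  # enriched temporal prior for the contextual codec
```

The second technique, partial cascaded fine-tuning, can be read as rolling the codec over a full-length sequence so that errors accumulate exactly as at test time, while retaining gradients only for the last few frames so memory stays bounded. The sketch below follows that reading; codec, rate_distortion_loss, and grad_frames are hypothetical placeholders, not the paper's exact training recipe.

```python
# Hedged sketch of one partial cascaded fine-tuning step. Earlier frames
# are decoded without gradients (so accumulated error is still seen),
# and only the last `grad_frames` steps contribute to backpropagation.
import torch

def partial_cascaded_step(codec, frames, optimizer, grad_frames=2):
    recon = frames[0]              # assume an intra-coded first frame
    loss = 0.0
    n = len(frames)
    for t in range(1, n):
        with_grad = t >= n - grad_frames
        ctx = torch.enable_grad() if with_grad else torch.no_grad()
        with ctx:
            recon, bits = codec(frames[t], recon)   # hypothetical codec call
            if with_grad:
                loss = loss + rate_distortion_loss(frames[t], recon, bits)
        if not with_grad:
            recon = recon.detach()  # drop the graph for early frames
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Detaching all but the final window is what keeps memory roughly constant in sequence length: only grad_frames forward graphs are ever retained, yet the model is trained against full-length error accumulation.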
Related Material
[pdf] [supp] [arXiv] [bibtex]

@InProceedings{Jiang_2025_CVPR,
    author    = {Jiang, Wei and Li, Junru and Zhang, Kai and Zhang, Li},
    title     = {ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {7331-7341}
}