-
[pdf]
[supp]
[arXiv]
[bibtex]@InProceedings{Gao_2025_ICCV, author = {Gao, Ge and Teng, Siyue and Peng, Tianhao and Zhang, Fan and Bull, David}, title = {GIViC: Generative Implicit Video Compression}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2025}, pages = {17356-17367} }
GIViC: Generative Implicit Video Compression
Abstract
While video compression based on implicit neural representations (INRs) has recently demonstrated great potential, existing INR-based video codecs still cannot achieve state-of-the-art (SOTA) performance compared to their conventional or autoencoder-based counterparts given the same coding configuration. In this context, we propose a **G**enerative **I**mplicit **Vi**deo **C**ompression framework, **GIViC**, aiming at advancing the performance limits of this type of coding methods. GIViC is inspired by the characteristics that INRs share with large language and diffusion models in exploiting *long-term dependencies*. Through the newly designed *implicit diffusion* process, GIViC performs diffusive sampling across coarse-to-fine spatiotemporal decompositions, gradually progressing from coarser-grained full-sequence diffusion to finer-grained per-token diffusion. A novel **Hierarchical Gated Linear Attention-based transformer** (HGLA), is also integrated into the framework, which dual-factorizes global dependency modeling along scale and sequential axes. The proposed GIViC model has been benchmarked against SOTA conventional and neural codecs using a Random Access (RA) configuration (YUV 4:2:0, GOPSize=32), and yields BD-rate savings of 15.94%, 22.46% and 8.52% over VVC VTM, DCVC-FM and NVRC, respectively. As far as we are aware, GIViC is the **first INR-based video codec that outperforms VTM based on the RA coding configuration**. The source code will be made available.
Related Material