Gaussian Representations for Video

Sachin Shah, Anustup Choudhury, Guan-Ming Su, Jaclyn Pytlarz, Christopher A. Metzler, Trisha Mittal; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026, pp. 827-837

Abstract


We introduce Gaussian representations for videos (GaRV), a novel video encoding and decoding scheme based upon 3D Gaussians. Unlike traditional representations, which encode videos as sequences of frames, or neural representations, which encode videos within the weights of a neural network, we encode videos as a collection of 3D Gaussians within a space-time volume. The key advantage of our approach is that it enables efficient and flexible rasterization-based video decoding. With a slight drop in overall compression rate, GaRV offers an 8-50x improvement in decoding time and a 2.5-15x reduction in GPU memory compared with neural counterparts. Existing Gaussian video techniques require 2-30x more disk space, while also using more GPU resources than GaRV. Moreover, GaRV offers unique flexibility in how and when pixels are decoded: one can non-sequentially decode frames/regions without penalty and can selectively decode regions at high resolution to enable low-cost foveated video decoding.
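The idea of decoding a video from Gaussians placed in a space-time volume can be illustrated with a toy sketch. This is not the GaRV implementation: the abstract does not specify the parameterization, blending rule, or rasterizer, so everything below is an assumption made for illustration. In particular, the axis-aligned (diagonal-covariance) Gaussians, the purely additive color blending, and the names `Gaussian3D`, `decode_pixel`, and `decode_region` are hypothetical, and the loop evaluates every Gaussian per pixel rather than rasterizing.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    # All fields are illustrative assumptions, not the paper's parameterization.
    mean: tuple    # (x, y, t) center in the space-time volume
    sigma: tuple   # per-axis standard deviations (axis-aligned assumption)
    color: tuple   # (r, g, b) contribution at the center


def decode_pixel(gaussians, x, y, t):
    """Additively blend every Gaussian's color at one space-time point."""
    rgb = [0.0, 0.0, 0.0]
    for g in gaussians:
        # Squared normalized distance from the query point to the center.
        d2 = sum(((p - m) / s) ** 2
                 for p, m, s in zip((x, y, t), g.mean, g.sigma))
        weight = math.exp(-0.5 * d2)
        for c in range(3):
            rgb[c] += weight * g.color[c]
    return tuple(rgb)


def decode_region(gaussians, t, x_range, y_range, step=1.0):
    """Decode only a window of the frame at time t, sampled every `step`."""
    x0, x1 = x_range
    y0, y1 = y_range
    nx = int((x1 - x0) / step)
    ny = int((y1 - y0) / step)
    return [[decode_pixel(gaussians, x0 + i * step, y0 + j * step, t)
             for i in range(nx)]
            for j in range(ny)]


# A two-Gaussian toy "video": a red blob near t=0 and a blue blob near t=5.
video = [
    Gaussian3D(mean=(2.0, 2.0, 0.0), sigma=(1.0, 1.0, 1.0), color=(1.0, 0.0, 0.0)),
    Gaussian3D(mean=(6.0, 6.0, 5.0), sigma=(1.5, 1.5, 2.0), color=(0.0, 0.0, 1.0)),
]

# Jump straight to t=5 and decode a 4x4 window at twice the base sampling rate.
patch = decode_region(video, t=5.0, x_range=(4.0, 8.0), y_range=(4.0, 8.0), step=0.5)
```

Because each pixel depends only on the Gaussians and the query point `(x, y, t)`, the sketch can evaluate any frame or window without first decoding earlier frames, and a smaller `step` inside a region of interest gives a crude stand-in for foveated decoding.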

Related Material


[bibtex]
@InProceedings{Shah_2026_WACV,
    author    = {Shah, Sachin and Choudhury, Anustup and Su, Guan-Ming and Pytlarz, Jaclyn and Metzler, Christopher A. and Mittal, Trisha},
    title     = {Gaussian Representations for Video},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {827-837}
}