Gaussian Representations for Video

Sachin Shah¹    Anustup Choudhury²    Guan-Ming Su²
Jaclyn Pytlarz²    Christopher A. Metzler¹    Trisha Mittal²
¹University of Maryland, College Park   ²Dolby Laboratories
WACV 2026

Abstract

We introduce Gaussian representations for videos (GaRV), a novel video encoding and decoding scheme based on 3D Gaussians. Unlike traditional representations, which encode videos as sequences of frames, or neural representations, which encode videos within the weights of a neural network, we encode videos as a collection of 3D Gaussians within a space-time volume. The key advantage of our approach is that it enables efficient and flexible rasterization-based video decoding. At the cost of a slight drop in overall compression rate, GaRV offers an 8-50x improvement in decoding time and a 2.5-15x reduction in GPU memory compared with neural counterparts. Existing Gaussian video techniques require 2-30x more disk space than GaRV while also using more GPU resources. Moreover, GaRV offers unique flexibility in how and when pixels are decoded: one can decode frames or regions non-sequentially without penalty, and can selectively decode regions at high resolution to enable low-cost foveated video decoding.
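The idea of decoding a frame by rasterizing space-time Gaussians at a chosen time can be illustrated with a minimal sketch. It assumes each Gaussian is stored as a mean over (x, y, t), a 3x3 covariance, and an RGB color, and it slices each Gaussian at time t by conditioning: the spatial mean shifts along the x-t correlation, the spatial covariance becomes a Schur complement, and a temporal weight fades the Gaussian in and out. The names `slice_gaussian` and `decode_region`, this parameterization, and the purely additive blending (no opacity or depth ordering) are illustrative assumptions, not GaRV's actual decoder.

```python
import numpy as np


def slice_gaussian(mean, cov, t):
    """Condition a 3D (x, y, t) Gaussian on time t (hypothetical helper).

    Returns the 2D spatial mean, the 2D spatial covariance, and the temporal
    weight exp(-(t - mu_t)^2 / (2 var_t)) that scales the Gaussian at time t.
    """
    mu_xy, mu_t = mean[:2], mean[2]
    cov_xy = cov[:2, :2]   # spatial block
    cov_xt = cov[:2, 2]    # space-time correlation
    var_t = cov[2, 2]      # temporal variance
    dt = t - mu_t
    mean_2d = mu_xy + cov_xt * dt / var_t
    cov_2d = cov_xy - np.outer(cov_xt, cov_xt) / var_t
    weight = np.exp(-0.5 * dt**2 / var_t)
    return mean_2d, cov_2d, weight


def decode_region(means, covs, colors, t, x0, y0, width, height):
    """Render a width x height window whose top-left pixel is (x0, y0) at time t.

    Pixel coordinates are integer (x, y) indices. Contributions are summed
    additively, a simplification of real Gaussian rasterization.
    """
    ys, xs = np.mgrid[y0:y0 + height, x0:x0 + width]
    pix = np.stack([xs, ys], axis=-1).astype(float)   # (H, W, 2) pixel coords
    image = np.zeros((height, width, 3))
    for mean, cov, color in zip(means, covs, colors):
        m2, c2, w = slice_gaussian(mean, cov, t)
        d = pix - m2
        q = np.einsum("hwi,ij,hwj->hw", d, np.linalg.inv(c2), d)  # Mahalanobis^2
        image += (w * np.exp(-0.5 * q))[..., None] * color
    return image
```

Because a frame depends only on t and the Gaussian parameters, any frame, or any window of it, can be decoded without touching neighboring frames; in this sketch a cropped window is identical to the same window cut from the full frame, which mirrors the non-sequential, region-wise decoding described above.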

Video Decoding

GaRV can decode frames at more than 500 FPS. At higher bitrates, GaRV's quality can improve while decoding remains efficient.

Spatial Resolution Control

GaRV can control where the bitrate is spent by combining Gaussians from two different encodings: a low-bitrate encoding outside a region of interest and a high-bitrate encoding inside it.
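The combination step can be sketched as a selection over Gaussian centers. The sketch assumes both encodings are stored as arrays of means over (x, y, t), 3x3 covariances, and RGB colors, and that a Gaussian belongs to the high-quality region when its spatial center lies inside a hypothetical circular fovea. The function name `combine_encodings` and this center-based selection rule are illustrative assumptions, not GaRV's documented procedure.

```python
import numpy as np


def combine_encodings(low, high, center, radius):
    """Merge two Gaussian encodings around a circular region (hypothetical helper).

    `low` and `high` are dicts with 'means' (N, 3), 'covs' (N, 3, 3) and
    'colors' (N, 3). Low-bitrate Gaussians are kept outside the circle and
    high-bitrate Gaussians inside it, judged by their (x, y) centers only.
    """
    center = np.asarray(center, dtype=float)

    def inside(means):
        return np.linalg.norm(means[:, :2] - center, axis=1) <= radius

    keep_low = ~inside(low["means"])
    keep_high = inside(high["means"])
    return {
        key: np.concatenate([low[key][keep_low], high[key][keep_high]])
        for key in ("means", "covs", "colors")
    }
```

Because the output is just another set of Gaussians, it can be rendered by the same rasterizer as either input. A center-based rule is the simplest choice; Gaussians with large spatial footprints near the boundary can still spread across it.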

Figure: low-bitrate encoding (outside the region), high-bitrate encoding (inside the region), and the combined result.