Sachin Shah1
Anustup Choudhury2
Guan-Ming Su2
Jaclyn Pytlarz2
Christopher A. Metzler1
Trisha Mittal2
1University of Maryland, College Park2Dolby Laboratories
WACV 2026
Abstract
We introduce Gaussian representations for videos
(GaRV), a novel video encoding and decoding scheme
based upon 3D Gaussians. Unlike traditional
representations, which encode videos as sequences of
frames, or neural representations, which encode
videos within the weights of a neural network, we
encode videos as a collection of 3D Gaussians within
a space-time volume. The key advantage of our
approach is that it enables efficient and flexible
rasterization-based video decoding. With a slight
drop in overall compression rate, GaRV offers an
8-50x improvement in decoding time and
2.5-15x reduction in GPU memory compared with
neural counterparts. Existing Gaussian video
techniques require 2-30x more disk space,
while also using more GPU resources than GaRV.
Moreover, GaRV offers unique flexibility in how and
when pixels are decoded: One can non-sequentially
decode frames/regions without penalty and can
selectively decode regions at high-resolution to
enable low-cost foveated video decoding.
Video Decoding
GaRV can decode frames at more than 500 FPS.
Beauty
Bosphorus
HoneyBee
Jockey
ReadySteadyGo
ShakeNDry
YachtRide
At higher bitrates GaRV's quality can improve while still preserving decode efficiency.
Bear
Breakdance Flare
Camel
Elephant
Kite Surf
Train
Spatial Resolution Control
GaRV can easily control where to spend the bitrate by combining
Gaussians from two different encodings.