Improved Conditional VRNNs for Video Prediction

Lluis Castrejon, Nicolas Ballas, Aaron Courville; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 7608-7617


Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder (VAE). While VAEs can handle uncertainty and model multiple possible future outcomes, they have a tendency to produce blurry predictions. In this work we argue that this is a sign of underfitting. To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher capacity likelihood models. Our approach relies on a hierarchy of latent variables, which defines a family of flexible prior and posterior distributions in order to better model the probability of future sequences. We validate our proposal through a series of ablation experiments and compare our approach to current state-of-the-art latent variable models. Our method performs favorably under several metrics on three different datasets.
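
As a rough illustration of how a hierarchy of latent variables enters a VAE-style objective, the sketch below sums the KL divergence between posterior and prior over several latent levels. It is not taken from the paper: it assumes diagonal Gaussian distributions at every level, and the names `gaussian_kl` and `hierarchical_kl` are hypothetical. In a full hierarchical model, the parameters at each level would typically depend on samples from the other levels; here they are treated as already computed.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    Each argument is a list of floats of equal length; variances are
    given as log-variances (an assumption made for this sketch).
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def hierarchical_kl(posterior_levels, prior_levels):
    """Sum the per-level KL terms of a latent hierarchy.

    Each level is a (mean, log-variance) pair. The parameters are taken
    as given, so any dependence between levels is assumed to have been
    resolved before this function is called.
    """
    return sum(
        gaussian_kl(mu_q, lv_q, mu_p, lv_p)
        for (mu_q, lv_q), (mu_p, lv_p) in zip(posterior_levels, prior_levels)
    )


# Two-level example: the first level matches its prior exactly, the
# second level has its posterior mean shifted by one unit.
posterior = [([0.0, 0.0], [0.0, 0.0]), ([1.0], [0.0])]
prior = [([0.0, 0.0], [0.0, 0.0]), ([0.0], [0.0])]
kl_total = hierarchical_kl(posterior, prior)
```

With unit variances, only the shifted level contributes to the total, which shows how each level of the hierarchy adds its own regularization term to the objective.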

Related Material

author = {Castrejon, Lluis and Ballas, Nicolas and Courville, Aaron},
title = {Improved Conditional VRNNs for Video Prediction},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}