Flexible Spatio-Temporal Networks for Video Prediction

Chaochao Lu, Michael Hirsch, Bernhard Schölkopf; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6523-6531

Abstract


We describe a modular framework for video frame prediction. We refer to it as a Flexible Spatio-Temporal Network (FSTN), as it allows both the extrapolation of a video sequence and the estimation of synthetic frames lying between observed frames, and thus the generation of slow-motion videos. By devising a customized objective function comprising decoding, encoding, and adversarial losses, we are able to mitigate the common problem of blurry predictions, managing to retain high-frequency information even for relatively distant future predictions. We propose and analyse different training strategies to optimize our model. Extensive experiments on several challenging public datasets demonstrate both the versatility and validity of our model.
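For concreteness, a minimal sketch of how such a combined objective might be assembled is given below, assuming generic encoder and discriminator modules. The function fstn_style_loss, the weighting coefficients lambda_dec/lambda_enc/lambda_adv, and the choice of L2 reconstruction and a sigmoid-output GAN loss are illustrative assumptions, not the paper's actual formulation.

import torch
import torch.nn as nn

# Hypothetical sketch of an FSTN-style combined objective: a decoding
# (pixel-space reconstruction) loss, an encoding (feature-space) loss,
# and an adversarial loss. Interfaces and weights are assumptions,
# not the formulation from the paper.

bce = nn.BCELoss()
mse = nn.MSELoss()

def fstn_style_loss(pred_frames, true_frames, encoder, discriminator,
                    lambda_dec=1.0, lambda_enc=1.0, lambda_adv=0.05):
    # Decoding loss: pixel-wise reconstruction of the predicted frames.
    loss_dec = mse(pred_frames, true_frames)

    # Encoding loss: keep predictions consistent with the encoder's
    # feature space, which helps preserve high-frequency detail.
    loss_enc = mse(encoder(pred_frames), encoder(true_frames).detach())

    # Adversarial loss: push the discriminator (assumed to end in a
    # sigmoid) to label predicted frames as real.
    d_out = discriminator(pred_frames)
    loss_adv = bce(d_out, torch.ones_like(d_out))

    return lambda_dec * loss_dec + lambda_enc * loss_enc + lambda_adv * loss_adv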

Related Material


[pdf]
[bibtex]
@InProceedings{Lu_2017_CVPR,
author = {Lu, Chaochao and Hirsch, Michael and Sch{\"o}lkopf, Bernhard},
title = {Flexible Spatio-Temporal Networks for Video Prediction},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {July},
year = {2017},
pages = {6523-6531}
}