Deep Video Generation, Prediction and Completion of Human Action Sequences

Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 366-382


Current video generation/prediction/completion results are limited, due to the severe ill-posedness inherent in these three problems. In this paper, we focus on human action videos and propose a general, two-stage deep framework that generates human action videos under no constraints or an arbitrary number of constraints. The framework uniformly addresses three problems: video generation given no input frames, video prediction given the first few frames, and video completion given the first and last frames. To solve video generation from scratch, we first train a deep generative model that generates human pose sequences from random noise, and then train a skeleton-to-image network to synthesize human action videos from the generated pose sequences. To solve video prediction and completion, we exploit our trained model and optimize over the latent space to generate videos that best fit the given input frame constraints. With our novel method, we sidestep the original ill-posed problems and produce, for the first time, high-quality video generation/prediction/completion results of much longer duration. We present quantitative and qualitative evaluations to show that our approach outperforms state-of-the-art methods in all three tasks.
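The latent-space optimization mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the trained pose generator is replaced by a hypothetical fixed linear map W, and the names generate, fit_latent and frame_mask are assumptions made for illustration only. The sketch shows just the general idea of searching for a latent code whose generated pose sequence matches the observed frames, with the remaining frames then read off the generator.

```python
import numpy as np

def generate(W, z, num_frames, pose_dim):
    """Toy stand-in for a trained pose generator (assumption: a linear map of z)."""
    return (W @ z).reshape(num_frames, pose_dim)

def fit_latent(W, target, frame_mask, steps=200):
    """Gradient descent on z so that generated poses match target on masked frames."""
    num_frames, pose_dim = target.shape
    # Expand the per-frame mask to every pose coordinate, in row-major frame order.
    m = np.repeat(frame_mask.astype(float), pose_dim)
    t = target.reshape(-1)
    # Step size 1/L, where L is the largest eigenvalue of the masked W^T W.
    lr = 1.0 / (np.linalg.norm(W * m[:, None], 2) ** 2)
    z = np.zeros(W.shape[1])
    losses = []
    for _ in range(steps):
        residual = m * (W @ z - t)          # error on constrained frames only
        losses.append(0.5 * float(residual @ residual))
        z = z - lr * (W.T @ residual)       # gradient of the masked squared error
    return z, losses

rng = np.random.default_rng(0)
num_frames, pose_dim, latent_dim = 16, 4, 8
W = rng.normal(size=(num_frames * pose_dim, latent_dim))
observed = generate(W, rng.normal(size=latent_dim), num_frames, pose_dim)

# Prediction: constrain the first few frames.
# For completion, constrain the first and last frames instead: mask[[0, -1]] = True
mask = np.zeros(num_frames, dtype=bool)
mask[:3] = True

z_hat, losses = fit_latent(W, observed, mask)
completed = generate(W, z_hat, num_frames, pose_dim)
```

With an empty mask there is nothing to fit, which corresponds to generation from scratch by sampling z directly. In the setting the abstract describes, the generator would be a trained deep network and the resulting pose sequence would then be passed to a skeleton-to-image network; neither is modeled here.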

Related Material

author = {Cai, Haoye and Bai, Chunyan and Tai, Yu-Wing and Tang, Chi-Keung},
title = {Deep Video Generation, Prediction and Completion of Human Action Sequences},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}