- [pdf] [supp]
RV-GAN: Recurrent GAN for Unconditional Video Generation
Generative models aiming to generate content from noise have achieved high-fidelity synthesis for image data. However, obtaining comparable performance in the field of unconditional video generation still remains challenging. In this work, we propose a recurrent GAN architecture to model the high-dimensional video data distribution. Recurrent networks by design are able to generate complex, long sequences in an autoregressive fashion. However, the standard LSTM unit for videos (ConvLSTM) is not ideally suited for the task of unconditional video generation. Therefore, we propose a simple yet effective LSTM variant called as TransConv LSTM (TC-LSTM) by modulating the conventional ConvLSTM to have a transpose convolutional structure in input-to-state transitions. This enables the network to model both spatial and temporal relationships across layers simultaneously inside the TC-LSTM unit. TC-LSTM unit acts as a building block of our generator. Extensive quantitative and qualitative analysis shows that RV-GAN outperforms state-of-the-art methods by a significant margin on Moving MNIST, MUG, Weizmann and UCF101 datasets. Additionally, owing to the recurrent structure, our method is able to generate high-quality videos, up to 2 times longer (32 frames) than training videos at inference time. Further analysis confirms that the proposed architecture is generic and can be easily adapted to other tasks like class-conditional video synthesis and text-to-video synthesis.