PhyLoNet: Physically-Constrained Long Term Video Prediction

Nir Ben Zikri, Andrei Sharf; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 877-893


Motions in videos are often governed by physical and biological laws such as gravity, collisions, and flocking. Accounting for such natural properties is an appealing way to improve realism in future frame video prediction. Nevertheless, defining and computing intricate physical and biological properties in motion videos is challenging. In this work, we introduce PhyLoNet, a PhyDNet extension that learns long-term future frame prediction and manipulation. Similar to PhyDNet, our network consists of a two-branch deep architecture that explicitly disentangles physical dynamics from complementary information. It uses a recurrent physical cell (PhyCell) for performing physically constrained prediction in latent space. In contrast to PhyDNet, PhyLoNet introduces a modified encoder-decoder architecture together with a novel relative flow loss. This enables longer-term future frame prediction from a small input sequence with higher accuracy and quality. We have carried out extensive experiments, showing the ability of PhyLoNet to outperform PhyDNet on various challenging natural motion datasets such as ball collisions, flocking, and pool games. Ablation studies highlight the importance of our new components. Finally, we show an application of PhyLoNet for video manipulation and editing by a novel class label modification architecture.

Related Material

@InProceedings{Ben_Zikri_2022_ACCV,
    author    = {Ben Zikri, Nir and Sharf, Andrei},
    title     = {PhyLoNet: Physically-Constrained Long Term Video Prediction},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {877-893}
}