[pdf]
[code]
[bibtex]
@InProceedings{Ben_Zikri_2022_ACCV,
  author    = {Ben Zikri, Nir and Sharf, Andrei},
  title     = {PhyLoNet: Physically-Constrained Long Term Video Prediction},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
  month     = {December},
  year      = {2022},
  pages     = {877-893}
}
PhyLoNet: Physically-Constrained Long Term Video Prediction
Abstract
Motions in videos are often governed by physical and biological
laws such as gravity, collisions, flocking, etc. Accounting for such
natural properties is an appealing way to improve realism in future frame
video prediction. Nevertheless, the definition and computation of intricate
physical and biological properties in motion videos are challenging.
In this work, we introduce PhyLoNet, a PhyDNet extension that learns
long-term future frame prediction and manipulation. Similar to PhyDNet,
our network consists of a two-branch deep architecture that explicitly
disentangles physical dynamics from complementary information.
It uses a recurrent physical cell (PhyCell) to perform physically-constrained
prediction in latent space. In contrast to PhyDNet, PhyLoNet
introduces a modified encoder-decoder architecture together with
a novel relative flow loss. This enables longer-term future frame prediction
from a small input sequence with higher accuracy and quality.
We have carried out extensive experiments, showing the ability of PhyLoNet
to outperform PhyDNet on various challenging natural motion
datasets such as ball collisions, flocking, and pool games. Ablation studies
highlight the importance of our new components. Finally, we show an
application of PhyLoNet for video manipulation and editing by a novel
class label modification architecture.
Related Material