Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces

ICCV 2025

Paper ID 6728

In this supplementary website we provide additional video results for the following cases:

  • First row: videos showing reconstruction results under different settings.
  • Second row: videos showing text-to-video generation results with different latent spaces.
  • Third row: videos corresponding to figures in the main paper.

Reconstruction Comparison: 4X (Baselines), 8X, 16X, Overlapping Chunks
Text-to-Video Generation: 16X Latent, 4X vs. 16X Latent, 16X Latent Long Video, Overlapping Chunks
Main Paper: Figure 3, Figure 2, Figure 5, Figure 9

Reconstruction Comparison (8× Temporal Compression)

We compare reconstruction results from three tokenizers at 8× temporal compression: MagViTv2 extended directly to 8×, Cosmos-CV-8×, and our method ProMAG, which reaches 8× through our progressive growing approach. With a 16-channel latent space (zdim=16), ProMAG produces much sharper reconstructions and does not contain the blurriness observed in the Cosmos-CV reconstructions. Similarly, with an 8-channel latent space (zdim=8), ProMAG produces much more accurate reconstructions and does not contain the artifacts observed when directly extending MagViTv2 to 8× temporal compression.
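
To make the settings above concrete, the following minimal sketch computes the latent tensor shape and the overall compression ratio for a given temporal compression factor and channel count (zdim). The function names, the spatial downsampling factor of 8, and the optional "first frame encoded on its own" convention are assumptions for illustration only; they are not stated on this page.

```python
def latent_shape(num_frames, height, width, temporal_factor, zdim,
                 spatial_factor=8, causal_first_frame=True):
    """Return (T_latent, H_latent, W_latent, zdim) for one RGB clip.

    spatial_factor and causal_first_frame are illustrative assumptions,
    not values taken from this page.
    """
    if causal_first_frame:
        # Assumed convention: the first frame is encoded on its own and the
        # remaining frames are grouped temporal_factor at a time.
        if (num_frames - 1) % temporal_factor != 0:
            raise ValueError("num_frames - 1 must be divisible by temporal_factor")
        t_latent = 1 + (num_frames - 1) // temporal_factor
    else:
        if num_frames % temporal_factor != 0:
            raise ValueError("num_frames must be divisible by temporal_factor")
        t_latent = num_frames // temporal_factor
    if height % spatial_factor != 0 or width % spatial_factor != 0:
        raise ValueError("height and width must be divisible by spatial_factor")
    return (t_latent, height // spatial_factor, width // spatial_factor, zdim)


def compression_ratio(num_frames, height, width, temporal_factor, zdim, **kwargs):
    """Ratio of input RGB values to latent values for one clip."""
    t, h, w, c = latent_shape(num_frames, height, width,
                              temporal_factor, zdim, **kwargs)
    input_values = num_frames * height * width * 3  # 3 RGB channels
    return input_values / (t * h * w * c)


if __name__ == "__main__":
    # Compare the two channel settings shown on this page at 8x.
    for zdim in (16, 8):
        shape = latent_shape(17, 256, 256, temporal_factor=8, zdim=zdim)
        ratio = compression_ratio(17, 256, 256, temporal_factor=8, zdim=zdim)
        print(f"8x, zdim={zdim}: latent shape {shape}, ratio {ratio:.1f}")
```

Under these assumptions, halving zdim from 16 to 8 doubles the overall compression ratio at the same temporal factor, which is why the two channel settings are compared separately below.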

16 channel Latent (zdim=16)

[Seven video examples, each with the panels: Ground-Truth, MagViTv2-8× (zdim=16), Cosmos-CV-8× (zdim=16), ProMAG-8× (zdim=16).]

8 channel Latent (zdim=8)

[Seven video examples, each with the panels: Ground-Truth, MagViTv2-8× (zdim=8), ProMAG-8× (zdim=8).]