# Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
<br>
<p align="center">
<img src="figure/teaser.jpg" width="90%"/>
</p>

We present an automated text animation scheme, termed "Dynamic Typography," which combines two challenging tasks: deforming letters to convey semantic meaning, and infusing them with vibrant movements based on user prompts.

## Requirements:
All our animation samples were generated on a single H800 GPU with 80GB VRAM. Generating a text animation with 20 or more frames requires a GPU with at least 24GB VRAM.

## Environment
All tests were conducted on Linux, and we suggest running our code on Linux as well. To set up the environment, please run:
```
conda env create -f environment.yml
```
Next, you need to manually install diffvg.

## Generate Your Animation!
To animate a letter within a word, run the following command:
```
CUDA_VISIBLE_DEVICES=0 python dynamicTypography.py \
        --word "<The Word>" \
        --optimized_letter "<The letter to be animated>" \
        --caption "<The prompt that describes the animation>" \
        --use_xformer --canonical --anneal \
        --use_perceptual_loss --use_conformal_loss \
        --use_transition_loss
```
For example:
```
CUDA_VISIBLE_DEVICES=0 python dynamicTypography.py \
        --word "HARMONY" \
        --optimized_letter "H" \
        --caption "Two men shaking hands with each other in a friendly manner" \
        --use_xformer --canonical --anneal \
        --use_perceptual_loss --use_conformal_loss \
        --use_transition_loss
```

The output animation will be saved to `videos`. The output includes the network's weights, the SVG frame logs, and their rendered .mp4 files (under `svg_logs` and `mp4_logs`, respectively). We save both the in-context animation and the animation of the letter alone.
At the end of training, we output a high-quality GIF render of the last iteration (`HG_gif.gif`). <br>

We provide many sample run scripts in `scripts`; the expected resulting GIFs are in `example_gifs`. The sample results of the `HARMONY` animation above, including the intermediate results saved every 100 epochs, are stored in the `videos` folder.

## Tips:

By default, a 24-frame video will be generated, requiring about 28GB of VRAM. If there is not enough VRAM available, the number of frames can be reduced by using the `--num_frames` parameter.
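
As a sketch of this tip, the `HARMONY` example can be rerun with fewer frames. The value 16 is only an illustrative assumption, not a recommended setting, and the `DRY_RUN` switch is a convenience added for this sketch rather than part of the repository. With `DRY_RUN=1` (the default here) the command is printed instead of executed.

```shell
# Sketch only: 16 is an assumed frame count, not a recommended setting.
NUM_FRAMES="${NUM_FRAMES:-16}"
# DRY_RUN is a helper for this sketch, not an option of dynamicTypography.py.
DRY_RUN="${DRY_RUN:-1}"

# Same flags as the HARMONY example, plus --num_frames.
set -- python dynamicTypography.py \
        --word "HARMONY" \
        --optimized_letter "H" \
        --caption "Two men shaking hands with each other in a friendly manner" \
        --use_xformer --canonical --anneal \
        --use_perceptual_loss --use_conformal_loss \
        --use_transition_loss \
        --num_frames "$NUM_FRAMES"

if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of running it.
    echo "CUDA_VISIBLE_DEVICES=0 $*"
else
    CUDA_VISIBLE_DEVICES=0 "$@"
fi
```

Set `DRY_RUN=0` to actually launch the run, and lower `NUM_FRAMES` further if the GPU still runs out of memory.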

If your animation stays too close to the original letter's shape, please set a lower `--perceptual_weight`; if it deviates too much from the original shape, please set a higher one.

If you want the animation to be less geometrically similar to the original letter, please set a lower `--angles_w`; if you want it to be more similar, please set a higher one.

If you want to further enforce appearance consistency between frames, please set a higher `--transition_weight`, keeping in mind that this will reduce the motion amplitude.

Small visual artifacts can often be fixed by changing the `--seed`.
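
One way to try several seeds is a small loop around the `HARMONY` example, sketched below. The seed values 0, 1, and 2 are arbitrary assumptions, and `animate_with_seed` and `DRY_RUN` are hypothetical helpers defined only for this sketch. The sketch does not manage output folders, so check whether consecutive runs overwrite each other's results in `videos` before relying on it.

```shell
# DRY_RUN is a helper for this sketch, not an option of dynamicTypography.py.
DRY_RUN="${DRY_RUN:-1}"

# Run (or, in dry-run mode, print) the HARMONY example with the seed given in $1.
animate_with_seed() {
    set -- python dynamicTypography.py \
        --word "HARMONY" \
        --optimized_letter "H" \
        --caption "Two men shaking hands with each other in a friendly manner" \
        --use_xformer --canonical --anneal \
        --use_perceptual_loss --use_conformal_loss \
        --use_transition_loss \
        --seed "$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "CUDA_VISIBLE_DEVICES=0 $*"
    else
        CUDA_VISIBLE_DEVICES=0 "$@"
    fi
}

# The seed values below are arbitrary assumptions.
for seed in 0 1 2; do
    animate_with_seed "$seed"
done
```

With the default `DRY_RUN=1` the loop only prints the three commands; set `DRY_RUN=0` to run them one after another.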
