# Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

## Environment
```shell
conda create -n arldm python=3.8
conda activate arldm
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch-lts
cd ARLDM
pip install -r requirements.txt
```

## Data Preparation
* Download the PororoSV dataset [here](https://drive.google.com/file/d/11Io1_BufAayJ1BpdxxV2uJUvCcirbrNc/view?usp=sharing).
* Download the FlintstonesSV dataset [here](https://drive.google.com/file/d/1kG4esNwabJQPWqadSDaugrlF4dRaV33_/view?usp=sharing).
* Download the VIST-SIS URL links [here](https://visionandlanguage.net/VIST/json_files/story-in-sequence/SIS-with-labels.tar.gz).
* Download the VIST-DII URL links [here](https://visionandlanguage.net/VIST/json_files/description-in-isolation/DII-with-labels.tar.gz).
* Download the VIST images by running:
```shell
python data_script/vist_img_download.py \
--json_dir /path/to/dii_json_files \
--img_dir /path/to/save_images \
--num_process 32
```
* To accelerate I/O, use the following scripts to convert your downloaded data to HDF5:
```shell
python data_script/pororo_hdf5.py \
--data_dir /path/to/pororo_data \
--save_path /path/to/save_hdf5_file

python data_script/flintstones_hdf5.py \
--data_dir /path/to/flintstones_data \
--save_path /path/to/save_hdf5_file

python data_script/vist_hdf5.py \
--sis_json_dir /path/to/sis_json_files \
--dii_json_dir /path/to/dii_json_files \
--img_dir /path/to/vist_images \
--save_path /path/to/save_hdf5_file
```
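
Because a conversion can run for a long time, it may help to check the input directory before launching it. The following is a minimal sketch, not part of the repository: `require_dir` and `convert_pororo` are hypothetical helper names, and the wrapper only assumes the `--data_dir` and `--save_path` flags shown above.

```shell
# Hypothetical helper (not in the repository): fail early when a directory is missing.
require_dir() {
    if [ ! -d "$1" ]; then
        echo "missing directory: $1" >&2
        return 1
    fi
}

# Hypothetical wrapper around the PororoSV conversion step.
# $1 = dataset directory, $2 = output HDF5 path
convert_pororo() {
    require_dir "$1" || return 1
    python data_script/pororo_hdf5.py --data_dir "$1" --save_path "$2"
}
```

After defining both functions, `convert_pororo /path/to/pororo_data /path/to/save_hdf5_file` runs the conversion only if the dataset directory exists. The same pattern could be repeated for the FlintstonesSV and VIST scripts.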

## Training
Specify your directories and device configuration in `config.yaml`, then run:
```shell
python main.py
```

## Sample
Specify your directories and device configuration in `config.yaml`, then run:
```shell
python main.py
```

