## Learning Class Trajectory from Noisy Label for Diffusion Models without Purifying Data (LCT)<br><sub>Official PyTorch implementation</sub>

![Teaser image](./docs/dg-edm.png)

**Guiding Noisy Label Conditional Diffusion Models with Score-based Discriminator Correction**<br>

Abstract: *Diffusion models have gained prominence as state-of-the-art techniques for synthesizing images and videos, particularly due to their ability to scale effectively with large datasets. Recent studies have uncovered that these extensive datasets often contain mistakes from human labeling processes. However, the extent to which such errors compromise the generative capabilities and controllability of diffusion models is not well studied. This paper proposes Score-based Discriminator Correction (SBDC), a guidance technique to align the noisy pre-trained conditional diffusion models. The guidance is constructed from discriminator training with the adversarial loss, which leverages previous works in noise detection methods to determine the realness of each sample. Our method does not require retraining the diffusion model, is computationally efficient, and only marginally increases the inference time. Experiments on different noise settings demonstrate the superior performance of our proposed method compared to previous state-of-the-art methods.*

## Requirements

* Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
* 1+ high-end NVIDIA GPU for sampling. We have done all testing and development using A100 GPUs.
* 64-bit Python 3.8 and PyTorch 1.12.0 (or later). See https://pytorch.org for PyTorch install instructions.
* Python libraries: See [environment.yml](./environment.yml) for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your Python environment:
  - `conda env create -f environment.yml -n edm`
  - `conda activate edm`
* Docker users:
  - Ensure you have correctly installed the [NVIDIA container runtime](https://docs.docker.com/config/containers/resource_constraints/#gpu).
  - Use the [provided Dockerfile](./Dockerfile) to build an image with the required library dependencies.


## Pre-trained models

We provide pre-trained models for our noise configuration:

[EDM and Discriminator Checkpoint](https://docs.docker.com/config/containers/resource_constraints/#gpu)

To generate a batch of images using a given model and sampler, run:

```.bash
# Generate 64 images and save them as out/*.png
python generate.py --outdir=out --seeds=0-63 --batch=64 \
    --network=</path/to/pkl>
# Generate 64 images with SBDC and save them as out/*.png
python generate.py --outdir=out --seeds=0-63 --batch=64 \
    --network=</path/to/pkl> --discriminator=</path/to/discriminator/pkl>
```

Generating a large number of images for EDM with SBDC:

```.bash
# Generate 1000 images (seeds 0-999) using 2 GPUs with SBDC
torchrun --standalone --nproc_per_node=2 generate.py --outdir=out --seeds=0-999 --batch=64 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl \
    --discriminator=</path/to/discriminator/pkl> --S_clip_min 1.5 --S_clip_max 50 \
    --dg_weight_1st_order 0.9 --dg_weight_2nd_order 0.9
```
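For intuition about how `--seeds`, `--batch`, and `--nproc_per_node` interact, the sketch below splits a seed range into batches and deals the batches out to the ranks in turn. It is a plain-Python illustration, not the code in `generate.py`: the helper names `parse_seed_range` and `split_seeds` are hypothetical, and the round-robin assignment is an assumption.

```python
def parse_seed_range(spec):
    """Parse a spec such as '0-999' or '1,4,7-9' into a list of ints (ranges are inclusive)."""
    seeds = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            seeds.extend(range(int(lo), int(hi) + 1))
        else:
            seeds.append(int(part))
    return seeds


def split_seeds(seeds, max_batch, world_size):
    """Chunk seeds into batches of at most max_batch, then assign batches round-robin to ranks.

    Hypothetical helper for illustration; the real script may distribute work differently.
    """
    batches = [seeds[i:i + max_batch] for i in range(0, len(seeds), max_batch)]
    return [batches[rank::world_size] for rank in range(world_size)]


seeds = parse_seed_range("0-999")
per_rank = split_seeds(seeds, max_batch=64, world_size=2)
for rank, batches in enumerate(per_rank):
    print(f"rank {rank}: {len(batches)} batches, {sum(len(b) for b in batches)} images")
```

Because the range is inclusive, `0-999` yields 1000 seeds and therefore 1000 images, with the final batch smaller than `--batch`.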


## Calculating FID

To compute Fr&eacute;chet inception distance (FID) for a given model and sampler, first generate 50,000 random images and then compare them against the dataset reference statistics using `evaluator_fmiprdc.py`:

```.bash
# Generate 50000 images and save them as fid-tmp/*/*.png
torchrun --standalone --nproc_per_node=1 generate.py --outdir=fid-tmp --seeds=0-49999 --subdirs \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

# Calculate FID
cd evaluation_utils
python evaluator_fmiprdc.py <ref/image/path> <sample/image/path>
```

The second command typically takes 1-3 minutes in practice, but the first one can sometimes take several hours, depending on the configuration. See [`README.md`](./evaluation_utils/README.md) for the full list of options.
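For reference, the standard definition of FID compares Gaussian fits to Inception features of the reference and generated images. The symbols below are introduced here for illustration: $\mu_r, \Sigma_r$ are the feature mean and covariance of the reference set, and $\mu_g, \Sigma_g$ those of the generated set. This is the textbook formula; the exact feature extractor and numerical details used by `evaluator_fmiprdc.py` are not confirmed here.

```math
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the reference distribution.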


## Preparing datasets

Instructions for preparing datasets can be found in [`data_utils/README.md`](./data_utils/README.md).
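The discriminator training examples in this README use a dataset named `cifar10_sym_40-32x32.zip`. Assuming that name denotes CIFAR-10 with 40% symmetric label noise (an interpretation of the file name, not something stated in this README), the sketch below shows one common way to inject such noise. The function `add_symmetric_noise` is hypothetical and is not part of `data_utils`; conventions also differ on whether a flipped label may land back on the true class, and this sketch assumes it may not.

```python
import random


def add_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of labels where each entry is, with probability noise_rate,
    replaced by a different class drawn uniformly at random (assumed convention)."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Draw from the other num_classes - 1 classes so the label always changes.
            others = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(others))
        else:
            noisy.append(y)
    return noisy


# Toy label vector standing in for a 10-class dataset.
clean = [i % 10 for i in range(10000)]
noisy = add_symmetric_noise(clean, num_classes=10, noise_rate=0.4)
flipped = sum(c != n for c, n in zip(clean, noisy))
print(f"fraction of flipped labels: {flipped / len(clean):.3f}")
```

For the actual dataset construction, follow the instructions in the data utilities README.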

## Training new models

You can train new models using `train.py`. For example:

```.bash
# Train DDPM++ model for class-conditional CIFAR-10 using 8 GPUs
torchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs \
    --data=datasets/cifar10-32x32.zip --cond=1 --arch=ddpmpp
```

## Training new discriminator models

You can train new discriminator models using `train_discriminator.py`. For example:

```.bash
# Train a discriminator for class-conditional CIFAR-10 using 1 GPU
torchrun --standalone --nproc_per_node=1 train_discriminator.py --outdir=discriminator-runs \
    --data=datasets/cifar10_sym_40-32x32.zip --cond=1 --arch=ddpmpp --batch 1024
```

```.bash
# Train a discriminator for class-conditional CIFAR-10 with SiMix using 1 GPU
torchrun --standalone --nproc_per_node=1 train_discriminator.py --outdir=discriminator-runs \
    --data=datasets/cifar10_sym_40-32x32.zip --cond=1 --arch=ddpmpp --batch 1024 --simix 1
```


The `train.py` example above uses the default batch size of 512 images (controlled by `--batch`) that is divided evenly among 8 GPUs (controlled by `--nproc_per_node`) to yield 64 images per GPU; the discriminator examples instead set `--batch 1024` on a single GPU. Training large models may run out of GPU memory; the best way to avoid this is to limit the per-GPU batch size, e.g., `--batch-gpu=32`. This employs gradient accumulation to yield the same results as using full per-GPU batches. See [`python train.py --help`](./docs/train-help.txt) for the full list of options.
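The batch arithmetic can be checked with a small sketch. The function `accumulation_plan` is a hypothetical helper written for this README, not something defined in `train.py`; it assumes the global batch is split evenly across GPUs and that each GPU's share is processed in equal-sized accumulation rounds.

```python
def accumulation_plan(batch, num_gpus, batch_gpu=None):
    """Return (per_gpu_total, batch_gpu, rounds) for a global batch split over num_gpus.

    per_gpu_total: images each GPU contributes to one optimizer step
    batch_gpu:     images each GPU processes per forward/backward pass
    rounds:        gradient accumulation rounds per optimizer step
    """
    if batch % num_gpus != 0:
        raise ValueError("global batch must be divisible by the number of GPUs")
    per_gpu_total = batch // num_gpus
    if batch_gpu is None or batch_gpu > per_gpu_total:
        batch_gpu = per_gpu_total  # no accumulation needed
    if per_gpu_total % batch_gpu != 0:
        raise ValueError("per-GPU batch must divide the per-GPU share of the global batch")
    rounds = per_gpu_total // batch_gpu
    return per_gpu_total, batch_gpu, rounds


print(accumulation_plan(512, 8))       # (64, 64, 1)
print(accumulation_plan(512, 8, 32))   # (64, 32, 2)
```

With `--batch-gpu=32`, each GPU still contributes 64 images per step, but in two passes of 32, which roughly halves peak activation memory at the cost of more passes per step.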

The results of each training run are saved to a newly created directory, for example `training-runs/00000-cifar10-cond-ddpmpp-edm-gpus8-batch64-fp32`. The training loop exports network snapshots (`network-snapshot-*.pkl`) and training states (`training-state-*.pt`) at regular intervals (controlled by `--snap` and `--dump`). The network snapshots can be used to generate images with `generate.py`, and the training states can be used to resume the training later on (`--resume`). Other useful information is recorded in `log.txt` and `stats.jsonl`. To monitor training convergence, we recommend looking at the training loss (`"Loss/loss"` in `stats.jsonl`) as well as periodically evaluating FID for `network-snapshot-*.pkl` using `generate.py` and `fid.py`.
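To track convergence, the loss can be pulled out of `stats.jsonl` with a few lines of Python. The sketch below assumes each line of the file is a JSON object in which `"Loss/loss"` maps to a dictionary with a `"mean"` field; that layout is an assumption, so adjust the key handling if your file differs. The helper `read_loss_curve` is hypothetical, and the demo runs on a small synthetic file rather than a real training run.

```python
import json
import os
import tempfile


def read_loss_curve(path, key="Loss/loss"):
    """Collect the mean of `key` from each JSON line in a stats.jsonl-style file."""
    values = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line).get(key)
            # Assumed layout: {"Loss/loss": {"num": ..., "mean": ..., "std": ...}, ...}
            if isinstance(entry, dict) and "mean" in entry:
                values.append(float(entry["mean"]))
    return values


# Demo on a synthetic file; point `path` at a real run directory instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stats.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for mean in (0.50, 0.40, 0.35):
            f.write(json.dumps({"Loss/loss": {"num": 1, "mean": mean, "std": 0.0}}) + "\n")
    losses = read_loss_curve(path)

print(losses)
```

The resulting list can be plotted or compared across runs alongside periodic FID evaluations.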

## License

Copyright &copy; 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

All material, including source code and pre-trained models, is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).

## Acknowledgements

This work builds heavily on the code from:
* [Karras, T., Aittala, M., Aila, T., & Laine, S. (2022). Elucidating the design space of diffusion-based generative models. *Advances in Neural Information Processing Systems, 35, 26565-26577*.](https://github.com/NVlabs/edm)

## Citation

```
@inproceedings{Karras2022edm,
  author    = {Tero Karras and Miika Aittala and Timo Aila and Samuli Laine},
  title     = {Elucidating the Design Space of Diffusion-Based Generative Models},
  booktitle = {Proc. NeurIPS},
  year      = {2022}
}
```

## Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

