CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images

Gokaslan, Aaron; Cooper, A. Feder; Collins, Jasmine; Seguin, Landan; Jacobson, Austin; Patel, Mihir; Frankle, Jonathan; Stephenson, Cory; Kuleshov, Volodymyr

Aaron Gokaslan, A. Feder Cooper, Jasmine Collins, Landan Seguin, Austin Jacobson, Mihir Patel, Jonathan Frankle, Cory Stephenson, Volodymyr Kuleshov; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 8250-8260

Abstract

We train a set of open text-to-image (T2I) diffusion models on a dataset of curated Creative-Commons-licensed (CC) images which yields models that are competitive with Stable Diffusion 2 (SD2). This task presents two challenges: (1) high-resolution CC images lack the captions necessary to train T2I models; (2) CC images are relatively scarce. To address these challenges we use an intuitive transfer learning technique to produce a set of high-quality synthetic captions paired with our assembled CC images. We then develop a data- and compute-efficient training recipe that requires as little as 3% of the LAION data (i.e. roughly 70 million examples) needed to train existing SD2 models but obtains the same quality. These results indicate that we have a sufficient number of CC images (also roughly 70 million) for training high-quality models. Our recipe also implements a variety of optimizations that achieve 2.71x training speed-ups enabling rapid model iteration. We leverage this recipe to train several high-quality T2I mod- els which we dub the CommonCanvas family. Our largest model achieves comparable performance to SD2 on human evaluation even though we use a synthetically captioned CC-image dataset that is only <3% the size of LAION for training. We release our models data and code on GitHub.

Related Material

[pdf] [supp]

[bibtex]

@InProceedings{Gokaslan_2024_CVPR, author = {Gokaslan, Aaron and Cooper, A. Feder and Collins, Jasmine and Seguin, Landan and Jacobson, Austin and Patel, Mihir and Frankle, Jonathan and Stephenson, Cory and Kuleshov, Volodymyr}, title = {CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {8250-8260} }