Raising the Bar of AI-generated Image Detection with CLIP

Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, Luisa Verdoliva; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 4356-4366

Abstract


The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios. We find that, contrary to previous beliefs, it is neither necessary nor convenient to use a large domain-specific dataset for training. On the contrary, by using only a handful of example images from a single generative model, a CLIP-based detector exhibits surprising generalization ability and high robustness across different architectures, including recent commercial tools such as Dalle-3, Midjourney v5, and Firefly. We match the state-of-the-art (SoTA) on in-distribution data and significantly improve upon it in terms of generalization to out-of-distribution data (+6% AUC) and robustness to impaired/laundered data (+13%). Our project is available online at https://grip-unina.github.io/ClipBased-SyntheticImageDetection/
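
To make the idea of a lightweight detector trained on a handful of examples more concrete, the following is a minimal sketch and not the authors' implementation. It assumes image embeddings have already been extracted; here they are replaced by synthetic random vectors (not real CLIP features), and a plain logistic-regression classifier is used as a stand-in for whatever linear classifier the paper adopts. All function names, the feature dimension, and the sample counts are hypothetical choices made for illustration. The sketch trains on 10 examples per class and reports AUC, the metric quoted in the abstract, on held-out data.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length, as is common for embedding vectors."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def make_features(rng, n, shift, dim=64):
    """Synthetic stand-in for image embeddings (NOT real CLIP features)."""
    return l2_normalize(rng.normal(size=(n, dim)) + shift)


def fit_linear_detector(x, y, lr=1.0, steps=500, weight_decay=1e-3):
    """Logistic regression by gradient descent (assumed classifier choice)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(fake)
        grad = p - y                             # gradient of the log-loss
        w -= lr * (x.T @ grad / len(y) + weight_decay * w)
        b -= lr * grad.mean()
    return w, b


def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


rng = np.random.default_rng(0)
dim = 64

# Hypothetical direction separating "real" from "fake" in feature space.
direction = rng.normal(size=dim)
direction *= 3.0 / np.linalg.norm(direction)

# A handful of training examples: 10 real (label 0) and 10 fake (label 1).
x_train = np.vstack([make_features(rng, 10, -direction),
                     make_features(rng, 10, +direction)])
y_train = np.concatenate([np.zeros(10), np.ones(10)])

w, b = fit_linear_detector(x_train, y_train)

# Held-out synthetic data to measure how well the detector ranks samples.
x_real_test = make_features(rng, 500, -direction)
x_fake_test = make_features(rng, 500, +direction)
test_auc = auc(x_fake_test @ w + b, x_real_test @ w + b)
print(f"AUC on held-out synthetic features: {test_auc:.3f}")
```

In a real pipeline, `make_features` would be replaced by embeddings computed from actual real and generated images; the numbers printed by this sketch say nothing about the results reported in the paper.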

Related Material


@InProceedings{Cozzolino_2024_CVPR,
    author    = {Cozzolino, Davide and Poggi, Giovanni and Corvi, Riccardo and Nie{\ss}ner, Matthias and Verdoliva, Luisa},
    title     = {Raising the Bar of AI-generated Image Detection with CLIP},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {4356-4366}
}