MixDiff: Mixing Natural and Synthetic Images for Robust Self-Supervised Representations
Abstract
This paper introduces MixDiff, a new self-supervised learning (SSL) pre-training framework that combines real and synthetic images. Unlike traditional SSL methods, which predominantly use real images, MixDiff uses a variant of Stable Diffusion to replace an augmented instance of a real image, facilitating the learning of cross real-synthetic image representations. Our key insight is that while models trained solely on synthetic images underperform, combining real and synthetic data leads to more robust and adaptable representations. Experiments show that MixDiff enhances SimCLR, BarlowTwins, and DINO across various robustness datasets and domain transfer tasks, boosting SimCLR's ImageNet-1K accuracy by 4.56%. Our framework also demonstrates comparable performance without needing any augmentations, a surprising finding in SSL, where augmentations are typically crucial. Furthermore, MixDiff achieves results similar to SimCLR's while requiring less real data, highlighting its efficiency in representation learning.
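The pairing mechanism the abstract describes, replacing one augmented view of a real image with a diffusion-generated counterpart and training a contrastive objective across the resulting real-synthetic pair, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the off-the-shelf img2img pipeline, model ID, strength setting, and the SimCLR-style NT-Xent loss are all stand-ins for the paper's "variant of Stable Diffusion" and exact training recipe.

    # Hypothetical sketch of cross real-synthetic pairing; model IDs, prompts,
    # and hyperparameters are illustrative assumptions, not the paper's recipe.
    import torch
    import torch.nn.functional as F
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    # An off-the-shelf img2img pipeline stands in for the paper's
    # "variant of Stable Diffusion" (assumption).
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    def synthetic_view(image: Image.Image) -> Image.Image:
        # Generate a synthetic counterpart that replaces one augmented view;
        # strength < 1 keeps the output semantically tied to the input image.
        return pipe(prompt="", image=image, strength=0.5,
                    guidance_scale=5.0).images[0]

    def nt_xent(z1: torch.Tensor, z2: torch.Tensor,
                tau: float = 0.1) -> torch.Tensor:
        # Standard SimCLR NT-Xent loss: each real embedding z1[i] is pulled
        # toward its synthetic partner z2[i] and pushed from all other samples.
        n = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
        sim = z @ z.t() / tau                                # cosine similarities
        sim.fill_diagonal_(float("-inf"))                    # mask self-pairs
        targets = torch.cat([torch.arange(n, 2 * n),
                             torch.arange(0, n)]).to(z.device)
        return F.cross_entropy(sim, targets)

    # Usage sketch: encoder(...) is a user-supplied backbone + projection
    # head (assumption), and "image.jpg" is a hypothetical input path.
    #   real = Image.open("image.jpg").resize((512, 512))
    #   synth = synthetic_view(real)
    #   loss = nt_xent(encoder(real), encoder(synth))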
Related Material
[pdf]
[arXiv]
[bibtex]
@InProceedings{Bafghi_2025_WACV,
  author    = {Bafghi, Reza Akbarian and Harilal, Nidhin and Raissi, Maziar and Monteleoni, Claire},
  title     = {MixDiff: Mixing Natural and Synthetic Images for Robust Self-Supervised Representations},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {7500-7500}
}