MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization

Seulgi Jeong, Jaeil Kim; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 6981-6990

Abstract


In the personalization process of large-scale text-to-image models, overfitting often occurs when learning specific subject from a limited number of images. Existing methods, such as DreamBooth, mitigate this issue through a class-specific prior-preservation loss, which requires increased computational cost during training and limits user control during inference time. To address these limitations, we propose Mask-Integrated Negative Attention Diffusion (MINDiff). MINDiff introduces a novel concept, negative attention, which suppresses the subject's influence in masked irrelevant regions. We achieve this by modifying the cross-attention mechanism during inference. This enables semantic control and improves text alignment by reducing subject dominance in irrelevant regions. Additionally, during the inference time, users can adjust a scale parameter \lambda to balance subject fidelity and text alignment. Our qualitative and quantitative experiments on DreamBooth models demonstrate that MINDiff mitigates overfitting more effectively than class-specific prior-preservation loss. As our method operates entirely at inference time and does not alter the model architecture, it can be directly applied to existing DreamBooth models without re-training.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Jeong_2025_ICCV, author = {Jeong, Seulgi and Kim, Jaeil}, title = {MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops}, month = {October}, year = {2025}, pages = {6981-6990} }