CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment

Hyeongmin Lee, Kyoungkook Kang, Jungseul Ok, Sunghyun Cho; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2942-2951

Abstract


Recent image tone adjustment (or enhancement) approaches have predominantly adopted supervised learning to learn human-centric perceptual assessment. However, these approaches are constrained by intrinsic challenges of supervised learning. Primarily, the requirement for expertly curated or retouched images escalates data acquisition expenses. Moreover, their coverage of target styles is confined to stylistic variants inferred from the training data. To surmount these challenges, we propose CLIPtone, an unsupervised learning-based approach for text-based image tone adjustment that extends an existing image enhancement method to accommodate natural language descriptions. Specifically, we design a hyper-network that adaptively modulates the pretrained parameters of a backbone model based on a text description. To assess whether an adjusted image aligns with its text description without a ground-truth image, we utilize CLIP, which is trained on a vast set of language-image pairs and thus encompasses knowledge of human perception. The major advantages of our approach are threefold: (i) minimal data collection expenses, (ii) support for a range of adjustments, and (iii) the ability to handle novel text descriptions unseen during training. The efficacy of the proposed method is demonstrated through comprehensive experiments, including a user study.
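The abstract's two core ideas, a hyper-network that modulates frozen backbone parameters from a text embedding and a CLIP-based loss that scores text-image alignment without a ground-truth image, can be made concrete with a short PyTorch sketch. The sketch below is illustrative only, not the paper's implementation: `ToyBackbone`, the per-parameter MLP heads, the tanh residual scaling, and the stub encoders are all assumptions made to keep the example self-contained and runnable. In the actual method, CLIP's frozen text and image encoders supply the embeddings, and the backbone, modulation scheme, and training objective follow the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

EMBED_DIM = 512  # CLIP ViT-B/32 text/image embedding size


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for the pretrained tone-adjustment backbone.

    The paper builds on an existing image enhancement network; here a tiny
    per-pixel 3->3 color mapping keeps the sketch self-contained.
    """

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 3, kernel_size=1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x))


class HyperNetwork(nn.Module):
    """Maps a text embedding to a modulation of the backbone's parameters.

    One small MLP head per backbone parameter tensor predicts a residual
    scale near 1, so the pretrained behavior is the default and the text
    description only nudges it. This particular modulation scheme is an
    illustrative assumption, not the paper's.
    """

    def __init__(self, backbone, embed_dim=EMBED_DIM):
        super().__init__()
        self.heads = nn.ModuleDict()
        for name, p in backbone.named_parameters():
            key = name.replace(".", "_")  # ModuleDict keys may not contain '.'
            self.heads[key] = nn.Sequential(
                nn.Linear(embed_dim, 64), nn.ReLU(),
                nn.Linear(64, p.numel()),
            )

    def forward(self, text_emb, backbone):
        mods = {}
        for name, p in backbone.named_parameters():
            delta = self.heads[name.replace(".", "_")](text_emb).view(p.shape)
            mods[name] = p * (1.0 + 0.1 * torch.tanh(delta))
        return mods


backbone = ToyBackbone()
for p in backbone.parameters():
    p.requires_grad_(False)  # pretrained backbone weights stay frozen
hyper = HyperNetwork(backbone)

# In the real method both embeddings come from CLIP's frozen encoders;
# random/stub tensors stand in so the example runs end to end.
text_emb = torch.randn(EMBED_DIM)  # CLIP text embedding of the description
image = torch.rand(1, 3, 64, 64)   # input photo

mods = hyper(text_emb, backbone)   # text-conditioned parameters
adjusted = functional_call(backbone, mods, (image,))

# Placeholder for CLIP's frozen image encoder: a frozen linear projection of
# mean-pooled pixels, differentiable so gradients reach the hyper-network.
img_encoder = nn.Linear(3, EMBED_DIM)
for p in img_encoder.parameters():
    p.requires_grad_(False)
image_emb = img_encoder(adjusted.mean(dim=(2, 3))).squeeze(0)

# Unsupervised objective: pull the adjusted image's CLIP embedding toward
# the text embedding; no ground-truth retouched image is needed.
loss = 1.0 - F.cosine_similarity(image_emb, text_emb, dim=0)
loss.backward()  # updates flow only into the hyper-network's heads
```

Freezing the backbone and training only the hyper-network is what allows a single pretrained enhancer to serve a range of adjustments, including text descriptions unseen during training.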

Related Material


BibTeX
@InProceedings{Lee_2024_CVPR,
    author    = {Lee, Hyeongmin and Kang, Kyoungkook and Ok, Jungseul and Cho, Sunghyun},
    title     = {CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {2942-2951}
}