LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization

Sheng Liu, Cong Phuoc Huynh, Cong Chen, Maxim Arap, Raffay Hamid; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 18290-18299

Abstract


We present a simple yet effective self-supervised pretraining method for image harmonization which can leverage large-scale unannotated image datasets. To achieve this goal, we first generate pre-training data online with our Label-Efficient Masked Region Transform (LEMaRT) pipeline. Given an image, LEMaRT generates a foreground mask and then applies a set of transformations to perturb various visual attributes, e.g., defocus blur, contrast, saturation, of the region specified by the generated mask. We then pre-train image harmonization models by recovering the original image from the perturbed image. Secondly, we introduce an image harmonization model, namely SwinIH, by retrofitting the Swin Transformer [27] with a combination of local and global self-attention mechanisms. Pretraining SwinIH with LEMaRT results in a new state of the art for image harmonization, while being label-efficient, i.e., consuming less annotated data for fine-tuning than existing methods. Notably, on iHarmony4 dataset [8], SwinIH outperforms the state of the art, i.e., SCS-Co [16] by a margin of 0.4 dB when it is fine-tuned on only 50% of the training data, and by 1.0 dB when it is trained on the full training dataset.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Liu_2023_CVPR, author = {Liu, Sheng and Huynh, Cong Phuoc and Chen, Cong and Arap, Maxim and Hamid, Raffay}, title = {LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, pages = {18290-18299} }