Jigsaw Clustering for Unsupervised Visual Representation Learning
Unsupervised representation learning with contrastive learning has recently achieved great success. However, these methods have to duplicate each training batch to construct contrastive pairs, i.e., each training batch and its augmented version must be forwarded simultaneously, which nearly doubles the demand for computational resources. In this paper, we propose a novel Jigsaw Clustering pretext task that only needs to forward each training batch itself, reducing the training cost by nearly half. Our method makes use of both intra-image and inter-image information and outperforms previous single-batch based methods by a large margin. It is even comparable to the costly contrastive learning methods while using only half the number of training batches. Our method shows that multiple batches during training are not necessary, and it opens a new door for future research on single-batch based unsupervised methods. Our models trained on the ImageNet dataset achieve state-of-the-art results on linear classification, outperforming previous single-batch methods by 2.6%. Models transferred to the COCO dataset outperform MoCo v2 by 0.4% with only half the number of training samples. Our pretrained models outperform supervised ImageNet-pretrained models on the CIFAR-10 and CIFAR-100 datasets by 0.9% and 4.1%, respectively.
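The abstract does not spell out how a single batch can supply both intra-image and inter-image signal. The following is a minimal sketch of one way to build such a "jigsaw" batch, assuming each image is cut into an m × m grid of patches, all patches in the batch are shuffled together, and the shuffled patches are stitched back into the same number of montage images. Each patch then carries two targets: the index of the image it came from (a clustering target) and its original grid position (a location target). The function names (`split_into_patches`, `stitch`, `jigsaw_batch`), the nested-list single-channel image format, and the absence of patch overlap or per-patch augmentation are hypothetical simplifications for illustration, not the authors' released code.

```python
import random
from typing import List, Tuple

# Hypothetical image format: a single-channel H x W image stored as nested lists.
Image = List[List[float]]


def split_into_patches(img: Image, m: int) -> List[Image]:
    """Cut an image into an m x m grid; patches are returned in row-major grid order."""
    h, w = len(img), len(img[0])
    ph, pw = h // m, w // m
    patches = []
    for gi in range(m):
        for gj in range(m):
            patches.append([row[gj * pw:(gj + 1) * pw] for row in img[gi * ph:(gi + 1) * ph]])
    return patches


def stitch(patches: List[Image], m: int) -> Image:
    """Inverse of split_into_patches: place m*m patches back on a row-major grid."""
    rows: Image = []
    for gi in range(m):
        band = patches[gi * m:(gi + 1) * m]  # one horizontal band of the grid
        for r in range(len(band[0])):
            row: List[float] = []
            for p in band:
                row.extend(p[r])
            rows.append(row)
    return rows


def jigsaw_batch(batch: List[Image], m: int, seed=None) -> Tuple[List[Image], List[List[int]], List[List[int]]]:
    """Shuffle patches across the whole batch and stitch them into len(batch) montages.

    Returns the montages plus, for every patch slot in every montage, the index of
    its source image (cluster label) and its original grid position (location label).
    """
    rng = random.Random(seed)
    pool = []  # entries: (patch, source image index, original grid position)
    for idx, img in enumerate(batch):
        for pos, patch in enumerate(split_into_patches(img, m)):
            pool.append((patch, idx, pos))
    rng.shuffle(pool)  # mixing patches from different images gives the inter-image signal

    k = m * m
    montages, cluster_labels, location_labels = [], [], []
    for start in range(0, len(pool), k):
        chunk = pool[start:start + k]
        montages.append(stitch([p for p, _, _ in chunk], m))
        cluster_labels.append([i for _, i, _ in chunk])
        location_labels.append([pos for _, _, pos in chunk])
    return montages, cluster_labels, location_labels
```

Because only the montages are forwarded, each batch passes through the network once rather than alongside an augmented duplicate. How the network turns per-patch features into cluster and location predictions is outside the scope of this sketch.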