Rethinking 360° Image Visual Attention Modelling With Unsupervised Learning

Yasser Abdelaziz Dahou Djilali, Tarun Krishna, Kevin McGuinness, Noel E. O’Connor; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 15414-15424

Abstract


Despite the success of self-supervised representation learning on planar data, to date it has not been studied on 360° images. In this paper, we extend recent advances in contrastive learning to learn latent representations that are sufficiently invariant to be highly effective for spherical saliency prediction as a downstream task. We argue that omni-directional images are particularly suited to such an approach due to the geometry of the data domain. To verify this hypothesis, we design an unsupervised framework that effectively maximizes the mutual information between the different views from both the equator and the poles. We show that the decoder is able to learn good quality saliency distributions from the encoder embeddings. Our model compares favorably with fully-supervised learning methods on the Salient360!, VR-EyeTracking and Sitzman datasets. This performance is achieved using an encoder that is trained in a completely unsupervised way and a relatively lightweight supervised decoder (3.8× fewer parameters in the case of the ResNet50 encoder). We believe that this combination of supervised and unsupervised learning is an important step toward flexible formulations of human visual attention.
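The abstract page does not include code; as a rough illustration of the contrastive objective described above, the following is a minimal InfoNCE-style sketch in PyTorch. The function name info_nce_loss, the equator/pole view split, and the temperature value are illustrative assumptions, not the authors' implementation.

    # Minimal sketch (not the authors' code): an InfoNCE-style loss between
    # embeddings of two views of the same 360° image, e.g. an equatorial crop
    # and a polar crop produced by the (assumed) view-sampling pipeline.
    import torch
    import torch.nn.functional as F

    def info_nce_loss(z_equator: torch.Tensor, z_pole: torch.Tensor,
                      temperature: float = 0.1) -> torch.Tensor:
        """z_equator, z_pole: (batch, dim) embeddings of two views per image."""
        z1 = F.normalize(z_equator, dim=1)
        z2 = F.normalize(z_pole, dim=1)
        # Pairwise cosine similarities, scaled by the temperature.
        logits = z1 @ z2.t() / temperature
        targets = torch.arange(z1.size(0), device=z1.device)
        # Diagonal pairs (same image, different view) are positives; all
        # off-diagonal pairs act as negatives. Symmetrize over both views.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

Minimizing this loss is a standard lower-bound surrogate for maximizing mutual information between the two views; a frozen encoder trained this way would then feed the lightweight supervised saliency decoder mentioned above.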

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Djilali_2021_ICCV,
    author    = {Djilali, Yasser Abdelaziz Dahou and Krishna, Tarun and McGuinness, Kevin and O'Connor, Noel E.},
    title     = {Rethinking 360{\textdegree} Image Visual Attention Modelling With Unsupervised Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {15414-15424}
}