ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids

Dinesh Jayaraman, Ruohan Gao, Kristen Grauman; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 120-136

Abstract

We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation. The main idea is a self-supervised training objective that, given only a single 2D image, requires all unseen views of the object to be predictable from learned features. We implement this idea as an encoder-decoder convolutional neural network. The network maps an input image of an unknown category and unknown viewpoint to a latent space, from which a deconvolutional decoder can best “lift” the image to its complete viewgrid showing the object from all viewing angles. Our class-agnostic training procedure encourages the representation to capture fundamental shape primitives and semantic regularities in a data-driven manner—without manual semantic labels. Our results on two widely-used shape datasets show 1) our approach successfully learns to perform “mental rotation” even for objects unseen during training, and 2) the learned latent space is a powerful representation for object recognition, outperforming several existing unsupervised feature learning methods.
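
The encoder-decoder pipeline described above can be made concrete with a short PyTorch sketch. Everything below is illustrative rather than the paper's actual configuration: the 64x64 single-channel views, the 24-cell viewgrid, the 512-D latent code, the layer widths, and the plain per-pixel MSE objective are all assumptions, and the paper's handling of the unknown input viewpoint is omitted. The names (ShapeCodeNet, grid_views, latent_dim) are hypothetical.

import torch
import torch.nn as nn

class ShapeCodeNet(nn.Module):
    """Sketch: encode a single view, decode the full viewgrid.

    Grid size, channel widths, and latent dimension are illustrative
    guesses, not the paper's hyperparameters.
    """

    def __init__(self, grid_views=24, latent_dim=512):
        super().__init__()
        self.grid_views = grid_views
        # Encoder: one 64x64 grayscale view -> latent "ShapeCode".
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),    # 64 -> 32
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        # Decoder: latent code -> one feature map per viewgrid cell,
        # then deconvolutions upsample each cell back to image size.
        self.fc = nn.Linear(latent_dim, grid_views * 128 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),    # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, view):                        # view: (B, 1, 64, 64)
        code = self.encoder(view)                   # (B, latent_dim)
        feats = self.fc(code).view(-1, 128, 8, 8)   # (B*grid_views, 128, 8, 8)
        grid = self.decoder(feats)                  # (B*grid_views, 1, 64, 64)
        return code, grid.view(-1, self.grid_views, 1, 64, 64)

# Self-supervised objective: per-pixel reconstruction of the true viewgrid.
# No class labels are used; the targets are just the object's other views.
model = ShapeCodeNet()
view = torch.rand(8, 1, 64, 64)             # a batch of single input views
target_grid = torch.rand(8, 24, 1, 64, 64)  # ground-truth viewgrids
code, pred_grid = model(view)
loss = nn.functional.mse_loss(pred_grid, target_grid)
loss.backward()

After training, the latent code returned by the forward pass would serve as the single-view image feature evaluated for recognition, while the decoded viewgrid realizes the "mental rotation" behavior the abstract describes.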

Related Material

[pdf] [arXiv]
[bibtex]
@InProceedings{Jayaraman_2018_ECCV,
  author    = {Jayaraman, Dinesh and Gao, Ruohan and Grauman, Kristen},
  title     = {ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  month     = {September},
  year      = {2018},
  pages     = {120-136}
}