Pulling Things out of Perspective

Lubor Ladicky, Jianbo Shi, Marc Pollefeys; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 89-96


The limitations of current state-of-the-art methods for single-view depth estimation and semantic segmentations are closely tied to the property of perspective geometry, that the perceived size of the objects scales inversely with the distance. In this paper, we show that we can use this property to reduce the learning of a pixel-wise depth classifier to a much simpler classifier predicting only the likelihood of a pixel being at an arbitrarily fixed canonical depth. The likelihoods for any other depths can be obtained by applying the same classifier after appropriate image manipulations. Such transformation of the problem to the canonical depth removes the training data bias towards certain depths and the effect of perspective. The approach can be straight-forwardly generalized to multiple semantic classes, improving both depth estimation and semantic segmentation performance by directly targeting the weaknesses of independent approaches. Conditioning the semantic label on the depth provides a way to align the data to their physical scale, allowing to learn a more discriminative classifier. Conditioning depth on the semantic class helps the classifier to distinguish between ambiguities of the otherwise ill-posed problem. We tested our algorithm on the KITTI road scene dataset and NYU2 indoor dataset and obtained obtained results that significantly outperform current state-of-the-art in both single-view depth and semantic segmentation domain.

Related Material

author = {Ladicky, Lubor and Shi, Jianbo and Pollefeys, Marc},
title = {Pulling Things out of Perspective},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2014}