What Synthesis Is Missing: Depth Adaptation Integrated With Weak Supervision for Indoor Scene Parsing

Keng-Chi Liu, Yi-Ting Shen, Jan P. Klopp, Liang-Gee Chen; The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 7345-7354

Abstract


Scene Parsing is a crucial step to enable autonomous systems to understand and interact with their surroundings. Supervised deep learning methods have made great progress in solving scene parsing problems, however, come at the cost of laborious manual pixel-level annotation. Synthetic data as well as weak supervision have been investigated to alleviate this effort. Nonetheless, synthetically generated data still suffers from severe domain shift while weak labels often lack precision. Moreover, most existing works for weakly supervised scene parsing are limited to salient foreground objects. The aim of this work is hence twofold: Exploit synthetic data where feasible and integrate weak supervision where necessary. More concretely, we address this goal by utilizing depth as transfer domain because its synthetic-to-real discrepancy is much lower than for color. At the same time, we perform weak localization from easily obtainable image level labels and integrate both using a novel contour-based scheme. Our approach is implemented as a teacher-student learning framework to solve the transfer learning problem by generating a pseudo ground truth. Using only depth-based adaptation, this approach already outperforms previous transfer learning approaches on the popular indoor scene parsing SUN RGB-D dataset. Our proposed two-stage integration more than halves the gap towards fully supervised methods when compared to previous state-of-the-art in transfer learning.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Liu_2019_ICCV,
author = {Liu, Keng-Chi and Shen, Yi-Ting and Klopp, Jan P. and Chen, Liang-Gee},
title = {What Synthesis Is Missing: Depth Adaptation Integrated With Weak Supervision for Indoor Scene Parsing},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}