ST-RoomNet: Learning Room Layout Estimation From Single Image Through Unsupervised Spatial Transformations
Room layout estimation is an important task for the 3D reconstruction of indoor scenes and for augmented reality applications. The room layout is usually estimated by predicting the keypoints of the room corners, by planar segmentation of the room (floor, ceiling, and right, left, and front walls), or by line detection. In this paper, we propose a novel way to estimate the room layout from a monocular RGB image using a spatial transformer network (STN). Since rooms commonly follow a cuboid layout, we train a convolutional neural network to predict, without supervision on the transformation itself, the perspective transformation parameters that map a reference cuboid layout onto the layout of the input room, based on the deep features of the input image. We show that the proposed method is simple and efficient, learning the room layout without performing segmentation, line detection, or keypoint estimation. We evaluate the proposed method on two challenging benchmarks, the LSUN Room Layout and Hedau datasets, achieving a pixel error of 5.24% on LSUN and 7.10% on Hedau at 10-15 fps, outperforming state-of-the-art methods on the room layout estimation task.
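The core idea, a network-predicted perspective (homography) transform warping a fixed reference cuboid layout onto the input image, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (which would use a differentiable sampler such as an STN grid sampler inside the training loop); the function name `warp_layout`, the toy reference map, and the identity homography below are illustrative assumptions.

```python
import numpy as np

def warp_layout(reference, H):
    """Warp a reference layout label map with a 3x3 homography H.

    Uses inverse mapping: each output pixel is traced back into the
    reference map and filled by nearest-neighbor sampling. In the paper's
    setting, H would come from the CNN's predicted transform parameters.
    """
    h, w = reference.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones_like(xs)
    coords = np.stack([xs.ravel(), ys.ravel(), ones.ravel()])  # 3 x N homogeneous
    src = np.linalg.inv(H) @ coords    # map output pixels back into the reference
    src = src[:2] / src[2]             # perspective divide
    sx = np.clip(np.round(src[0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(src[1]).astype(int), 0, h - 1)
    return reference[sy, sx].reshape(h, w)

# Toy reference "cuboid" layout: label 1 on the left half, label 2 on the right.
ref = np.zeros((8, 8), dtype=int)
ref[:, :4] = 1
ref[:, 4:] = 2

H = np.eye(3)  # identity homography: output layout equals the reference
out = warp_layout(ref, H)
```

In the actual method the sampling must be differentiable (bilinear rather than nearest-neighbor) so that the loss on the warped layout can backpropagate through the transform into the CNN's predicted parameters.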