Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation

Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, Kwanghoon Sohn; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 14163-14172

Abstract


Existing techniques for encoding spatial invariance within deep convolutional neural networks model only 2D transformation fields. They do not account for the fact that objects in a 2D image are projections of 3D ones, and thus have limited ability to handle severe object viewpoint changes. To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploits a cylindrical representation of a convolutional kernel defined in 3D space. CCNs extract view-specific features through view-specific convolutional kernels to predict object category scores at each viewpoint. Using these view-specific features, we simultaneously determine the object category and viewpoint with the proposed sinusoidal soft-argmax module. Our experiments demonstrate the effectiveness of cylindrical convolutional networks on joint object detection and viewpoint estimation.
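The sinusoidal soft-argmax can be illustrated with a minimal sketch: given per-view scores, softmax them over the discrete azimuth bins, then take the expectation on the unit circle via sine and cosine so the circular wrap-around at 0/360 degrees is handled correctly. The function name, bin layout, and normalization below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sinusoidal_soft_argmax(view_scores):
    """Differentiable viewpoint estimate from per-view scores.

    Hypothetical sketch: softmax over N discrete azimuth bins,
    then the expected angle on the unit circle (via sin/cos),
    which respects the circularity of viewpoint angles.
    """
    n = len(view_scores)
    # Bin centers evenly spaced over [0, 2*pi)
    angles = 2.0 * np.pi * np.arange(n) / n
    # Softmax over view bins (shifted for numerical stability)
    e = np.exp(view_scores - np.max(view_scores))
    p = e / e.sum()
    # Expected angle on the unit circle, recovered with atan2
    theta = np.arctan2((p * np.sin(angles)).sum(),
                       (p * np.cos(angles)).sum())
    return theta % (2.0 * np.pi)
```

A hard argmax over the same bins would be non-differentiable and limited to the bin resolution; the sinusoidal expectation instead yields a continuous angle that can be trained end-to-end.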

Related Material


@InProceedings{Joung_2020_CVPR,
author = {Joung, Sunghun and Kim, Seungryong and Kim, Hanjae and Kim, Minsu and Kim, Ig-Jae and Cho, Junghyun and Sohn, Kwanghoon},
title = {Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}