Building Detail-Sensitive Semantic Segmentation Networks With Polynomial Pooling

Wei, Zhen; Zhang, Jingyi; Liu, Li; Zhu, Fan; Shen, Fumin; Zhou, Yi; Liu, Si; Sun, Yao; Shao, Ling

Zhen Wei, Jingyi Zhang, Li Liu, Fan Zhu, Fumin Shen, Yi Zhou, Si Liu, Yao Sun, Ling Shao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 7115-7123

Abstract

Semantic segmentation is an important computer vision task that aims to assign a semantic label to each pixel in an image. When training a segmentation model, it is common to fine-tune a classification network pre-trained on a large-scale dataset. However, invariance to spatial perturbation is an intrinsic property of classification models, and the resulting loss of detail sensitivity prevents segmentation networks from achieving high performance. The use of standard pooling operations is one of the key factors behind this invariance. The most common standard pooling operations are max and average pooling. Max pooling increases both the invariance to spatial perturbations and the non-linearity of the network. Average pooling, on the other hand, is sensitive to spatial perturbations, but is a linear function. For semantic segmentation, we prefer both the preservation of detailed cues within a local feature region and the non-linearity that increases a network's functional complexity. In this work, we propose a polynomial pooling (P-pooling) function that finds an intermediate form between max and average pooling to provide an optimally balanced and self-adjusted pooling strategy for semantic segmentation. P-pooling is differentiable and can be applied to a variety of pre-trained networks. Extensive studies on the PASCAL VOC, Cityscapes and ADE20k datasets demonstrate the superiority of P-pooling over other pooling methods. Experiments on various network architectures and state-of-the-art training strategies also show that models with P-pooling layers consistently outperform those directly fine-tuned from pre-trained classification models.
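The abstract does not state the P-pooling formula, so the following is only an illustrative sketch of how a pooling function can sit between average and max pooling. It uses a power-weighted average (the Lehmer mean) over one pooling window, with an assumed exponent `alpha`; the function name `power_weighted_pool` and the sample window are hypothetical and are not taken from the paper. With `alpha = 0` the result is the average of the window, and as `alpha` grows the result approaches the maximum.

```python
def power_weighted_pool(window, alpha):
    """Pool one window of non-negative activations with weights x**alpha.

    Illustrative assumption, not the paper's definition:
    alpha = 0 reduces to average pooling, and a large alpha
    approaches max pooling.
    """
    weights = [x ** alpha for x in window]
    total = sum(weights)
    if total == 0:
        # All activations are zero (and alpha > 0): the pooled value is zero.
        return 0.0
    return sum(w * x for w, x in zip(weights, window)) / total


# Hypothetical 2x2 pooling window, flattened: three weak responses, one strong.
window = [0.1, 0.2, 0.3, 2.0]

average_like = power_weighted_pool(window, 0)   # same as the arithmetic mean
intermediate = power_weighted_pool(window, 1)   # between mean and max
max_like = power_weighted_pool(window, 50)      # very close to max(window)
```

In a network, an exponent like `alpha` could be a learned parameter, which is one way to read the "self-adjusted" behaviour the abstract describes; that reading is an assumption here. Because the expression is a ratio of sums of powers, it is differentiable for positive activations, unlike the hard selection in max pooling.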

Related Material

[bibtex]

@InProceedings{Wei_2019_CVPR,
author = {Wei, Zhen and Zhang, Jingyi and Liu, Li and Zhu, Fan and Shen, Fumin and Zhou, Yi and Liu, Si and Sun, Yao and Shao, Ling},
title = {Building Detail-Sensitive Semantic Segmentation Networks With Polynomial Pooling},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}