Spatial-Temporal Weighted Pyramid Using Spatial Orthogonal Pooling

Yusuke Mukuta, Yoshitaka Ushiku, Tatsuya Harada; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1041-1049

Abstract


Feature pooling is a method that summarizes local descriptors in an image using spatial information. Spatial pyramid matching uses the statistics of local features in an image subregion as a global feature. However, this method has several disadvantages: there is no theoretical guideline for selecting the pooling region; robustness to small image translations is lost around the edges of the pooling region; and the information encoded in the different feature pyramids overlaps, so recognition performance stagnates as the pyramid size increases. In this research, we propose a novel interpretation that regards feature pooling as an orthogonal projection in the space of functions that map the image space to the local feature space. Moreover, we propose a novel feature-pooling method that orthogonally projects the function form of local descriptors into the space of low-degree polynomials. Experimental results demonstrate the effectiveness of the proposed methods.
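
To make the idea of pooling as a projection onto low-degree polynomials more concrete, the following is a minimal sketch and not the authors' implementation. It assumes descriptor positions are rescaled to [-1, 1] in each axis, uses products of Legendre polynomials as the basis, and approximates each inner product by a plain average over descriptor positions. The function name `orthogonal_poly_pool` and its parameters are hypothetical, and the basis, normalization, and weighting used in the paper may differ.

```python
import numpy as np
from numpy.polynomial import legendre


def orthogonal_poly_pool(features, xs, ys, max_degree=2):
    """Pool local descriptors by projecting onto low-degree Legendre products.

    features: (N, D) array of local descriptors.
    xs, ys:   (N,) arrays of descriptor positions, rescaled to [-1, 1].
    Returns a vector of length D * K, where K is the number of basis
    functions P_i(x) * P_j(y) with i + j <= max_degree.
    """
    features = np.asarray(features, dtype=float)
    # legvander(x, d)[:, k] evaluates the k-th Legendre polynomial at x.
    px = legendre.legvander(np.asarray(xs, dtype=float), max_degree)
    py = legendre.legvander(np.asarray(ys, dtype=float), max_degree)

    pooled = []
    for i in range(max_degree + 1):
        for j in range(max_degree + 1 - i):
            # Value of the basis function at every descriptor position.
            weights = px[:, i] * py[:, j]
            # Discrete approximation of the inner product with the basis
            # function (unnormalized; an assumption of this sketch).
            pooled.append(weights @ features / len(features))
    return np.concatenate(pooled)
```

With `max_degree=0` the only basis function is the constant 1, so the output reduces to ordinary average pooling; higher degrees append coefficients that vary smoothly with position instead of cutting the image into hard-edged subregions.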

Related Material


@InProceedings{Mukuta_2017_ICCV,
author = {Mukuta, Yusuke and Ushiku, Yoshitaka and Harada, Tatsuya},
title = {Spatial-Temporal Weighted Pyramid Using Spatial Orthogonal Pooling},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}
}