Convolutional Recurrent Neural Networks: Learning Spatial Dependencies for Image Representation

Zhen Zuo, Bing Shuai, Gang Wang, Xiao Liu, Xingxing Wang, Bing Wang, Yushi Chen; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2015, pp. 18-26

Abstract


In existing convolutional neural networks (CNNs), both convolution and pooling are locally performed for image regions separately, no contextual dependencies between different image regions have been taken into consideration. Such dependencies represent useful spatial structure information in images. Whereas recurrent neural networks (RNNs) are designed for learning contextual dependencies among sequential data by using the recurrent (feedback) connections. In this work, we propose the convolutional recurrent neural network (C-RNN), which learns the spatial dependencies between image regions to enhance the discriminative power of image representation. The C-RNN is trained in an end-to-end manner from raw pixel images. CNN layers are firstly processed to generate middle level features. RNN layer is then learned to encode spatial dependencies. The C-RNN can learn better image representation, especially for images with obvious spatial contextual dependencies. Our method achieves competitive performance on ILSVRC 2012, SUN 397, and MIT indoor.

Related Material


[pdf]
[bibtex]
@InProceedings{Zuo_2015_CVPR_Workshops,
author = {Zuo, Zhen and Shuai, Bing and Wang, Gang and Liu, Xiao and Wang, Xingxing and Wang, Bing and Chen, Yushi},
title = {Convolutional Recurrent Neural Networks: Learning Spatial Dependencies for Image Representation},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2015}
}