Unrolled Memory Inner-Products: An Abstract GPU Operator for Efficient Vision-Related Computations

Yu-Sheng Lin, Wei-Chao Chen, Shao-Yi Chien; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 4577-4585

Abstract


Recently, convolutional neural networks (CNNs) have achieved great success in fields such as computer vision, natural language processing, and artificial intelligence. Many of these applications rely on parallel processing on GPUs to achieve higher performance. However, optimizing code for GPUs remains a daunting task, and most researchers have to rely on vendor-provided libraries for this purpose. In this paper, we discuss an operator that succinctly expresses the computational kernels of CNNs and of various scientific and vision applications. This operator, called the Unrolled-Memory-Inner-Product (UMI), is computationally efficient and requires fewer code tokens. Since a naive UMI implementation would increase the memory requirement by unrolling the input data, we propose a method that achieves optimal memory-fetch performance on modern GPUs. We demonstrate this operator by converting several popular applications into the UMI representation, achieving 1.3x-26.4x speedups over frameworks such as OpenCV and Caffe.
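For readers unfamiliar with input unrolling, the sketch below is a minimal illustration of the general idea the abstract refers to, not the paper's implementation: an im2col-style unrolling turns each receptive field of a convolution into a row, after which the convolution reduces to plain inner products. It also makes the memory blow-up concrete, since the unrolled matrix is roughly kernel-size times larger than the input. All names and shapes here are illustrative assumptions, not identifiers from the paper.

    import numpy as np

    def im2col(x, kh, kw):
        # Unroll every kh-by-kw patch of a 2-D input into one row.
        # The unrolled matrix stores kh*kw copies of (almost) every
        # input element -- the memory growth the abstract refers to.
        h, w = x.shape
        oh, ow = h - kh + 1, w - kw + 1
        cols = np.empty((oh * ow, kh * kw), dtype=x.dtype)
        for i in range(oh):
            for j in range(ow):
                cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
        return cols

    # A 2-D convolution (cross-correlation) expressed as inner products
    # over the unrolled input: one dot product per output pixel.
    x = np.arange(36, dtype=np.float32).reshape(6, 6)   # 36 input elements
    k = np.ones((3, 3), dtype=np.float32) / 9.0         # 3x3 box filter
    unrolled = im2col(x, 3, 3)                          # shape (16, 9): 144 elements, 4x the input
    out = (unrolled @ k.ravel()).reshape(4, 4)

Per the abstract, the paper proposes a method that achieves optimal memory-fetch performance on GPUs without paying this naive formulation's memory cost; the sketch above shows only the naive, memory-hungry version.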

Related Material


[bibtex]
@InProceedings{Lin_2017_ICCV,
author = {Lin, Yu-Sheng and Chen, Wei-Chao and Chien, Shao-Yi},
title = {Unrolled Memory Inner-Products: An Abstract GPU Operator for Efficient Vision-Related Computations},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017},
pages = {4577-4585}
}