Efficient Two-Stream Action Recognition on FPGA

Jia-Ming Lin, Kuan-Ting Lai, Bin-Ray Wu, Ming-Syan Chen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 3076-3080


Action recognition is an important research field with many applications in surveillance, video search, autonomous vehicles, and other areas. However, current state-of-the-art action classifiers are still not widely adopted in embedded applications. The major reason is that action recognition must process both spatial and temporal streaming data to identify actions precisely, which is compute-intensive and power-hungry. To address this issue, researchers have started using FPGAs to run action recognition models with minimal power consumption. In this paper, we propose a new hardware architecture for action recognition on FPGA. Our model is based on the popular two-stream neural network. By optimizing the optical flow and convolution operations in the temporal domain, our method achieves similar accuracy with one order of magnitude fewer operations than other C3D baseline models. We have implemented our model on a Xilinx ZCU102 and released the source code.
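For readers unfamiliar with the two-stream design, the following is a minimal sketch of the generic late-fusion step, in which class scores from a spatial (RGB) stream and a temporal (optical-flow) stream are combined by a weighted average. It illustrates the general idea only and is not the FPGA architecture of the paper; the function names, the `temporal_weight` parameter, and the example logits are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Weighted average of the per-class probabilities of the two streams.

    temporal_weight is a hypothetical tuning knob: 0.0 uses only the
    spatial stream, 1.0 uses only the temporal stream.
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    w = temporal_weight
    return [(1.0 - w) * s + w * t for s, t in zip(p_spatial, p_temporal)]


def predict(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_two_stream(spatial_logits, temporal_logits, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Made-up logits for three action classes.
    spatial = [2.0, 0.0, 0.0]   # appearance favors class 0
    temporal = [0.0, 3.0, 0.0]  # motion favors class 1, more strongly
    print(predict(spatial, temporal))
```

Averaging probabilities rather than raw logits keeps the two streams on a comparable scale, which is one common choice; other fusion schemes exist and the paper may use a different one.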

Related Material

@InProceedings{Lin_2021_CVPR,
    author    = {Lin, Jia-Ming and Lai, Kuan-Ting and Wu, Bin-Ray and Chen, Ming-Syan},
    title     = {Efficient Two-Stream Action Recognition on FPGA},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {3076-3080}
}