Efficient Two-Stream Action Recognition on FPGA
Action recognition is an important research field with many applications in surveillance, video search, autonomous driving, and more. However, current state-of-the-art action classifiers are still not widely adopted in embedded applications. The main reason is that action recognition must process both spatial and temporal streams of data to identify actions precisely, which is compute-intensive and power-hungry. To address this issue, researchers have begun using FPGAs to run action recognition models with minimal power. In this paper, we propose a new hardware architecture for action recognition on FPGA. Our model is based on the popular two-stream neural network. By optimizing the optical flow and convolution operations in the temporal domain, our method achieves similar accuracy with an order of magnitude fewer operations than C3D baseline models. We have implemented our model on a Xilinx ZCU102 board and released the source code.
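To make the two-stream idea concrete, here is a minimal sketch (not the paper's implementation) of the standard late-fusion step used by two-stream action classifiers: a spatial stream scores RGB frames, a temporal stream scores optical-flow stacks, and their class probabilities are averaged. The function names, weights, and toy logits below are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def two_stream_fusion(spatial_logits, temporal_logits, w_spatial=0.5):
    # Late fusion: weighted average of the per-stream class probabilities.
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return w_spatial * p_spatial + (1.0 - w_spatial) * p_temporal

# Toy example with 3 action classes (hypothetical logits).
spatial = np.array([2.0, 0.5, 0.1])   # RGB-frame stream
temporal = np.array([0.2, 1.8, 0.3])  # optical-flow stream
fused = two_stream_fusion(spatial, temporal)
pred = int(np.argmax(fused))
```

On an FPGA, the two streams can share convolution hardware, so reducing optical-flow and temporal-convolution cost, as the paper proposes, shrinks the dominant part of this pipeline.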