TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

Jiyang Gao, Zhenheng Yang, Kan Chen, Chen Sun, Ram Nevatia; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 3628-3636

Abstract


We address the problem of Temporal Action Proposal (TAP) generation. The problem matters because quickly extracting semantically important segments (e.g., human actions) from untrimmed videos is a key step in large-scale video analysis. To tackle it, we propose a novel Temporal Unit Regression Network (TURN) model. TURN has two salient aspects: (1) it jointly predicts action proposals and refines their temporal boundaries by temporal coordinate regression with contextual information; (2) fast computation is enabled by unit feature reuse: a long untrimmed video is decomposed into video units, which are reused as the basic building blocks of temporal proposals. TURN outperforms state-of-the-art methods by a large margin in average recall (AR) on the THUMOS-14 and ActivityNet datasets, and runs at over 900 frames per second (FPS) on a TITAN X GPU. We further apply TURN as the proposal generation stage of existing temporal action localization pipelines, and the resulting pipelines outperform the state of the art on THUMOS-14 and ActivityNet.
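The two mechanisms named in the abstract, unit feature reuse and boundary refinement by temporal coordinate regression with context, can be illustrated with a small plain-Python sketch on toy data. It is not the authors' implementation. The mean pooling, the number of context units on each side (`n_ctx`), the sliding-window clip enumeration, the 16-frame unit length, the 30 fps frame rate, and the offset sign convention are all assumptions made for illustration, and every function name is hypothetical. In a real model the offsets would come from a learned regressor applied to `clip_feature`; here they are simply given.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(features: Sequence[Vector], dim: int) -> Vector:
    """Average a list of unit feature vectors; return zeros if the list is empty."""
    if not features:
        return [0.0] * dim
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def candidate_clips(num_units: int, lengths: Sequence[int]) -> List[Tuple[int, int]]:
    """Enumerate multi-scale clips [s, e) in unit coordinates.

    Assumption: every start position (stride 1) for each clip length.
    """
    return [(s, s + n) for n in lengths for s in range(0, num_units - n + 1)]


def clip_feature(unit_feats: Sequence[Vector], start: int, end: int,
                 n_ctx: int) -> Vector:
    """Build a clip feature from unit features that were computed once per video.

    Every clip reuses the same unit features (the "unit feature reuse" idea).
    Assumption: n_ctx context units on each side; the before, inside, and
    after regions are each mean-pooled and then concatenated.
    """
    dim = len(unit_feats[0])
    before = unit_feats[max(0, start - n_ctx):start]
    inside = unit_feats[start:end]
    after = unit_feats[end:end + n_ctx]
    return mean_pool(before, dim) + mean_pool(inside, dim) + mean_pool(after, dim)


def refine(start: int, end: int, offsets: Tuple[float, float]) -> Tuple[float, float]:
    """Apply predicted start/end offsets (in units) to a clip.

    Assumed convention: offset = ground-truth boundary - clip boundary,
    so the refined boundary is clip boundary + offset.
    """
    o_s, o_e = offsets
    return start + o_s, end + o_e


def to_seconds(interval: Tuple[float, float], unit_frames: int = 16,
               fps: float = 30.0) -> Tuple[float, float]:
    """Convert unit coordinates to seconds (unit length and fps are assumed values)."""
    s, e = interval
    return s * unit_frames / fps, e * unit_frames / fps


if __name__ == "__main__":
    # Toy video: 6 units with 2-D features standing in for real visual features.
    units = [[float(i), 1.0] for i in range(6)]
    print(len(candidate_clips(len(units), lengths=[2, 4])))  # 8
    print(clip_feature(units, 2, 4, n_ctx=1))  # [1.0, 1.0, 2.5, 1.0, 4.0, 1.0]
    print(refine(2, 4, (-0.5, 0.25)))  # (1.5, 4.25)
    print(to_seconds(refine(2, 4, (-0.5, 0.25))))
```

Because unit features are computed once and each clip only pools slices of them, the cost of scoring many overlapping multi-scale clips is dominated by cheap pooling rather than by repeated feature extraction, which is consistent with the speed claim in the abstract.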

Related Material


BibTeX
@InProceedings{Gao_2017_ICCV,
author = {Gao, Jiyang and Yang, Zhenheng and Chen, Kan and Sun, Chen and Nevatia, Ram},
title = {TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}