Temporal Context Network for Activity Localization in Videos

Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen; The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5793-5802

Abstract


We present a Temporal Context Network (TCN) for precise temporal localization of human activities. Similar to the Faster-RCNN architecture, proposals spanning multiple temporal scales are placed at equal intervals in a video. We propose a novel representation for ranking these proposals. Since pooling features only inside a segment is not sufficient to predict activity boundaries, we construct a representation that explicitly captures the context around a proposal when ranking it. For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and fed to a temporal convolutional neural network for classification. After ranking proposals, non-maximum suppression is applied and classification is performed to obtain final detections. TCN outperforms state-of-the-art methods on the ActivityNet dataset and the THUMOS14 dataset.
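
The pipeline outlined in the abstract (multi-scale proposals at equal intervals, sampling at a pair of scales, then non-maximum suppression) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `stride` and `context_scale` parameters, the reading of "a pair of scales" as the proposal window plus an enlarged window around it, and the greedy IoU-based suppression rule are all assumptions made for illustration. The ranking network itself is omitted, so the scores are supplied by hand.

```python
def generate_proposals(duration, scales, stride):
    """Place one proposal per length in `scales` at centers spaced `stride` apart.

    Proposals are clipped to [0, duration]. All parameters are hypothetical.
    """
    proposals = []
    num_centers = int(duration // stride) + 1
    for k in range(num_centers):
        center = k * stride
        for length in scales:
            start = max(0.0, center - length / 2)
            end = min(duration, center + length / 2)
            if end > start:
                proposals.append((start, end))
    return proposals


def uniform_times(start, end, num_samples):
    """Return `num_samples` evenly spaced timestamps covering [start, end]."""
    if num_samples == 1:
        return [(start + end) / 2]
    step = (end - start) / (num_samples - 1)
    return [start + i * step for i in range(num_samples)]


def context_sample_times(proposal, num_samples, context_scale=2.0):
    """Sample timestamps at a pair of scales (an assumed interpretation).

    The first list covers the proposal itself; the second covers a window
    `context_scale` times longer, centered on the proposal, so that it also
    sees the context before and after the proposal.
    """
    start, end = proposal
    center = (start + end) / 2
    half = (end - start) * context_scale / 2
    inner = uniform_times(start, end, num_samples)
    outer = uniform_times(center - half, center + half, num_samples)
    return inner, outer


def temporal_iou(a, b):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(proposals, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring proposal, drop overlapping ones, repeat.

    Returns the indices of the kept proposals in descending score order.
    """
    order = sorted(range(len(proposals)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(temporal_iou(proposals[i], proposals[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In a full system the scores passed to `temporal_nms` would come from the ranking network applied to features gathered at the sampled timestamps; here they would have to be provided by the caller.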

Related Material


@InProceedings{Dai_2017_ICCV,
    author = {Dai, Xiyang and Singh, Bharat and Zhang, Guyue and Davis, Larry S. and Qiu Chen, Yan},
    title = {Temporal Context Network for Activity Localization in Videos},
    booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
    month = {Oct},
    year = {2017},
    pages = {5793-5802}
}