Spatiotemporal Feature Residual Propagation for Action Prediction

He Zhao, Richard P. Wildes; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 7003-7012

Abstract


Recognizing actions from limited preliminary video observations has seen considerable recent progress. Typically, however, such progress has been achieved without explicitly modeling fine-grained motion evolution as a potentially valuable information source. In this study, we address this task by investigating how action patterns evolve over time in a spatial feature space. There are three key components to our system. First, we work with intermediate-layer ConvNet features, which allow for abstraction from raw data while retaining spatial layout, which is sacrificed in approaches that rely on vectorized global representations. Second, instead of propagating features per se, we propagate their residuals across time, which allows for a compact representation that reduces redundancy while retaining essential information about evolution over time. Third, we employ a Kalman filter to combat error build-up and unify across prediction start times. Extensive experimental results on the JHMDB21, UCF101 and BIT datasets show that our approach leads to a new state of the art in action prediction.
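
As a rough illustration of the second and third components, the sketch below propagates feature residuals forward in time and then applies a Kalman-style correction. It is a toy under stated assumptions, not the authors' model: the function names are hypothetical, short flat vectors stand in for spatial feature maps, the next residual is assumed to equal the last observed one (the abstract does not say how residuals are predicted), and the measurement passed to the correction step is an invented value.

```python
# Illustrative sketch only. All names, the constant-residual predictor, and
# the measurement values are assumptions, not the method from the paper.

def residuals(features):
    """Element-wise differences between consecutive feature vectors."""
    return [[b - a for a, b in zip(prev, curr)]
            for prev, curr in zip(features, features[1:])]

def propagate(observed, steps):
    """Extrapolate future features by repeatedly adding a predicted residual.

    Assumption: the next residual equals the last observed one. A learned
    predictor would take the place of this choice in a real system.
    """
    last_residual = residuals(observed)[-1]
    current = list(observed[-1])
    future = []
    for _ in range(steps):
        current = [x + r for x, r in zip(current, last_residual)]
        future.append(current)
    return future

def kalman_update(predicted, measured, pred_var, meas_var):
    """Scalar-gain Kalman correction applied element-wise.

    Returns the corrected estimate and its reduced variance.
    """
    gain = pred_var / (pred_var + meas_var)
    corrected = [p + gain * (m - p) for p, m in zip(predicted, measured)]
    return corrected, (1.0 - gain) * pred_var

# Three observed "frames" of a two-dimensional toy feature.
observed = [[0.0, 1.0], [1.0, 1.5], [2.0, 2.0]]
future = propagate(observed, steps=2)

# Blend the first predicted frame with a hypothetical measurement.
corrected, var = kalman_update(future[0], [3.5, 2.5], pred_var=1.0, meas_var=1.0)
```

With equal prediction and measurement variances the gain is 0.5, so the corrected estimate lies halfway between the propagated feature and the measurement. Without such a correction, any error in the predicted residual would accumulate with each additional step.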

Related Material


[bibtex]
@InProceedings{Zhao_2019_ICCV,
author = {Zhao, He and Wildes, Richard P.},
title = {Spatiotemporal Feature Residual Propagation for Action Prediction},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}