Enhancing Temporal Action Localization with Transfer Learning from Action Recognition

Ahsan Iqbal, Alexander Richard, Juergen Gall; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2019

Abstract


Temporal localization of actions in videos has attracted increasing interest in recent years. However, most existing approaches rely on complex architectures that are expensive to train, inefficient at inference time, or dependent on thorough and careful architecture engineering. Classical action recognition on pre-segmented clips, on the other hand, benefits from sophisticated deep architectures that have paved the way for highly reliable video clip classifiers. In this paper, we propose to use transfer learning to leverage these strong action recognition results for temporal localization. For transfer learning, we apply a network inspired by the classical bag-of-words model and show that the resulting framewise class posteriors already provide good results without explicit temporal modeling. Further, we show that combining these features with a deep but simple convolutional network achieves state-of-the-art results on two challenging action localization datasets.
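To make the two-stage idea from the abstract concrete, the sketch below shows framewise class posteriors produced by a transferred clip classifier being fed into a simple 1D temporal convolutional network. This is only an illustrative PyTorch sketch: the module names, feature dimension, layer counts, and kernel sizes are assumptions, not the authors' exact architecture.

```python
# Minimal sketch of the two-stage idea: framewise posteriors from a transferred
# classifier, followed by a simple temporal convolutional network.
# All names and sizes are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn


class FramewisePosteriorNet(nn.Module):
    """Stand-in for a classifier transferred from action recognition.

    Maps per-frame features (e.g., from a pretrained recognition backbone)
    to class posteriors for every frame independently.
    """

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) -> posteriors: (batch, time, classes)
        return torch.softmax(self.classifier(feats), dim=-1)


class TemporalLocalizationNet(nn.Module):
    """A deep but simple 1D convolutional network over framewise posteriors."""

    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(num_classes, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, num_classes, kernel_size=1),
        )

    def forward(self, posteriors: torch.Tensor) -> torch.Tensor:
        # posteriors: (batch, time, classes) -> framewise logits, same shape
        x = posteriors.transpose(1, 2)          # (batch, classes, time)
        return self.net(x).transpose(1, 2)      # (batch, time, classes)


if __name__ == "__main__":
    feats = torch.randn(2, 100, 512)            # 2 videos, 100 frames, 512-d features
    posterior_net = FramewisePosteriorNet(feat_dim=512, num_classes=20)
    localizer = TemporalLocalizationNet(num_classes=20)
    framewise_scores = localizer(posterior_net(feats))
    print(framewise_scores.shape)               # torch.Size([2, 100, 20])
```

As in the abstract, the posteriors alone already give framewise class scores; the convolutional network on top adds temporal context without a complex localization architecture.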

Related Material


[bibtex]
@InProceedings{Iqbal_2019_ICCV,
author = {Iqbal, Ahsan and Richard, Alexander and Gall, Juergen},
title = {Enhancing Temporal Action Localization with Transfer Learning from Action Recognition},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}
}