Attending to Distinctive Moments: Weakly-Supervised Attention Models for Action Localization in Video

Lei Chen, Mengyao Zhai, Greg Mori; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 328-336

Abstract


We present a method for utilizing weakly supervised data for action localization in videos. We focus on sports video analysis, where videos contain scenes of multiple people. Weak supervision, gathered from sports websites, is provided in the form of a label for the action taking place in a video clip, without specifying which person performs the action. Since many frames of a clip can be ambiguous, a novel temporal attention approach is designed to select the most distinctive frames in which to apply the weak supervision. Empirical results demonstrate that leveraging weak supervision improves upon purely supervised localization methods, and that utilizing temporal attention further improves localization accuracy.

Related Material


[pdf]
[bibtex]
@InProceedings{Chen_2017_ICCV,
author = {Chen, Lei and Zhai, Mengyao and Mori, Greg},
title = {Attending to Distinctive Moments: Weakly-Supervised Attention Models for Action Localization in Video},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}
}