Three Birds with One Stone: Multi-Task Temporal Action Detection via Recycling Temporal Annotations
Temporal action detection on unconstrained videos has seen significant research progress in recent years. Deep learning has achieved enormous success in this direction. However, collecting large-scale temporal detection datasets to ensuring promising performance in the real-world is a laborious, impractical and time consuming process. Accordingly, we present a novel improved temporal action localization model that is better able to take advantage of limited labeled data available. Specifically, we design two auxiliary tasks by reconstructing the available label information and then facilitate the learning of the temporal action detection model. Each task generates their supervision signal by recycling the original annotations, and are jointly trained with the temporal action detection model in a multi-task learning fashion. Note that the proposed approach can be pluggable to any region proposal based temporal action detection models. We conduct extensive experiments on three benchmark datasets, namely THUMOS'14, Charades and ActivityNet. Our experimental results confirm the effectiveness of the proposed model.