PDAN: Pyramid Dilated Attention Network for Action Detection

Rui Dai, Srijan Das, Luca Minciullo, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 2970-2979


Handling long and complex temporal information is an important factor in action detection tasks. This challenge is further aggravated by densely distributed actions in untrimmed videos. Previous action detection methods fail to select the key temporal information in long videos. To this end, we introduce the Dilated Attention Layer (DAL). Compared to a standard temporal convolution layer, DAL allocates attention weights to each feature in the kernel, enabling it to learn a better local representation across time. Furthermore, when equipped with dilated kernels, DAL is able to learn a global representation of videos several minutes long, which is crucial for the task of action detection. Finally, we introduce the Pyramid Dilated Attention Network (PDAN), which is built upon DAL. By combining DAL with dilation and residual links, PDAN can model short-term and long-term temporal relations simultaneously, focusing on local segments at both low and high temporal receptive fields. This property enables PDAN to handle complex temporal relations between different action instances in long untrimmed videos. To corroborate the effectiveness and robustness of the proposed method, we evaluate it on three densely annotated, multi-label datasets: MultiTHUMOS, Charades and an in-house dataset, where it outperforms the state-of-the-art results.
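The core idea of DAL, as described above, can be sketched as follows: each time step attends over a small set of neighbours sampled at a dilated stride, and an attention-weighted sum replaces the fixed weights of a temporal convolution. This is a minimal NumPy illustration, not the paper's implementation; the scoring vector `w_score` and the border-clamping behaviour are assumptions for the sketch.

```python
import numpy as np

def dilated_attention_layer(x, w_score, dilation=1, kernel_size=3):
    """Sketch of a Dilated Attention Layer (DAL).

    x        : (T, C) per-frame feature sequence
    w_score  : (C,) hypothetical scoring vector producing one logit per
               sampled neighbour (stand-in for the paper's scoring function)
    """
    T, C = x.shape
    half = kernel_size // 2
    out = np.zeros_like(x)
    for t in range(T):
        # gather neighbours at dilated offsets, clamping at the borders
        idx = [min(max(t + d * dilation, 0), T - 1)
               for d in range(-half, half + 1)]
        neigh = x[idx]                        # (kernel_size, C)
        logits = neigh @ w_score              # one score per neighbour
        attn = np.exp(logits - logits.max())
        attn /= attn.sum()                    # softmax over the kernel
        out[t] = attn @ neigh                 # attention-weighted sum
    return out
```

Stacking several such layers with geometrically growing dilation (1, 2, 4, ...) and residual connections would yield the pyramid structure described in the abstract, where lower layers cover short-range context and upper layers span minutes-long context.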

Related Material

@InProceedings{Dai_2021_WACV,
  author    = {Dai, Rui and Das, Srijan and Minciullo, Luca and Garattoni, Lorenzo and Francesca, Gianpiero and Bremond, Francois},
  title     = {PDAN: Pyramid Dilated Attention Network for Action Detection},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2021},
  pages     = {2970-2979}
}