Triplet Temporal-Based Video Recognition With Multiview for Temporal Action Localization

Huy Duong Le, Minh Quan Vu, Manh Tung Tran, Nguyen Van Phuc; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 5428-5434


Temporal action localization (TAL) in untrimmed videos has recently emerged as a crucial research topic, with applications such as surveillance, crowd monitoring, and driver distraction recognition. Most modern TAL approaches divide the problem into two parts: i) feature extraction for action recognition; and ii) temporal boundary estimation for action localization. In this study, we focus on improving TAL performance by making feature extraction more effective. Specifically, we present a temporal triplet algorithm that enhances the temporal density-dependence information of the input video clips. Moreover, we adopt a multiview fusion framework to enrich the action representation. We evaluate the proposed method on the 2023 AI City Challenge dataset, where it achieves competitive results and ranks among the top entries on the public leaderboard of Track 3 of the Challenge.

Related Material

@InProceedings{Le_2023_CVPR,
    author    = {Le, Huy Duong and Vu, Minh Quan and Tran, Manh Tung and Van Phuc, Nguyen},
    title     = {Triplet Temporal-Based Video Recognition With Multiview for Temporal Action Localization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {5428-5434}
}