Triplet Temporal-Based Video Recognition With Multiview for Temporal Action Localization
Temporal action localization (TAL) in untrimmed videos recently emerged as a crucial research topic, which has been applied in various applications such as surveillance, crowd monitoring, and driver distraction recognition. Most modern approaches in TAL divide this problem into two parts: i) feature extraction for action recognition; and ii) temporal boundary for action localization. In this study, we focus on improving the performance of the TAL task by exploiting the feature extraction effectively. Specifically, we present a temporal triplet algorithm in order to enhance temporal density-dependence information for the input video clips. Moreover, the multiview fusion framework is taken into account for enriching action representation. For the evaluation, we conduct the proposed method on the 2023 AI City Challenge Dataset. Accordingly, our method achieves competitive results and belongs to the top public leaderboard in Track 3 of the Challenge.