MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving
Human risky behavior in driving is an important visual recognition problem. In this paper, we propose a multi-view temporal action localization system based on the grayscale video to achieve action recognition in naturalistic driving. Specifically, we adopted SwinTransformer as feature extractor, and a single framework to detect boundary and class at the same time. Also, we improve multiple loss function for explicit constraints of embedded feature distributions. Our proposed framework achieves the overall F1-score of 0.3154 on A2 dataset.