MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving

Wei Li, Shimin Chen, Jianyang Gu, Ning Wang, Chen Chen, Yandong Guo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 3242-3248

Abstract


Recognizing risky driver behavior is an important visual recognition problem. In this paper, we propose a multi-view temporal action localization system that operates on grayscale video to recognize actions in naturalistic driving. Specifically, we adopt Swin Transformer as the feature extractor and use a single framework to detect action boundaries and classes at the same time. We also improve multiple loss functions to impose explicit constraints on the embedded feature distributions. Our proposed framework achieves an overall F1-score of 0.3154 on the A2 dataset.

Related Material


[bibtex]
@InProceedings{Li_2022_CVPR,
    author    = {Li, Wei and Chen, Shimin and Gu, Jianyang and Wang, Ning and Chen, Chen and Guo, Yandong},
    title     = {MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {3242-3248}
}