An Effective Temporal Localization Method With Multi-View 3D Action Recognition for Untrimmed Naturalistic Driving Videos

Manh Tung Tran, Minh Quan Vu, Ngoc Duong Hoang, Khac-Hoai Nam Bui; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 3168-3173

Abstract


Naturalistic driving studies that use computer vision techniques have become an emerging research topic. The objective is to classify distracted driving behaviors. Specifically, the problem is framed as temporal action localization (TAL) in untrimmed videos, a challenging task in video analysis. TAL remains one of the most challenging unsolved problems in computer vision because it requires not only recognizing each action but also localizing its start and end times. Most state-of-the-art approaches adopt complex architectures that are expensive to train and slow at inference. In this study, we propose a new framework for untrimmed naturalistic driving videos that uses the results of 3D action recognition, applied as video clip classification, to capture short-term temporal and spatial correlation. A simple data-driven post-processing step then handles long-term temporal correlation across the untrimmed video. The proposed method is evaluated on the AI City Challenge 2022 dataset for Naturalistic Driving Action Recognition, where it ranks first on the public leaderboard of the challenge.
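The abstract does not spell out the post-processing, so the following is only a minimal sketch of the general idea: turning per-clip classification scores from several camera views into temporal segments. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `localize_actions`, fusing views by averaging, the moving-average smoothing, treating class 0 as background, and the `clip_len_s` and `min_len_s` parameters.

```python
import numpy as np


def localize_actions(view_probs, clip_len_s=1.0, background=0,
                     min_len_s=2.0, smooth=1):
    """Hypothetical post-processing: clip scores -> (class, start_s, end_s).

    view_probs has shape (n_views, n_clips, n_classes) and holds per-clip
    class probabilities from each camera view (an assumed input layout).
    """
    # Assumption: fuse camera views by simple averaging.
    probs = np.mean(np.asarray(view_probs, dtype=float), axis=0)

    # Assumption: optional moving average over time to suppress flicker.
    if smooth > 1:
        kernel = np.ones(smooth) / smooth
        probs = np.stack(
            [np.convolve(probs[:, c], kernel, mode="same")
             for c in range(probs.shape[1])],
            axis=1,
        )

    labels = probs.argmax(axis=1)  # one predicted class per clip

    # Merge runs of identical labels into segments.
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            cls = int(labels[start])
            t0, t1 = start * clip_len_s, i * clip_len_s
            # Drop background runs and runs shorter than min_len_s.
            if cls != background and (t1 - t0) >= min_len_s:
                segments.append((cls, t0, t1))
            start = i
    return segments


# Toy input: two identical views, ten 1-second clips, three classes.
clip_labels = [0, 0, 1, 1, 1, 1, 0, 2, 2, 2]
one_view = np.eye(3)[clip_labels]
print(localize_actions(np.stack([one_view, one_view])))
```

Classifying short clips and then merging the labels keeps the recognizer focused on short-range cues, while the merge step supplies the long-range structure. The actual rules used in the paper may differ substantially from this sketch.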

Related Material


[bibtex]
@InProceedings{Tran_2022_CVPR,
    author    = {Tran, Manh Tung and Vu, Minh Quan and Hoang, Ngoc Duong and Bui, Khac-Hoai Nam},
    title     = {An Effective Temporal Localization Method With Multi-View 3D Action Recognition for Untrimmed Naturalistic Driving Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {3168-3173}
}