Multi View Action Recognition for Distracted Driver Behavior Localization
This paper presents our approach to Track 3 (Naturalistic Driving Action Recognition) of the 2023 AI City Challenge, where the objective is to classify distracted driving activities in untrimmed naturalistic driving videos and to accurately localize their temporal boundaries. Our solution fine-tunes a large model to obtain a base video recognition model on a small-scale video dataset. We then adopt a multi-view, multi-fold ensemble to produce fine-grained clip-level classification results. Given the recognition probabilities, a non-trivial clustering-and-removal post-processing algorithm is applied to generate the final localization proposals. Extensive experiments demonstrate that the proposed method achieves superior performance compared with other methods and ranks 1st on Test-A2 of the challenge track.
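To make the pipeline concrete, the following is a minimal sketch of the two stages that follow model training: averaging clip-level probabilities over several view/fold models, then turning the per-clip labels into temporal segments. It is an illustration under stated assumptions, not the post-processing algorithm used in this work. The function names `ensemble_probs` and `localize`, the plain averaging, the run-grouping rule, and the parameters `clip_len`, `min_len`, and `background` are all hypothetical choices made for the example.

```python
from itertools import groupby


def ensemble_probs(probs):
    """Average clip-level class probabilities over all (view, fold) models.

    probs: list of models; each model is a list of clips; each clip is a
    list of class probabilities. Plain averaging is an assumption here.
    """
    n_models = len(probs)
    n_clips = len(probs[0])
    n_classes = len(probs[0][0])
    return [
        [sum(m[t][c] for m in probs) / n_models for c in range(n_classes)]
        for t in range(n_clips)
    ]


def localize(avg_probs, clip_len=1.0, min_len=2.0, background=0):
    """Group consecutive clips with the same top class into segments.

    Runs labelled as background, or shorter than min_len seconds, are
    removed. Both rules are illustrative stand-ins for a real
    clustering-and-removal step.
    """
    # Top-scoring class index for every clip.
    labels = [max(range(len(p)), key=p.__getitem__) for p in avg_probs]
    segments = []
    t = 0  # index of the first clip in the current run
    for label, run in groupby(labels):
        n = len(list(run))
        start, end = t * clip_len, (t + n) * clip_len
        if label != background and end - start >= min_len:
            segments.append((label, start, end))
        t += n
    return segments


# Toy input: two models (e.g. two views), six 1-second clips, three classes,
# where class 0 is treated as "no distraction".
model_a = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1],
           [0.2, 0.6, 0.2], [0.7, 0.1, 0.2], [0.3, 0.1, 0.6]]
model_b = [[0.9, 0.05, 0.05], [0.3, 0.6, 0.1], [0.2, 0.7, 0.1],
           [0.1, 0.8, 0.1], [0.6, 0.2, 0.2], [0.2, 0.2, 0.6]]

print(localize(ensemble_probs([model_a, model_b])))  # [(1, 1.0, 4.0)]
```

In this toy run, the three consecutive clips predicted as class 1 become one segment from 1.0 s to 4.0 s, while the single-clip run of class 2 is discarded as too short.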