Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model

Xuan, Khai Trinh; Nguyen, Khoi Nguyen; Ngo, Bach Hoang; Xuan, Vu Dinh; An, Minh-Hung; Dinh, Quang-Vinh

Khai Trinh Xuan, Khoi Nguyen Nguyen, Bach Hoang Ngo, Vu Dinh Xuan, Minh-Hung An, Quang-Vinh Dinh; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7046-7055

Abstract

The increasing complexity of traffic dynamics has underscored the need for advanced traffic safety description and analysis, challenging the ability of current methods to comprehensively understand and predict safety conditions from transportation videos. This paper addresses these challenges by analyzing the specific segments that are crucial for precise traffic safety descriptions. Through this examination, we introduce an innovative preprocessing method named "segment extraction", which facilitates the creation of a novel segment-based training dataset. Additionally, we present a practical two-stage training framework tailored to this dataset. The framework is designed to produce accurate traffic safety descriptions by exploiting the unique attributes of the segment-based training dataset. Leveraging these contributions, our method ranked 2nd with a score of 32.8877 on the test set of AI City Challenge 2024 Track 2: Traffic Safety Description and Analysis. The source code for the proposed approaches is openly accessible at https://github.com/AIVIETNAMResearch/AI-CIty-2024-Track2

Related Material

@InProceedings{Xuan_2024_CVPR,
    author    = {Xuan, Khai Trinh and Nguyen, Khoi Nguyen and Ngo, Bach Hoang and Xuan, Vu Dinh and An, Minh-Hung and Dinh, Quang-Vinh},
    title     = {Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {7046-7055}
}