@InProceedings{Abdelrahman_2025_WACV,
    author    = {Abdelrahman, Ahmed S and Abdel-Aty, Mohamed and Wang, Dongdong},
    title     = {Video-to-Text Pedestrian Monitoring (VTPM): Leveraging Large Language Models for Privacy-Preserve Pedestrian Activity Monitoring at Intersections},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {366-375}
}
Video-to-Text Pedestrian Monitoring (VTPM): Leveraging Large Language Models for Privacy-Preserve Pedestrian Activity Monitoring at Intersections
Abstract
Computer vision has revolutionized research methods, significantly enhancing system services across various domains, including traffic monitoring for road safety. However, developing a solution that ensures accurate detection, concise description, and reliable analysis while preserving pedestrian privacy and optimizing memory efficiency in video footage remains a significant challenge. In this paper, we introduce a novel application solution, the Video-to-Text Pedestrian Monitoring (VTPM) framework, which generates textual reports about the activity of pedestrians and provides reliable safety analyses while preserving privacy by removing identity-related information and reducing storage memory usage. VTPM consists of three main components: the Monitor, the Reporter, and the Analyzer, which work in conjunction to produce real-time narrative reports on pedestrian activity at intersections using multi-source data inputs. The Monitor tracks pedestrian activity, detecting crossing violations and conflicts with right-turning vehicles, all in real time, with a processing latency of just 0.05 seconds per frame. The Reporter, powered by a lightweight LLM, generates real-time reports on pedestrian activity with a latency of 0.33 seconds per report. The Analyzer enables more informative safety analyses and provides more reliable recommendations for preventive measures. More importantly, VTPM significantly reduces memory usage by shifting the analysis from videos to textual reports, ensuring memory efficiency for real-world applications. We quantitatively evaluate the performance of our VTPM framework, demonstrating its effectiveness and reliability in reporting pedestrian activity.