PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation

Ryozo Masukawa, Sanggeon Yun, Yoshiki Yamaguchi, Mohsen Imani; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 6415-6424

Abstract


Video crime detection is a significant application of computer vision and artificial intelligence. However, existing datasets primarily focus on detecting severe crimes by analyzing entire video clips, often neglecting the precursor activities (i.e., privacy violations) whose detection could potentially prevent these crimes. To address this limitation, we present PV-VTT (Privacy Violation Video To Text), a unique multimodal dataset aimed at identifying privacy violations. PV-VTT provides detailed annotations for both video and text in scenarios. To ensure the privacy of individuals in the videos, we only provide video feature vectors, avoiding the release of any raw video data. This privacy-focused approach allows researchers to use the dataset while protecting participant confidentiality. Recognizing that privacy violations are often ambiguous and context-dependent, we propose a Graph Neural Network (GNN)-based video description model. Our model generates a GNN-based prompt with an image for a Large Language Model (LLM), which delivers cost-effective and high-quality video descriptions. By leveraging a single video frame along with relevant text, our method reduces the number of input tokens required, maintaining descriptive quality while optimizing LLM API usage. Extensive experiments validate the effectiveness and interpretability of our approach in video description tasks and the flexibility of our PV-VTT dataset.
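
As a rough illustration of why sending a single frame plus text can reduce LLM input size compared with sending many frames, the sketch below estimates input tokens for both cases. The per-frame token cost, the prompt length, and the function names are hypothetical placeholders chosen for the example; they are not figures or APIs from the paper.

```python
def estimate_input_tokens(num_frames, tokens_per_frame, text_tokens):
    """Estimate LLM input tokens for a request with image frames and a text prompt.

    All three arguments are hypothetical quantities supplied by the caller;
    real per-image token costs depend on the LLM provider and image size.
    """
    if num_frames < 0 or tokens_per_frame < 0 or text_tokens < 0:
        raise ValueError("token counts and frame counts must be non-negative")
    return num_frames * tokens_per_frame + text_tokens


def token_savings(num_frames, tokens_per_frame, text_tokens):
    """Tokens saved by sending one frame instead of `num_frames` frames.

    Assumes the text prompt has the same length in both cases, which is a
    simplification made only for this example.
    """
    all_frames = estimate_input_tokens(num_frames, tokens_per_frame, text_tokens)
    single_frame = estimate_input_tokens(1, tokens_per_frame, text_tokens)
    return all_frames - single_frame


if __name__ == "__main__":
    # Hypothetical numbers: 16 sampled frames, 100 tokens per frame, 50 prompt tokens.
    print(estimate_input_tokens(16, 100, 50))  # multi-frame request
    print(estimate_input_tokens(1, 100, 50))   # single-frame request
    print(token_savings(16, 100, 50))          # difference between the two
```

Under these assumed numbers, the saving grows linearly with the number of frames that would otherwise be sent, which is the intuition behind pairing one frame with a compact text prompt.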

Related Material


[bibtex]
@InProceedings{Masukawa_2025_WACV,
    author    = {Masukawa, Ryozo and Yun, Sanggeon and Yamaguchi, Yoshiki and Imani, Mohsen},
    title     = {PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {6415-6424}
}