AI-based Video Content Understanding for Automatic and Interactive Multimedia Retrieval

Klaus Schoeffmann, Mario Leopold; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 3750-3758

Abstract


We present diveXplore, a distributed system for AI-based video content understanding and retrieval, which will be used in the interactive task of the IViSE 2025 workshop. The system combines state-of-the-art deep learning components for shot segmentation, text and speech recognition, and multimodal embeddings with a scalable architecture designed for efficient storage, querying, and user interaction. A key feature of the frontend is an intuitive web-based GUI that supports free-text and semantic search, video summarization, and temporal query composition. We evaluate the performance of a newly developed keyframe scrubbing feature and conduct a qualitative user experiment based on all IViSE 2025 known-item search (KIS) tasks. The results demonstrate the system's effectiveness in interactive video retrieval and inform a set of improvements for future versions.

Related Material


@InProceedings{Schoeffmann_2025_CVPR,
  author    = {Schoeffmann, Klaus and Leopold, Mario},
  title     = {AI-based Video Content Understanding for Automatic and Interactive Multimedia Retrieval},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {3750-3758}
}