AI-based Video Content Understanding for Automatic and Interactive Multimedia Retrieval

Schoeffmann, Klaus; Leopold, Mario

Klaus Schoeffmann, Mario Leopold; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 3789-3797

Abstract

We present diveXplore, a distributed system for AI-based video content understanding and retrieval, which will be used in the interactive task of the IViSE 2025 workshop. The system combines state-of-the-art deep learning components for shot segmentation, text and speech recognition, and multimodal embeddings with a scalable architecture designed for efficient storage, querying, and user interaction. A key feature of the frontend is an intuitive web-based GUI that supports free-text and semantic search, video summarization, and temporal query composition. We evaluate the performance of a newly developed keyframe scrubbing feature and conduct a qualitative user experiment based on all IViSE 2025 known-item search (KIS) tasks. The results demonstrate the system's effectiveness in interactive video retrieval and inform a set of improvements for future versions.

Related Material

[bibtex]

@InProceedings{Schoeffmann_2025_CVPR,
    author    = {Schoeffmann, Klaus and Leopold, Mario},
    title     = {AI-based Video Content Understanding for Automatic and Interactive Multimedia Retrieval},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {3789-3797}
}