LVS: A Learned Video Storage for Fast and Efficient Video Understanding

Yunghee Lee, Jongse Park; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 8085-8093

Abstract


As video understanding (VU) promises unprecedented capabilities in the era of video data explosion, efficient computation plays a critical role in turning algorithmic innovations into practice. While VU models often rely on powerful foundation models such as CLIP to understand visual concepts, their massive computational demand hinders scalable deployment over real-world video data silos. To this end, this paper proposes LVS, a learned video storage system that memoizes feature vectors for already-seen video clips and reuses them for future VU queries. The key challenge is video's continuous nature, which disallows naive computation reuse among VU queries that target different video clips. To address this challenge, we identify a unique property in which VU-generated feature vectors form a monoid, and we leverage the monoid homomorphism, realized with a multilayer perceptron (MLP) model, to effectively fuse disjoint feature vectors. Our evaluation shows that LVS achieves up to a 1.59x speedup in VU query processing latency with no significant accuracy loss on the UCF101 video classification task.
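To make the monoid-homomorphism idea concrete, the sketch below shows how an MLP could learn a binary operator over clip features so that fuse(f(a), f(b)) approximates f(a ++ b), letting a cache hit replace a full encoder pass with one cheap MLP call. This is a minimal illustration under assumed details, not the paper's released implementation: the 512-dim CLIP-style features and the names FusionMLP, feature_cache, query_feature, and encode_clip are all hypothetical.

# Minimal sketch of LVS-style memoization with MLP feature fusion.
# Assumptions (not from the paper's code release): clip features are
# 512-dim CLIP-style vectors, and FusionMLP / feature_cache /
# query_feature / encode_clip are illustrative names.
import torch
import torch.nn as nn

D = 512  # assumed feature dimension (e.g., CLIP ViT-B/32)

class FusionMLP(nn.Module):
    """Learned binary operator over clip features: given features of two
    adjacent clips, predict the feature of their concatenation, i.e.
    fuse(f(a), f(b)) ~= f(a ++ b), a learned monoid homomorphism."""
    def __init__(self, dim: int = D, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, fa: torch.Tensor, fb: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([fa, fb], dim=-1))

# Memoization table: (video_id, start_frame, end_frame) -> feature vector.
feature_cache: dict[tuple[str, int, int], torch.Tensor] = {}

def query_feature(video_id, start, end, encode_clip, fuse):
    """Return the feature for video_id[start:end], reusing memoized
    features of already-seen sub-clips when possible."""
    key = (video_id, start, end)
    if key in feature_cache:
        return feature_cache[key]  # exact hit: skip the encoder entirely
    mid = (start + end) // 2
    left = feature_cache.get((video_id, start, mid))
    right = feature_cache.get((video_id, mid, end))
    if left is not None and right is not None:
        feat = fuse(left, right)  # fuse cached halves with one MLP call
    else:
        feat = encode_clip(video_id, start, end)  # cold miss: full encoder
    feature_cache[key] = feat
    return feat

Under this reading, the MLP would be trained offline against ground-truth features of concatenated clips, and the reported speedup would come from query-time cache hits that avoid re-running the CLIP encoder over overlapping clip ranges.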

Related Material


[pdf]
[bibtex]
@InProceedings{Lee_2024_CVPR,
  author    = {Lee, Yunghee and Park, Jongse},
  title     = {LVS: A Learned Video Storage for Fast and Efficient Video Understanding},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2024},
  pages     = {8085-8093}
}