VideoSAGE: Video Summarization with Graph Representation Learning

Jose M. Rojas Chaves, Subarna Tripathi; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2527-2534

Abstract


We propose a graph-based representation learning framework for video summarization. First, we convert an input video to a graph in which each node corresponds to a video frame. Then we impose sparsity on the graph by connecting only those pairs of nodes that are within a specified temporal distance. We then formulate the video summarization task as a binary node classification problem: each video frame is classified according to whether it should belong to the output summary video. A graph constructed this way aims to capture long-range interactions among video frames, and the sparsity ensures the model trains without hitting memory and compute bottlenecks. Experiments on two datasets (SumMe and TVSum) demonstrate the effectiveness of the proposed nimble model compared to existing state-of-the-art summarization approaches while being one order of magnitude more efficient in compute time and memory.
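The graph construction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify edge direction, the window size, or the node features, so the function names (build_temporal_edges, count_dense_edges), the directed-edge convention, and the example values below are assumptions made purely for illustration, not the authors' implementation.

```python
# Illustrative sketch only: names, edge direction, and window size are
# assumptions, not details taken from the paper.

def build_temporal_edges(num_frames, max_distance):
    """Return directed edges (i, j) for frame pairs with 0 < |i - j| <= max_distance."""
    edges = []
    for i in range(num_frames):
        # Clamp the temporal window to the valid frame range.
        lo = max(0, i - max_distance)
        hi = min(num_frames - 1, i + max_distance)
        for j in range(lo, hi + 1):
            if i != j:  # no self-loops in this sketch
                edges.append((i, j))
    return edges


def count_dense_edges(num_frames):
    """Number of directed edges in a fully connected graph without self-loops."""
    return num_frames * (num_frames - 1)


# Six frames, each connected to neighbours at most two frames away.
edges = build_temporal_edges(num_frames=6, max_distance=2)
print(len(edges), "sparse edges vs", count_dense_edges(6), "dense edges")
```

In this sketch the edge count grows linearly with the number of frames for a fixed window, whereas a fully connected graph grows quadratically, which is the kind of saving that sparsity provides. Under the node classification formulation, each node would then receive a binary label indicating whether its frame belongs to the summary.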

Related Material


[bibtex]
@InProceedings{Chaves_2024_CVPR,
    author    = {Chaves, Jose M. Rojas and Tripathi, Subarna},
    title     = {VideoSAGE: Video Summarization with Graph Representation Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2527-2534}
}