LiveSeg: Unsupervised Multimodal Temporal Segmentation of Long Livestream Videos

Jielin Qiu, Franck Dernoncourt, Trung Bui, Zhaowen Wang, Ding Zhao, Hailin Jin; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 5188-5198

Abstract


Livestream videos have become a significant part of online learning, with design, digital marketing, creative painting, and other skills taught by experienced experts during live sessions, making the recordings valuable learning materials. However, Livestream tutorial videos are usually hours long and are uploaded to the Internet directly after the live sessions, which makes it hard for other people to catch up quickly. An outline would be a helpful solution, but producing one requires the video to be temporally segmented by topic. In this work, we introduce MultiLive, a large Livestream video dataset, and formulate the task of temporal segmentation of long Livestream videos (TSLLV). We propose LiveSeg, an unsupervised Livestream video temporal Segmentation solution that takes advantage of multimodal features from different domains. Our method achieves a 16.8% F1-score improvement over the state-of-the-art method.
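To make the task concrete, the sketch below illustrates the general idea of unsupervised multimodal temporal segmentation: per-time-step visual and transcript embeddings are fused, and a topic boundary is hypothesized wherever the fused representation shifts between adjacent windows. This is not the LiveSeg algorithm from the paper; the feature inputs, the concatenation-based fusion, the windowed cosine-similarity test, and all names and parameters (detect_boundaries, window, threshold) are illustrative assumptions.

```python
# Illustrative sketch only: NOT the LiveSeg method, just the generic idea of
# fusing multimodal features and placing topic boundaries at representation shifts.
import numpy as np

def detect_boundaries(visual_feats: np.ndarray,
                      text_feats: np.ndarray,
                      window: int = 30,
                      threshold: float = 0.5) -> list[int]:
    """Return time-step indices where a topic boundary is hypothesized.

    visual_feats: (T, Dv) per-time-step visual embeddings (e.g., frame features).
    text_feats:   (T, Dt) per-time-step transcript embeddings, aligned to video time.
    """
    # Fuse modalities by simple concatenation (a stand-in for learned fusion).
    fused = np.concatenate([visual_feats, text_feats], axis=1)

    boundaries = []
    for t in range(window, fused.shape[0] - window):
        # Mean representation of the windows immediately before and after time t.
        left = fused[t - window:t].mean(axis=0)
        right = fused[t:t + window].mean(axis=0)
        # Cosine similarity between the two window means.
        sim = left @ right / (np.linalg.norm(left) * np.linalg.norm(right) + 1e-8)
        # A low similarity suggests the content (topic) has changed around t.
        if sim < threshold:
            boundaries.append(t)
    return boundaries

if __name__ == "__main__":
    # Synthetic stand-in features for a "video" with a topic change at t=300.
    rng = np.random.default_rng(0)
    v = np.vstack([rng.normal(1.0, 0.5, (300, 64)), rng.normal(-1.0, 0.5, (300, 64))])
    s = np.vstack([rng.normal(1.0, 0.5, (300, 32)), rng.normal(-1.0, 0.5, (300, 32))])
    print(detect_boundaries(v, s)[:5])  # boundary candidates near t=300
```

In practice, the visual and transcript embeddings would come from pretrained encoders applied to frames and ASR transcripts; the toy script above only replaces them with synthetic features to keep the example self-contained.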

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Qiu_2023_WACV,
    author    = {Qiu, Jielin and Dernoncourt, Franck and Bui, Trung and Wang, Zhaowen and Zhao, Ding and Jin, Hailin},
    title     = {LiveSeg: Unsupervised Multimodal Temporal Segmentation of Long Livestream Videos},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {5188-5198}
}