Intro and Recap Detection for Movies and TV Series

Xiang Hao, Kripa Chettiar, Ben Cheung, Vernon Germano, Raffay Hamid; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 167-176

Abstract


Modern video streaming service companies offer millions of video titles to their customers. Many of these titles have repetitive introductory and recap parts at the beginning that customers have to manually skip in order to achieve an uninterrupted viewing experience. To avoid this unnecessary friction, some of the services have recently added "skip-intro" and "skip-recap" buttons to their video players before the intro and recap parts start. To efficiently scale this experience to their entire catalogs, it is important to automate the process of finding the intro and recap portions of titles. In this work, we pose intro and recap detection as a supervised sequence labeling problem and propose a novel end-to-end deep learning framework to this end. Specifically, we use CNNs to extract both visual and audio features from videos, and fuse these features using a B-LSTM in order to capture the various long- and short-term dependencies among different frame-features over time. Finally, we use a CRF to jointly optimize the sequence labeling for the intro and recap parts of the titles. We present a thorough empirical analysis of our model compared to several other deep-learning-based architectures and demonstrate the superior performance of our approach.
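
As a rough illustration of the final step only, the sketch below shows how a linear-chain CRF can decode a label sequence jointly rather than frame by frame. It is a minimal sketch and not the authors' implementation: the label set, the function name `viterbi_decode`, and all emission and transition scores are hypothetical values chosen for the example, whereas in the paper the per-step scores would come from the B-LSTM and the CRF parameters would be learned.

```python
from typing import List, Sequence

# Hypothetical label set for illustration; the paper's exact labels may differ.
LABELS = ["content", "intro", "recap"]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label path for a linear-chain CRF.

    emissions[t][k]   : score of label k at time step t (e.g. B-LSTM output).
    transitions[j][k] : score of moving from label j to label k.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    score = list(emissions[0])   # best path score ending in each label so far
    backpointers = []            # backpointers[t-1][k] = best previous label
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(num_labels):
            best_prev = max(range(num_labels),
                            key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_prev] + transitions[best_prev][k]
                             + emissions[t][k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(num_labels), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up per-step scores: step 2 is noisy and slightly prefers "content".
emissions = [
    [0.2, 1.0, 0.0],
    [0.3, 0.9, 0.0],
    [0.8, 0.6, 0.0],
    [0.1, 1.0, 0.0],
    [1.0, 0.1, 0.2],
    [1.0, 0.0, 0.1],
]
# Made-up transition scores that reward staying in the same label.
transitions = [[0.5 if j == k else -0.5 for k in range(3)] for j in range(3)]

decoded = viterbi_decode(emissions, transitions)
print([LABELS[i] for i in decoded])
# ['intro', 'intro', 'intro', 'intro', 'content', 'content']
```

A per-step argmax over the same emissions would label step 2 as "content" and split the intro in two; the transition scores let the joint decoding keep the intro as one contiguous segment, which is the kind of benefit a CRF layer is typically meant to provide in sequence labeling.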

Related Material


@InProceedings{Hao_2021_WACV,
    author    = {Hao, Xiang and Chettiar, Kripa and Cheung, Ben and Germano, Vernon and Hamid, Raffay},
    title     = {Intro and Recap Detection for Movies and TV Series},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2021},
    pages     = {167-176}
}