A Simple and Efficient Method for Dubbed Audio Sync Detection Using Compressive Sensing

Avijit Vajpayee, Zhikang Zhang, Abhinav Jain, Vimal Bhat; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2023, pp. 565-572

Abstract


Lack of temporal synchronization between audio and video streams is one of the major quality defects in videos. The defect is more prominent in dubbed media due to post-production errors such as improper audio overlay. Prior work in audio-video sync detection relies either on lip synchronization methods, which cannot be applied to dubbed media, or on self-supervised embeddings for general sound events, which are not accurate. In this paper, we present a novel, accurate and efficient method for temporal sync detection between dubbed audio tracks and the corresponding non-dubbed original-language audio tracks. Given the availability of non-dubbed audio tracks and existing lip sync methods, the problem of "Dubbed Audio-to-Video" sync detection can be simplified to that of "Dubbed Audio-to-Original Audio" sync detection. Our method finds and compares matching frames in compressed audio signatures, achieving near-perfect classification with an F1 score of 99.4 in less than 1 minute of processing time per hour of audio, along with a 99.6% relative reduction in memory footprint compared to an uncompressed full audio spectrogram. We believe this is the first work to tackle temporal sync detection in dubbed media.
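
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of comparing frames of compressed audio signatures, not the authors' method. It assumes a plain magnitude spectrogram, a Gaussian random projection as the compression step (a common compressive-sensing-style measurement), and a brute-force search over frame lags using mean cosine similarity. All function names and parameters (`spectrogram`, `compress`, `estimate_lag`, `frame_len`, `hop`, `k`, `max_lag`) are hypothetical.

```python
import numpy as np


def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram with a Hann window: (n_frames, frame_len // 2 + 1)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))


def compress(spec, k=8, seed=0):
    """Project each frame onto k random Gaussian directions (assumed compression step).

    The same seed must be used for both tracks so their signatures are comparable.
    """
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((spec.shape[1], k)) / np.sqrt(k)
    return spec @ phi


def estimate_lag(sig_a, sig_b, max_lag=20):
    """Return the frame lag such that sig_b[t + lag] best matches sig_a[t]."""
    def unit(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)

    a, b = unit(sig_a), unit(sig_b)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[: len(a) - lag], b[lag:]
        else:
            x, y = a[-lag:], b[: len(b) + lag]
        n = min(len(x), len(y))
        # Mean cosine similarity between aligned signature frames.
        score = np.mean(np.sum(x[:n] * y[:n], axis=1))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic check: delay a noise "track" by exactly 5 hops and recover the lag.
rng = np.random.default_rng(1)
original = rng.standard_normal(16000)
delayed = np.concatenate([np.zeros(5 * 256), original])[: len(original)]
lag = estimate_lag(compress(spectrogram(original)), compress(spectrogram(delayed)))
print("estimated lag in frames:", lag)
```

In this sketch each frame is stored as `k` numbers instead of `frame_len // 2 + 1` spectrogram bins, which is where a memory saving would come from; the values chosen here are illustrative and are not meant to reproduce the figures reported in the abstract. A nonzero estimated lag would indicate that the two tracks are out of sync.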

Related Material


@InProceedings{Vajpayee_2023_WACV,
    author    = {Vajpayee, Avijit and Zhang, Zhikang and Jain, Abhinav and Bhat, Vimal},
    title     = {A Simple and Efficient Method for Dubbed Audio Sync Detection Using Compressive Sensing},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2023},
    pages     = {565-572}
}