NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context

Jaehyeong Park, Junchel Ye, Seungkook Lee, Hyun W. Ka, Dongsu Han; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 409-419

Abstract


Audio Description (AD) is a narration designed to enhance accessibility for visually impaired individuals by conveying the key visual elements of a video. Automating AD generation for long-form videos such as movies and dramas therefore provides high social value, but it is a challenging task. First, AD must reflect the narrative context of the entire movie, including the storyline, the names of characters and places, and the cultural setting. Second, to avoid disrupting the immersive experience of the movie, AD must not overlap with the characters' dialogues, which requires delivering numerous visual elements in concise sentences. This paper presents NarrAD, a training-free AD generation framework that satisfies both requirements by leveraging the rich narrative context in movie scripts and curating information across narration slots. Experiments on the MAD dataset demonstrate that our approach outperforms prior works in both captioning and LLM-based metrics. In the user study with 600 subjects, NarrAD achieves the highest user experience and movie comprehension. NarrAD's AD samples are available at https://bit.ly/4aSwOTr.

Related Material


[bibtex]
@InProceedings{Park_2025_WACV,
    author    = {Park, Jaehyeong and Ye, Junchel and Lee, Seungkook and Ka, Hyun W. and Han, Dongsu},
    title     = {NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {409-419}
}