Tails Tell Tales: Chapter-wide Manga Transcriptions with Character Names

Ragav Sachdeva, Gyungin Shin, Andrew Zisserman; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 2053-2069

Abstract

Enabling engagement with manga for visually impaired individuals presents a significant challenge due to its inherently visual nature. With the goal of fostering accessibility, this paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically, with a particular emphasis on ensuring narrative consistency. This entails identifying (i) what is being said, i.e., detecting the texts on each page and classifying them into essential vs non-essential, and (ii) who is saying it, i.e., attributing each dialogue to its speaker, while ensuring the same characters are named consistently throughout the chapter. To this end, we introduce: (i) Magiv2, a model that is capable of generating high-quality chapter-wide manga transcripts with named characters and significantly higher precision in speaker diarisation over prior works; (ii) an extension of the PopManga evaluation dataset, which now includes annotations for speech-bubble tail boxes, associations of text to corresponding tails, classifications of text as essential or non-essential, and the identity of each character box; and (iii) a new character bank dataset, which comprises over 2.2K principal characters from 64 manga series, featuring an average of 6.8 exemplar images per character, as well as a list of chapters in which they appear. The code, trained model, and both datasets will be made publicly available.

Related Material

@InProceedings{Sachdeva_2024_ACCV,
    author    = {Sachdeva, Ragav and Shin, Gyungin and Zisserman, Andrew},
    title     = {Tails Tell Tales: Chapter-wide Manga Transcriptions with Character Names},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {2053-2069}
}