Recipe2Video: Synthesizing Personalized Videos From Recipe Texts

Prateksha Udhayanan, Suryateja BV, Parth Laturia, Dev Chauhan, Darshan Khandelwal, Stefano Petrangeli, Balaji Vasan Srinivasan; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 2268-2277

Abstract


Procedural texts are a special type of document that contains complex textual descriptions for carrying out a sequence of instructions. Because such texts lack visual cues, the information they contain is often difficult to consume effectively. In this paper, we focus on recipes, a particular type of procedural document, and introduce Recipe2Video, a novel deep-learning-driven system that automatically converts a recipe document into a multimodal illustrative video. Our method employs novel retrieval and re-ranking methods to select the best set of images and videos that can provide the desired illustration. We formulate a Viterbi-based optimization algorithm to stitch together a coherent video that combines visual cues, text, and voice-over to present an enhanced mode of consumption. We design automated metrics and compare performance across several baselines on two recipe datasets (RecipeQA, Tasty Videos). Our results on downstream tasks and human studies indicate that Recipe2Video captures the semantic and sequential information of the input in the generated video.
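
The abstract does not state the objective that the Viterbi-based optimization maximizes, so the following is only a generic sketch of Viterbi-style selection, not the authors' formulation. It assumes each recipe step has a list of candidate visuals with a relevance score, and that a pairwise coherence score rewards visually consistent transitions between consecutive steps. The names `viterbi_select`, `relevance`, and `coherence` are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence


def viterbi_select(
    relevance: Sequence[Sequence[float]],
    coherence: Callable[[int, int, int], float],
) -> List[int]:
    """Pick one candidate per step, maximizing relevance plus coherence.

    relevance[t][i]    : assumed score of candidate i for step t.
    coherence(t, i, j) : assumed score for following candidate i
                         (step t - 1) with candidate j (step t).
    Returns the index of the chosen candidate for every step.
    """
    if not relevance:
        return []

    # best[i] = best total score of any path ending in candidate i
    # at the current step.
    best = list(relevance[0])
    # back[t - 1][j] = best predecessor of candidate j at step t.
    back: List[List[int]] = []

    for t in range(1, len(relevance)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j, rel in enumerate(relevance[t]):
            # Choose the predecessor that maximizes accumulated score
            # plus the transition (coherence) score into candidate j.
            i_star = max(
                range(len(best)),
                key=lambda i: best[i] + coherence(t, i, j),
            )
            new_best.append(best[i_star] + coherence(t, i_star, j) + rel)
            pointers.append(i_star)
        best = new_best
        back.append(pointers)

    # Backtrack from the best final candidate to recover the full path.
    last = max(range(len(best)), key=lambda i: best[i])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Under these assumptions, a per-step greedy choice can differ from the dynamic-programming result: a slightly less relevant first visual may be preferred when it leads to a more coherent sequence overall.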

Related Material


[bibtex]
@InProceedings{Udhayanan_2023_WACV,
    author    = {Udhayanan, Prateksha and BV, Suryateja and Laturia, Parth and Chauhan, Dev and Khandelwal, Darshan and Petrangeli, Stefano and Srinivasan, Balaji Vasan},
    title     = {Recipe2Video: Synthesizing Personalized Videos From Recipe Texts},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {2268-2277}
}