Prompt for the source video: A man is playing tennis
Prompt for each reference image: A photo of Spiderman
Prompt for the synthesized video: A Spiderman is playing tennis