Prompt for the source video: A dog sitting on the floor and turning its head to the side in front of a cabinet
Prompt for each reference image: A photo of <cat>
Prompt for the synthesized video: A <cat> sitting on the floor and turning its head to the side in front of a cabinet
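
The three prompts above follow a simple pattern: the synthesized-video prompt is the source-video prompt with the subject word swapped for the placeholder token, and each reference image uses a short template around the same token. A minimal sketch of that pattern follows. The function name `build_prompts` and its parameters are hypothetical and only illustrate the relationship between the prompts; they are not taken from the original method.

```python
def build_prompts(source_prompt: str, source_subject: str, placeholder: str) -> dict:
    """Derive the reference-image and synthesized-video prompts.

    Assumption: the subject word appears verbatim in the source prompt and
    only its first occurrence should be swapped for the placeholder token.
    """
    if source_subject not in source_prompt:
        raise ValueError(f"subject {source_subject!r} not found in source prompt")

    return {
        # Prompt describing the original video (unchanged).
        "source_video": source_prompt,
        # Template used for every reference image of the custom subject.
        "reference_image": f"A photo of {placeholder}",
        # Same sentence as the source, with the subject replaced by the token.
        "synthesized_video": source_prompt.replace(source_subject, placeholder, 1),
    }


prompts = build_prompts(
    "A dog sitting on the floor and turning its head to the side in front of a cabinet",
    source_subject="dog",
    placeholder="<cat>",
)
for name, text in prompts.items():
    print(f"{name}: {text}")
```

Swapping only the first occurrence keeps the rest of the sentence (action, background) identical, which is what lets the source prompt and the synthesized prompt describe the same motion and scene.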
