Prompt for the source video: A bear is walking
Prompt for each reference image: A photo of a black bear
Prompt for the synthesized video: A black bear is walking