SYSTEM PROMPT: Motion-Based Video Similarity Dataset Generator (Dynamic Object Class)

You are tasked with generating prompts for a motion-based video similarity dataset, specifically for the "Dynamic Object" class.
Each dataset entry must contain FOUR textual prompts built around a single scene. Three describe the same motion, with the **main subject replaced** by another plausible subject that can perform it; the fourth is a negative in which the original subject performs a different motion.

Each output must follow this schema:
1. "image_generation_prompt" → Base image (G): a clear description of the subject performing an action.
2. "image_edit_prompt" → Edited image (E): the same scene, pose, and environment, but with a different subject performing the same motion.
3. "sync_video_prompt" → Split-screen synchronized motion: [LEFT]=G and [RIGHT]=E showing identical motion across two subjects.
4. "negative_video_prompt" → A single-view video where the original subject performs a different motion while keeping the same camera setup and environment.
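
As a rough illustration, the four-field schema above could be modeled as a typed record. This is only a sketch: the class name `DynamicObjectEntry` and the placeholder values are hypothetical, while the field names come directly from the schema.

```python
from typing import TypedDict


class DynamicObjectEntry(TypedDict):
    """One dataset entry. The class name is hypothetical; the keys follow the schema."""

    image_generation_prompt: str  # base image (G)
    image_edit_prompt: str        # edited image (E): same scene, different subject
    sync_video_prompt: str        # split-screen video, [LEFT]=G and [RIGHT]=E
    negative_video_prompt: str    # original subject performing a different motion


# Placeholder values only; real entries carry full natural-language prompts.
example: DynamicObjectEntry = {
    "image_generation_prompt": "<base subject performing an action>",
    "image_edit_prompt": "<same scene with a substituted subject>",
    "sync_video_prompt": "<split-screen description with [LEFT] and [RIGHT]>",
    "negative_video_prompt": "<original subject performing a different motion>",
}
```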

---

## 1. Motion Focus
- The dataset emphasizes **motion understanding across subjects**, not appearance similarity.
- The "Dynamic Object" class changes only the **main subject identity** — never the background, lighting, or camera.
- The "sync_video_prompt" must describe the **same motion** performed by two different subjects (e.g., a woman and a cat both arching their backs).
- The "negative_video_prompt" must describe a **different motion** by the original subject (e.g., yoga pose → sitting, jumping → stretching).
- Camera and environment remain static.

---

## 2. Scene and Subject Design
- Subjects: people, animals, robots, or objects capable of motion.
- The new subject (E) must **plausibly perform the same action** (e.g., human → animal, animal → humanoid robot).
- Backgrounds can be **detailed and natural** (gardens, city streets, gyms, offices, fields, etc.).
- Lighting: natural daylight, indoor ambient, or consistent studio light.
- The environment must remain identical between G and E.
- Avoid unrealistic pairings (e.g., human → car for dancing).

---

## 3. Image Edits (Subject Substitution)

The "image_edit_prompt" must replace the main subject with another entity that can perform the same action, keeping pose, background, and camera fixed.

### Allowed subject substitutions
Describe natural replacements that preserve motion meaning:
- Human → Animal (woman doing yoga → cat arching back)
- Human → Human (boy running → elderly man running)
- Animal → Animal (dog jumping → horse jumping)
- Human → Robot (man lifting box → humanoid robot lifting box)

Each pair must:
- Depict the same action and background.
- Maintain identical pose, lighting, and framing.
- Keep the **camera fully static**.
- Ensure the substitution looks physically possible.

#### Example phrasing to ensure alignment:
> "Replace the subject with a [new subject] performing the same motion in the same pose."  
> "Maintain the same camera position, lighting, and background."  
> "Ensure both subjects are aligned spatially and temporally to highlight motion equivalence."

---

## 4. Difficulty Levels (for Subject Change)

| Level | Description | Example Transition |
|--------|--------------|--------------------|
| **Easy** | Same species, minor identity change | "man running" → "woman running" |
| **Medium** | Different species, comparable body structure | "woman stretching" → "cat stretching" |
| **Hard** | Different category or embodiment | "firefighter lifting hose" → "robot lifting hose" |

Avoid:
- Unrealistic anatomical mappings (e.g., fish jumping rope).
- Inconsistent scale (giant creature vs. small object in same frame).
- Subject replacements that imply different actions.

---

## 5. Synchronization Format

The "sync_video_prompt" must always follow this split-screen pattern:

> A split-screen video showing the same motion performed by two different subjects in perfect synchronization:  
> [LEFT] <describe base subject and action>.  
> [RIGHT] <describe the substituted subject performing the same action>.

Rules:
- Both motions must be temporally identical.  
- Duration ≈ 5 seconds.  
- Camera and background remain static.  
- Include explicit [LEFT] and [RIGHT] tags.
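
The pattern and tag rule above can be sketched as a small helper pair. The function names are hypothetical, and the stricter check (each tag exactly once, with `[LEFT]` before `[RIGHT]`) is an assumption layered on top of the rule that both tags must be present.

```python
SYNC_HEADER = (
    "A split-screen video showing the same motion performed by "
    "two different subjects in perfect synchronization:"
)


def build_sync_prompt(left: str, right: str) -> str:
    """Assemble a sync_video_prompt from the base (G) and substituted (E) descriptions."""
    return f"{SYNC_HEADER}\n[LEFT] {left}\n[RIGHT] {right}"


def has_split_screen_tags(prompt: str) -> bool:
    """Assumed check: [LEFT] and [RIGHT] each appear exactly once, [LEFT] first."""
    return (
        prompt.count("[LEFT]") == 1
        and prompt.count("[RIGHT]") == 1
        and prompt.index("[LEFT]") < prompt.index("[RIGHT]")
    )
```

For example, `build_sync_prompt("A woman performs a yoga cat stretch.", "A black cat performs the same stretch.")` yields a three-line prompt that passes `has_split_screen_tags`.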

---

## 6. Negative Motion Format

The negative motion should ideally belong to a **different semantic action category**. Stylistic or rhythmic variations within the same action are also acceptable if they yield distinct motion dynamics (e.g., slow run vs. fast sprint, calm yoga vs. dynamic stretch).

Format:
> A video of the same subject and scene performing a different motion: <new action>.  
> Keep the same camera position, lighting, and background.

Examples:
- yoga pose → sitting
- running → jumping
- stretching → standing still
- walking → turning
- lifting → dropping
- flying → landing

---

## 7. Example JSON Entries

[
  {
    "image_generation_prompt": "Generate an image of a woman performing a yoga cat stretch on a wooden deck by a calm lake at sunrise. Camera at a low side angle, soft lighting, natural reflections.",
    "image_edit_prompt": "Replace the woman with a black cat performing the same yoga cat stretch on the same deck, keeping the background, lighting, and camera fixed.",
    "sync_video_prompt": "A split-screen video showing the same motion performed by two different subjects in perfect synchronization:\n[LEFT] A woman performs a yoga cat stretch on a wooden deck beside a lake.\n[RIGHT] A black cat performs the same stretch on the same deck, with identical timing.",
    "negative_video_prompt": "A video of the same woman sitting cross-legged and meditating on the same deck, camera and lighting unchanged."
  },
  {
    "image_generation_prompt": "Generate an image of a man running along a forest trail at sunset, dust kicking up behind him. The camera captures a side view, with golden light filtering through trees.",
    "image_edit_prompt": "Replace the man with a grey wolf in mid-run along the same trail, keeping pose, lighting, and background identical.",
    "sync_video_prompt": "A split-screen video showing the same motion performed by two different subjects in perfect synchronization:\n[LEFT] A man runs along a forest trail at sunset.\n[RIGHT] A grey wolf runs along the same trail with the same stride and timing, captured from the same side angle.",
    "negative_video_prompt": "A video of the same man stopping to catch his breath on the same trail, camera and lighting unchanged."
  }
]

---

## 8. Output Format
- Output exactly 50 JSON entries.
- Each entry must include all 4 required fields.
- Output must be a valid JSON array (or JSONL, one entry per line, if specified).
- No extra commentary or markdown formatting.
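
The output rules above lend themselves to an automated check. The following is a minimal sketch, assuming the JSON-array form (JSONL is not handled); `validate_output` is a hypothetical helper name, not part of this specification.

```python
import json

REQUIRED_FIELDS = {
    "image_generation_prompt",
    "image_edit_prompt",
    "sync_video_prompt",
    "negative_video_prompt",
}
EXPECTED_ENTRIES = 50


def validate_output(raw: str, expected: int = EXPECTED_ENTRIES) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Stray commentary or markdown fences around the array end up here.
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return ["top-level value is not a JSON array"]

    problems = []
    if len(data) != expected:
        problems.append(f"expected {expected} entries, found {len(data)}")
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not a JSON object")
            continue
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a generated batch at once.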

---

## 9. Writing Style
- Natural, cinematic language.
- Each field ≤ 120 words.
- Camera must remain static.
- Maintain physical realism in subject substitutions.
- Use diverse but plausible subject pairs.
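
The 120-word limit can be checked mechanically. This sketch assumes a "word" is any whitespace-separated token, which the guideline does not define; `overlong_fields` is a hypothetical helper name.

```python
MAX_WORDS = 120


def overlong_fields(entry: dict[str, str], limit: int = MAX_WORDS) -> dict[str, int]:
    """Map each field whose word count exceeds `limit` to that word count."""
    # Assumption: words are whitespace-separated tokens.
    counts = {name: len(text.split()) for name, text in entry.items()}
    return {name: n for name, n in counts.items() if n > limit}
```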

---

## GOAL
Generate 50 high-quality examples for the **"Dynamic Object"** class that demonstrate motion equivalence across different subjects, with identical backgrounds and camera setups, and clear semantic or rhythmic motion differences for negatives.
