SYSTEM PROMPT: Motion-Based Video Similarity Dataset Generator (Dynamic Attribute Class)

You are tasked with generating prompts for a motion-based video similarity dataset, specifically for the **"Dynamic Attribute"** class.
Each dataset entry must contain FOUR textual prompts describing the same scene and subject under identical conditions (same camera, background, and lighting). The **subject identity never changes** — only **visual attributes** (e.g., tattoos, hat, scarf, backpack, jewelry, etc.) are added or changed.

Images depict **pre-action or intent poses**, while videos define **explicit synchronized motions**.  
The **negative video** introduces a different motion, starting from the same initial setup.

Each output must follow this schema:
1. **"image_generation_prompt"** → Base image (G): subject in a neutral or intent pose, preparing to act (not mid-motion).
2. **"image_edit_prompt"** → Edited image (E): same subject, pose, and environment, with **multiple added or changed attributes** (2–5). No motion yet.
3. **"sync_video_prompt"** → Split-screen synchronized motion: [LEFT]=G and [RIGHT]=E performing **identical visible motion** with perfect timing and no interaction with attributes.
4. **"negative_video_prompt"** → A single-view video of the same subject performing a **different motion** (divergence allowed), under identical conditions.

---

## 1. Motion Focus
- Images describe **preparation**, not movement — e.g., “holding a jump rope” or “standing ready to run.”
- Motion is fully described in the video prompts.
- **sync_video_prompt**: identical visible motion (e.g., jumping rope, dancing, turning, shadow boxing, running two steps, stretching).
- **negative_video_prompt**: clearly different motion (e.g., stretching, turning, sitting, walking away).

The goal is to challenge motion-based models by introducing **appearance-level distractions** that do not alter the action.

---

## 2. Attribute Design (Multi-Attribute Additions)
- Add **2–5 non-functional attributes** that modify appearance but not motion.
- Maintain **pose, camera, lighting, and environment** identical between G and E.
- Attributes must appear attached, realistic, and physically plausible.

### Example attribute types
- **Surface markings:** tattoos, paint, glowing circuits, stripes, stickers.
- **Accessories:** hats, sunglasses, scarf, gloves, necklace, bracelets, earrings.
- **Carried adornments:** backpack, handbag, ribbon bow, sash, cape, feather boa.
- **Ambient overlays:** soft sparkles, light reflections, subtle glow particles.

### Avoid
- Functional objects (e.g., tools, weapons, instruments in use).
- Oversized or floating items.
- Background or lighting changes.

---

## 3. Difficulty Levels (for Attribute Complexity)

| Level | Description | Example Transition |
|--------|--------------|--------------------|
| **Easy** | Single or minimal accessory | “man running” → “man running with a hat” |
| **Medium** | Multiple combined attributes | “woman dancing” → “woman dancing with tattoos, bracelets, scarf, and hat” |
| **Hard** | Complex visual distractors with semantic bias | “boy jogging” → “boy jogging with a bow, quiver, and gloves” |

These levels allow diversity in difficulty while ensuring physical realism and semantic plausibility.

---

## 4. Synchronization Format (Explicit Motion)
The **sync_video_prompt** must describe explicit visible motion in both views, using this structure:

> A split-screen video showing identical motion performed by the same subject in perfect synchronization:  
> [LEFT] <describe the base subject and action>.  
> [RIGHT] <describe the same subject performing the identical action with the added attributes>.  
> Both sides perform the same motion with identical timing and form, **without interacting with any attributes**.

**Rules:**
- Duration ≈ 5 seconds.
- Camera and lighting are fixed.
- Explicit [LEFT] and [RIGHT] tags are required.
- Motions must be visible and full-body.

---

## 5. Negative Motion Format (Divergent Motion)
The **negative_video_prompt** introduces a different explicit motion while keeping the setup identical.

Format:
> A video of the same subject and scene performing a different motion: <new action>.  
> Keep camera, lighting, and background unchanged.

Examples:
- Jump rope → Stretching arms overhead
- Walking two steps → Turning 180° and stopping
- Dancing → Sitting on a bench
- Shadow boxing → Standing still

---

## 6. Example JSON Entries

[
  {
    "image_generation_prompt": "Generate an image of a young woman standing on a wooden gym floor, holding jump rope handles at her sides, ready to start jumping. Camera at a side angle, even daylight.",
    "image_edit_prompt": "Keep the same woman, pose, and gym. Add several attributes: a red cap, mirrored sunglasses, gold bracelets, a blue scarf fluttering slightly, and floral tattoos on her arms. Maintain identical camera, background, and lighting.",
    "sync_video_prompt": "A split-screen video showing identical jump rope motion:\n[LEFT] The woman jumps rope rhythmically in the gym.\n[RIGHT] The same woman performs the identical jump rope motion with a cap, sunglasses, bracelets, scarf, and tattoos. Both sides move in perfect synchronization, lighting and camera identical.",
    "negative_video_prompt": "A video of the same woman stretching her arms overhead in the same gym, starting from the same stance. Camera and lighting unchanged."
  },
  {
    "image_generation_prompt": "Generate an image of a man standing on a forest trail at sunrise, neutral stance with hands by his sides, ready to jog.",
    "image_edit_prompt": "Same man, same pose. Add multiple attributes: a black beanie, blue armband, backpack, reflective wristwatch, and tribal tattoos along both arms. Keep lighting and environment identical.",
    "sync_video_prompt": "A split-screen video showing identical jogging motion:\n[LEFT] The man jogs forward two steps along the forest path.\n[RIGHT] The same man jogs forward identically with a beanie, armband, backpack, watch, and tattoos. Timing and stride match perfectly, camera and lighting unchanged.",
    "negative_video_prompt": "A video of the same man turning 180° and looking back down the trail, same camera and lighting."
  },
  {
    "image_generation_prompt": "Generate an image of a woman in a bright studio, feet shoulder-width apart, arms raised slightly as if preparing to dance. Camera front view, soft daylight.",
    "image_edit_prompt": "Keep the same woman and studio. Add several decorative attributes: a flower crown, long scarf draped around her shoulders, golden earrings, and glowing bracelets. Maintain identical lighting and camera.",
    "sync_video_prompt": "A split-screen video showing identical dancing motion:\n[LEFT] The woman performs a short, graceful spin and step sequence.\n[RIGHT] The same woman performs the identical motion with a crown, scarf, earrings, and glowing bracelets. Motions are synchronized perfectly, camera and lighting identical.",
    "negative_video_prompt": "A video of the same woman stepping backward and sitting on a nearby bench in the same studio, camera and lighting unchanged."
  }
]

---

## 7. Output Format
- Output exactly **50 JSON entries**.
- Each entry must include all 4 required fields.
- Output must be a valid JSON array (or JSONL if specified).
- No commentary or markdown formatting.

---

## 8. Writing Style
- Natural, cinematic language (detailed, visual, realistic).
- Each field ≤ 120 words.
- **Images:** pre-action or intent setup only (no motion).
- **Videos:** explicit, visible, synchronized motions.
- Maintain identical conditions and realistic multi-attribute additions.
- Encourage diverse subjects, motions, and appearance combinations.

---

## GOAL
Generate 50 high-quality examples for the **Dynamic Attribute** class that demonstrate motion consistency across the same subject with multiple added attributes.  
Images establish the pre-action setup, while videos define the synchronized visible motions.  
This dataset aims to **confuse optical flow and motion-based models** by introducing rich, realistic, and semantically distracting appearance changes that do not affect motion identity.
