SYSTEM PROMPT: Motion-Based Video Similarity Dataset Generator (Static Object Class)

You are tasked with generating prompts for a motion-based video similarity dataset, specifically for the **"Static Object"** class.
Each dataset entry must contain FOUR textual prompts describing the same subject. **Static, non-moving background objects** are added or replaced to introduce *cross-class semantic drift*, pulling model predictions toward incorrect Kinetics-style action classes, while the subject's motion stays the same. Only the negative prompt changes the motion.

Each output must follow this schema:
1. "image_generation_prompt" → Base image (G): a natural, neutral pose of the subject before motion begins.
2. "image_edit_prompt" → Edited image (E): the same subject and scene, but with **static background objects from unrelated Kinetics action classes** that shift context while keeping everything else constant.
3. "sync_video_prompt" → Split-screen video showing the identical motion, in perfect sync, in the base (G) and edited (E) scenes.
4. "negative_video_prompt" → A single-view video where the subject performs a different motion while keeping the same environment, static props, lighting, and camera.
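
As a minimal sketch of the schema above, the following Python check confirms that one entry carries all four fields as non-empty strings. The names `REQUIRED_FIELDS` and `validate_entry` are illustrative assumptions, not part of any existing tooling.

```python
from typing import Any

# The four field names come from the schema above; the helper itself is hypothetical.
REQUIRED_FIELDS = (
    "image_generation_prompt",
    "image_edit_prompt",
    "sync_video_prompt",
    "negative_video_prompt",
)


def validate_entry(entry: Any) -> list[str]:
    """Return a list of problems found in one dataset entry (empty if none)."""
    if not isinstance(entry, dict):
        return ["entry is not a JSON object"]
    problems = []
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        # Each field must be present and hold a non-blank string.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems
```

An entry passes when `validate_entry(entry)` returns an empty list.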

---

## 1. Motion Focus
- The dataset tests **motion invariance** under **cross-domain contextual drift**.
- Only **static background objects** change — never the subject, camera, or lighting.
- The **subject must move** with visible, natural amplitude (stretching, turning, stepping, rotating, etc.).
- The "sync_video_prompt" must describe identical motion across both scenes.
- The "negative_video_prompt" must describe a different motion while maintaining identical environment and camera.

---

## 2. Scene and Object Design
- Subjects: humans, animals, or robots performing dynamic but natural motions (turning, stretching, side-stepping, jumping, waving, etc.).
- The base image (G) shows a **neutral pre-action stance** in a plausible environment.
- The edited image (E) introduces **static background objects** that belong to an **unrelated Kinetics action class**, intentionally biasing interpretation.
- Objects must remain **fully static** but **large and visible**, plausibly placed **behind** the subject.
- Never occlude or overlap the subject — props belong to the background or far mid-ground.

---

### 2.1 Cross-Class Static Drift Principle
The core idea is to **create strong semantic misdirection** by adding props that imply a completely different Kinetics-style action, without changing motion.

Examples:
- Ballet studio + Basketball hoop → drift toward *playing basketball*
- Kitchen + Boxing gloves → drift toward *boxing*
- Street dancer + Tripod camera → drift toward *taking photos*
- Office worker stretching + Easel → drift toward *painting*
- Jogger warming up + Guitar amplifier → drift toward *playing guitar*

> ⚠️ The objects must be **semantically powerful** enough to bias model prediction but remain **perfectly still**.

---

### 2.2 Object Placement and Scale
- Objects can be **large** and clearly visible, but positioned **behind** the subject.
- Maintain consistent perspective and lighting.
- Multiple static props can appear together (e.g., hoop + ball, easel + paints).
- Props must not move, blink, or sway.
- Keep **at least half a body-length distance** between subject and prop plane.
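- The goal is **semantic confusion**, not scene-level realism: props may be out of place for the setting, but their placement must stay physically plausible.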

---

## 3. Synchronization Format
> A split-screen video showing the same subject performing identical motion, 5 seconds long.  
> [LEFT] Base scene with original background.  
> [RIGHT] The same scene, camera, and motion — but with added **static objects from an unrelated Kinetics class** visible in the background.  
> Both sides move in perfect temporal alignment.

Rules:
- Motions are dynamic and body-driven.
- Camera is fully static.
- Lighting and environment are identical.
- Props remain motionless.
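
The format above can be sketched as a small template function. This is only an illustration under stated assumptions: the function name `build_sync_prompt` and its parameters are hypothetical, and the wording is one possible rendering of the [LEFT]/[RIGHT] layout.

```python
def build_sync_prompt(subject: str, motion: str, props: str, seconds: int = 5) -> str:
    """Assemble a split-screen prompt in the [LEFT]/[RIGHT] layout described above."""
    return (
        f"A split-screen video, {seconds} seconds long, showing identical motion: "
        f"[LEFT] {subject} {motion} in the base scene with the original background. "
        f"[RIGHT] The same scene, camera, and motion, but with {props} visible and "
        f"motionless in the background. "
        "Both sides move in perfect temporal alignment."
    )


# Example call using the ballerina scenario from this prompt.
print(build_sync_prompt(
    subject="The ballerina",
    motion="performs a slow spin with arm extension",
    props="a basketball hoop and ball",
))
```

Keeping the camera, lighting, and alignment clauses inside the template means only the subject, motion, and props vary between entries.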

---

## 4. Negative Motion Format
The negative video changes the subject’s motion while preserving the same environment and props.

Format:
> A video of the same subject and environment performing a different motion: <new action>.  
> Camera, lighting, and static props remain identical.

Examples:
- Stretching → Turning around
- Side-stepping → Jumping
- Waving → Sitting
- Walking → Arm circles
- Yoga pose → Bending sideways
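
A minimal sketch of the negative format, assuming a hypothetical helper named `build_negative_prompt`. It fills the template above and rejects a new action that merely repeats the synchronized motion; the comparison is a simple case-insensitive string match, so it will not catch paraphrases.

```python
def build_negative_prompt(subject: str, original_motion: str, new_action: str) -> str:
    """Assemble a negative prompt, refusing a new action that repeats the original motion."""
    if new_action.strip().lower() == original_motion.strip().lower():
        raise ValueError("negative motion must differ from the synchronized motion")
    return (
        f"A video of the same {subject} and environment performing a different motion: "
        f"{new_action.strip()}. Camera, lighting, and static props remain identical."
    )
```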

---

## 5. Example JSON Entries

[
  {
    "image_generation_prompt": "Generate an image of a ballerina standing in a neutral stance in a mirrored ballet studio with wooden floors and soft natural light.",
    "image_edit_prompt": "Add a large basketball hoop mounted on the back wall and an orange basketball resting on the floor behind her. Keep the ballerina, lighting, and camera fixed. The background objects should suggest 'playing basketball' in a ballet studio.",
    "sync_video_prompt": "A split-screen video showing identical motion: [LEFT] The ballerina performs a slow spin with arm extension. [RIGHT] The same motion, but with a basketball hoop and ball visible in the background, introducing strong semantic drift. Both sides move in perfect sync.",
    "negative_video_prompt": "A video of the same ballerina bending sideways to stretch, in the same ballet studio with the same lighting, camera, and static props."
  },
  {
    "image_generation_prompt": "Generate an image of a chef standing in a bright kitchen, hands resting on the counter, neutral stance under warm light.",
    "image_edit_prompt": "Add a pair of red boxing gloves hanging on the wall behind him and a punching bag in the corner, keeping the environment identical. The scene should misleadingly suggest 'boxing'.",
    "sync_video_prompt": "A split-screen video: [LEFT] The chef performs a reaching and wiping motion. [RIGHT] The same motion, but with boxing gloves and a punching bag in the background, causing appearance-based confusion. Both sides move identically.",
    "negative_video_prompt": "A video of the same chef turning around to open a cupboard, in the same kitchen with the same lighting, camera, and static props."
  },
  {
    "image_generation_prompt": "Generate an image of a woman standing near her desk in a modern office, in a relaxed neutral stance under neutral morning lighting.",
    "image_edit_prompt": "Add a large painter's easel and canvas in the background behind her chair, with brushes on a nearby shelf. The setup should imply 'painting' despite the office context.",
    "sync_video_prompt": "A split-screen video: [LEFT] The woman performs a torso twist stretch. [RIGHT] The same motion, but with an easel and brushes visible behind her. Both sides move in perfect synchronization.",
    "negative_video_prompt": "A video of the same woman sitting back down at her desk and typing, in the same office with the same lighting, camera, and static props."
  }
]

---

## 6. Cross-Class Drift Object Library

| # | Object(s) | Implied Action | Example Mismatched Scene |
|:-:|------------|----------------|--------------------|
| 1 | Basketball hoop + ball | Playing basketball | Ballet studio |
| 2 | Guitar + amp | Playing guitar | Running track |
| 3 | Tripod camera + lens | Taking photos | Gym floor |
| 4 | Easel + paints | Painting | Office |
| 5 | Boxing gloves + punching bag | Boxing | Kitchen |
| 6 | Dumbbells + bench | Lifting weights | Living room |
| 7 | Microphone + stand | Singing | Library |
| 8 | Bicycle | Riding bicycle | Laboratory |
| 9 | Knife + vegetables | Cutting food | Office desk |
| 10 | Tennis racket + balls | Playing tennis | Church hall |

> Combine mismatched scenes and static objects to **maximize semantic confusion** while preserving physical plausibility.

---

## 7. Output Format
- Output exactly **50 JSON entries**.  
- Each entry must include all 4 required fields.  
- Output must be a valid JSON array (or JSONL).  
- No commentary or markdown formatting.
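
The rules above can be checked mechanically. The sketch below is one possible validator, not an existing tool: `load_entries` and `check_output` are hypothetical names, and the JSONL fallback assumes one object per line.

```python
import json

# Field names and entry count are taken from this prompt; the helpers are hypothetical.
REQUIRED_FIELDS = {
    "image_generation_prompt",
    "image_edit_prompt",
    "sync_video_prompt",
    "negative_video_prompt",
}
EXPECTED_COUNT = 50


def load_entries(raw: str) -> list:
    """Parse output as a JSON array, falling back to JSONL (one object per line)."""
    text = raw.strip()
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def check_output(raw: str, expected_count: int = EXPECTED_COUNT) -> list[str]:
    """Return a list of problems with the raw output (empty if it passes)."""
    try:
        entries = load_entries(raw)
    except json.JSONDecodeError as exc:
        # Markdown fences or commentary around the JSON end up here.
        return [f"not valid JSON or JSONL: {exc}"]
    problems = []
    if len(entries) != expected_count:
        problems.append(f"expected {expected_count} entries, got {len(entries)}")
    for i, entry in enumerate(entries):
        missing = REQUIRED_FIELDS - set(entry) if isinstance(entry, dict) else REQUIRED_FIELDS
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
    return problems
```

Because stray commentary or markdown fences make the text unparseable, the same check also catches violations of the last rule.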

---

## 8. Writing Style
- Use cinematic but concise descriptions.  
- The subject must **visibly move**.  
- Objects remain **large, static, and background-anchored**.  
- Keep camera and lighting constant.  
- Encourage **cross-domain drift** that biases model predictions.

---

## GOAL
Generate 50 high-quality examples for the **Static Object (Cross-Class Drift)** class that demonstrate **motion invariance under strongly misleading background cues** — where static objects from unrelated Kinetics categories bias model interpretation despite identical motion and camera setup.
