ID8-Autopilot Workshop: Supplementary

Automingo: Seeing the Unseen - Vision-Language Edge Case Dataset for Detection and Analysis of Autonomous Driving

Animated scene overview
Automingo VQA Dataset: example of a recorded scenario (Leading Braking).

Data acquisition pipeline

Data acquisition and annotation pipeline diagram
Figure 1. The data acquisition and annotation pipeline consists of five stages: (1) recording planned driving routes while a co-driver pre-labels road scenes in real time; (2) extracting five representative frames per event from the annotated timestamps; (3) anonymizing sensitive content by blurring faces and license plates; (4) QA labelling, i.e., answering predefined questions for each recorded event; and (5) QA rewriting with LLMs to increase the lexical diversity of the final question–answer pairs.
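Stage (2) states only that five representative frames are extracted per event from the annotated timestamps; the selection rule itself is not specified. A minimal sketch, assuming a symmetric window of hypothetical length around each timestamp and evenly spaced frames within it (the function name and window defaults are illustrative):

```python
def representative_frame_indices(
    event_time_s: float,
    fps: float,
    n_video_frames: int,
    n_frames: int = 5,
    before_s: float = 2.0,  # hypothetical window length before the event
    after_s: float = 2.0,   # hypothetical window length after the event
) -> list[int]:
    """Pick n_frames evenly spaced frame indices around an annotated event time.

    The symmetric window and even spacing are assumptions for illustration; the
    pipeline only states that five representative frames are taken per event.
    """
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    duration_s = (n_video_frames - 1) / fps
    # Clip the window to the recording so early or late events stay in range.
    start = max(0.0, event_time_s - before_s)
    end = min(duration_s, event_time_s + after_s)
    if end < start:
        raise ValueError("event time lies outside the recording")
    step = (end - start) / (n_frames - 1)
    # Convert each sampled timestamp to the nearest frame index.
    return [round((start + i * step) * fps) for i in range(n_frames)]
```

For a 30 fps recording with an event annotated at 10 s, this returns frames 240, 270, 300, 330 and 360. Even spacing is used here only because it is the simplest rule that preserves temporal order.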

Definition of recorded scenarios

As mentioned in Section “Dataset”, we based our choice of scenarios on the Euro NCAP scenario families. Figure 2 shows an extended version of these scenario examples, including further scenarios.

Distractor generation user prompt

You are helping create a multiple-choice dataset about autonomous driving scenarios.
Question: {question}
Correct Answer: {correct_answer}
Reasoning for correct answer: {reasoning}

Generate exactly 3 WRONG answers.

Requirements:
- Clearly incorrect but plausible
- Not absurd
- Concise (1 sentence max)
- NOT just Yes/No
- Different misconceptions in each
Respond ONLY with a JSON array of 3 strings.
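The placeholders in this prompt match Python's str.format fields, so the request can be assembled directly and the model's reply checked against the stated requirements. A minimal sketch; the function names are hypothetical, and the validation rules (exactly three distinct strings, no bare Yes/No, none equal to the correct answer) are our reading of the prompt rather than a documented part of the pipeline:

```python
import json

# Template reproduced from the prompt above; {question}, {correct_answer}
# and {reasoning} are str.format fields.
DISTRACTOR_PROMPT = (
    "You are helping create a multiple-choice dataset about autonomous driving scenarios.\n"
    "Question: {question}\n"
    "Correct Answer: {correct_answer}\n"
    "Reasoning for correct answer: {reasoning}\n"
    "\n"
    "Generate exactly 3 WRONG answers.\n"
    "\n"
    "Requirements:\n"
    "- Clearly incorrect but plausible\n"
    "- Not absurd\n"
    "- Concise (1 sentence max)\n"
    "- NOT just Yes/No\n"
    "- Different misconceptions in each\n"
    "Respond ONLY with a JSON array of 3 strings."
)


def build_distractor_prompt(question: str, correct_answer: str, reasoning: str) -> str:
    # Only the template is parsed for {fields}; braces inside the arguments are kept verbatim.
    return DISTRACTOR_PROMPT.format(
        question=question, correct_answer=correct_answer, reasoning=reasoning
    )


def _norm(text: str) -> str:
    # Case- and punctuation-insensitive form used for comparisons.
    return text.strip().lower().rstrip(".!")


def parse_distractors(raw: str, correct_answer: str) -> list[str]:
    """Validate a model reply against the prompt's output contract (hypothetical checks)."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list) or len(data) != 3 or not all(isinstance(d, str) for d in data):
        raise ValueError("expected a JSON array of exactly 3 strings")
    cleaned = [d.strip() for d in data]
    normalised = {_norm(d) for d in cleaned}
    if normalised & {"yes", "no"}:
        raise ValueError("bare Yes/No distractor")
    if len(normalised) != 3 or _norm(correct_answer) in normalised:
        raise ValueError("distractors must be distinct and differ from the correct answer")
    return cleaned
```

Rejecting a malformed reply (rather than silently repairing it) lets the caller simply retry the request.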
Panels: LB (Leading Braking), RA (Roundabout), IS (Intersection), CS (Construction Site), SLA (Speed Limit Adaptation).
Figure 2. Representative examples of the different driving situations that Automingo is expected to encounter, based on Euro NCAP scenario families. These images illustrate typical real-world traffic contexts used for evaluation.

Collaborative Labelling Application

To coordinate annotation across a distributed team of domain experts while maintaining strict data integrity, we developed a custom browser-based labelling tool designed to make annotation fast and efficient. It was built around three main requirements: exclusive case assignment, so that no case is annotated twice; a simple interface with minimal friction for non-technical users; and direct export of structured outputs fully compatible with the training pipeline.

Architecture

The application is implemented as a self-contained single-page application, distributed as a single HTML file that requires no installation or build step. It communicates with a lightweight Python HTTP server that manages case availability and records completion status. This client-server separation allowed any number of annotators to work concurrently from their own machines by pointing a browser to the shared server URL, with no risk of two annotators receiving the same case.
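The server is described only as a lightweight Python HTTP server. A minimal sketch of such a server built on the standard library's ThreadingHTTPServer, assuming a hypothetical POST /batch endpoint that takes {"annotator": ..., "size": ...} and returns an exclusive list of case IDs; the endpoint name, payload shape, and in-memory storage are all illustrative:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory state; the real server's storage format is not documented.
PENDING = ["case_001", "case_002", "case_003", "case_004"]
ASSIGNED: dict[str, str] = {}  # case_id -> annotator name
LOCK = threading.Lock()        # ThreadingHTTPServer handles each request in its own thread


class LabellingHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/batch":  # hypothetical endpoint name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
            size = int(body.get("size", 5))
            annotator = str(body.get("annotator", "unknown"))
        except (ValueError, AttributeError):
            self.send_error(400, "expected a JSON object with an integer 'size'")
            return
        if size < 1:
            self.send_error(400, "'size' must be positive")
            return
        # Pop and record the batch under the lock so two requests never share a case.
        with LOCK:
            batch = PENDING[:size]
            del PENDING[:size]
            for case_id in batch:
                ASSIGNED[case_id] = annotator
        payload = json.dumps({"cases": batch}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def make_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    return ThreadingHTTPServer((host, port), LabellingHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

The single lock around the pop-and-record step is what prevents two concurrent requests from receiving the same case; the threaded server on its own would not guarantee that.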

Setup screen of the labelling application
Figure 3. Setup screen of the labelling application. The annotator enters their name, connects to the shared server, and selects a batch size. The server assigns an exclusive set of cases with no overlap across concurrent users, shown in the read-only assignment list.

Queue-based case assignment

The full dataset was partitioned into nine buckets, each loaded onto the server by the dataset curator. Upon connecting, each annotator selected a batch size (5, 10, 20, or 50 cases, or a custom value) and requested a new batch from the server. The server assigned cases exclusively: once a case was allocated to an annotator, it was locked and unavailable to others. The server also kept a persistent record of completed cases. Annotators could request additional batches at will, export their results at any point, and resume work across sessions. Because the server was the sole arbiter of case availability, annotation ran fully in parallel without any coordination overhead between annotators.
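The assignment rules above (per-bucket loading, exclusive locking, persistent completion, resumable sessions) can be sketched as a small queue class. This is an illustration under assumptions: the class name, the JSON state file, and the method names are hypothetical, and the actual server may store its state differently.

```python
import json
import threading
from pathlib import Path


class CaseQueue:
    """Exclusive, persistent case assignment (a sketch; field names are hypothetical)."""

    def __init__(self, state_path: Path) -> None:
        self._path = state_path
        self._lock = threading.Lock()
        if state_path.exists():  # resume after a server restart
            self._state = json.loads(state_path.read_text(encoding="utf-8"))
        else:
            self._state = {"available": [], "assigned": {}, "completed": []}

    def load_bucket(self, case_ids: list[str]) -> None:
        """Curator call: append one bucket of cases, skipping any already known."""
        with self._lock:
            known = (set(self._state["available"]) | set(self._state["assigned"])
                     | set(self._state["completed"]))
            for case_id in case_ids:
                if case_id not in known:
                    self._state["available"].append(case_id)
                    known.add(case_id)
            self._save()

    def request_batch(self, annotator: str, size: int) -> list[str]:
        """Lock up to `size` unassigned cases to `annotator`."""
        if size < 1:
            raise ValueError("batch size must be positive")
        with self._lock:
            batch = self._state["available"][:size]
            del self._state["available"][:size]
            for case_id in batch:
                self._state["assigned"][case_id] = annotator
            self._save()
        return batch

    def assigned_to(self, annotator: str) -> list[str]:
        """Cases still open for `annotator`, e.g. when resuming a session."""
        with self._lock:
            return [c for c, a in self._state["assigned"].items() if a == annotator]

    def complete(self, case_id: str, annotator: str) -> None:
        """Mark a case done; only its owner may complete it."""
        with self._lock:
            if self._state["assigned"].get(case_id) != annotator:
                raise KeyError(f"{case_id} is not assigned to {annotator}")
            del self._state["assigned"][case_id]
            self._state["completed"].append(case_id)
            self._save()

    def _save(self) -> None:
        # Write-then-rename so a crash never leaves a half-written state file.
        tmp = self._path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self._state), encoding="utf-8")
        tmp.replace(self._path)
```

Persisting after every state change is what allows an annotator to reconnect later and pick up the cases still assigned to them.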

Annotation interface

Each case was presented as a sequence of five temporally ordered, anonymised frames displayed simultaneously in a responsive grid (Figure 4). Clicking a frame enlarged it to full screen, and the arrow keys moved between frames, supporting careful inspection of temporal dynamics. The annotator's identity was recorded for each case so that every ground-truth entry can be traced back to its annotator.

Situation confirmation step in the labelling application
Question answering screen in the labelling application
Figure 4. Top: situation confirmation step, where the annotator verifies that the pre-label matches the visual content before any question is shown; rejecting the label reclassifies the case as empty. Bottom: question answering screen, showing the scenario-specific question, an example reasoning hint, and the mandatory free-text reasoning field.

Situation confirmation and empty handling

Before answering any questions, each annotator was required to confirm whether the displayed sequence matched the situation category assigned during real-time in-vehicle labelling. This step served as an explicit quality gate: if the annotator confirmed the situation, the corresponding scenario-specific question set (three to six questions; see the table of situations and associated questions) was presented; if the annotator rejected the pre-label, the case was reclassified as an empty event. For empty cases, the application automatically assigned four questions drawn from a cross-situation question bank via a deterministic round-robin mechanism, designed to ensure uniform coverage of all question types across the empty subset. Context-specific questions whose interpretation depends on a particular scenario (e.g., traffic light colour, numeric speed value) were excluded from the empty bank.
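One way to realise a deterministic round-robin over the empty bank is to give the i-th empty case the next four questions in a fixed order, wrapping around at the end. The sketch below assumes that rule, a hypothetical BankQuestion type with a context_specific flag, and an illustrative bank drawn from the situation table; the real bank's contents are not listed in this supplement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankQuestion:
    text: str
    context_specific: bool = False  # True if the answer only makes sense in one scenario


# Illustrative cross-situation bank; which questions the real bank holds is an assumption.
QUESTION_BANK = [
    BankQuestion("Is the car inside a roundabout?"),
    BankQuestion("Is there an intersection ahead of us?"),
    BankQuestion("Are the road lines yellow?"),
    BankQuestion("Are there any workers working on or around the road?"),
    BankQuestion("Is the vehicle in an acceleration lane on a highway?"),
    BankQuestion("Do we have a car with braking lights on directly in front of us in our lane?"),
    BankQuestion("What color is the traffic light that affects our car?", context_specific=True),
    BankQuestion("What is the speed indicated on the sign?", context_specific=True),
]


def empty_bank(questions: list[BankQuestion]) -> list[str]:
    """Drop context-specific questions; keep a fixed order so the rotation is deterministic."""
    return [q.text for q in questions if not q.context_specific]


def assign_empty_questions(bank: list[str], empty_case_index: int, k: int = 4) -> list[str]:
    """Round-robin: the i-th empty case takes the next k questions, wrapping around the bank."""
    if not 0 < k <= len(bank):
        raise ValueError("k must be between 1 and the bank size")
    start = (empty_case_index * k) % len(bank)
    return [bank[(start + j) % len(bank)] for j in range(k)]
```

After n empty cases, each of the m questions in the bank has been used either floor(4n/m) or ceil(4n/m) times, which is the uniform coverage the mechanism aims for.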

Inline distractor generation

Upon submission of each annotated case, the application called the ChatGPT API to generate three plausible but incorrect alternative answers (distractors) for each question, using the annotator's correct answer and free-text reasoning as context. Distractors were generated immediately after annotation and stored atomically, so the validation fold was produced as a natural by-product of the labelling process rather than in a separate offline step.
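A sketch of the "generate, then store atomically" step, assuming each case is saved as its own JSON file and that `generate` is a hypothetical wrapper around the LLM call (the actual ChatGPT API client and request format are not shown). Writing to a temporary file and renaming it over the target means a case either appears complete with all its distractors or not at all:

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def store_case_atomically(
    out_dir: Path,
    case_id: str,
    answers: list[dict],
    generate: Callable[[dict], list[str]],  # hypothetical LLM wrapper
) -> Path:
    """Attach three distractors to every answer, then persist the case in one step.

    If `generate` fails for any question, nothing is written, so no case is
    ever stored without its distractors.
    """
    records = []
    for qa in answers:
        distractors = generate(qa)
        if len(distractors) != 3:
            raise ValueError(f"expected 3 distractors for {qa.get('question')!r}")
        records.append({**qa, "distractors": list(distractors)})

    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{case_id}.json"
    # Temp file in the same directory, then rename over the target;
    # os.replace is atomic on POSIX when both paths share a filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"case_id": case_id, "questions": records}, fh, ensure_ascii=False)
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)  # leave no partial file behind
        raise
    return target
```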

Export

On completing a batch, the annotator downloaded a structured Excel file containing one row per question, with columns for scene identifier, situation, question text, correct answer, free-text reasoning, three distractor answers, and annotator name. These per-annotator files were subsequently merged by the dataset curator into the global dataset using a dedicated aggregation pipeline.
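The aggregation pipeline is not described further. A minimal standard-library sketch of the merge step, assuming each export has already been read into a list of row dictionaries (for .xlsx files, e.g. with pandas.read_excel, which needs openpyxl, followed by DataFrame.to_dict(orient="records")); the column names below are hypothetical stand-ins for the columns listed above:

```python
# Hypothetical column names mirroring the export columns described above.
EXPORT_COLUMNS = (
    "scene_id", "situation", "question", "correct_answer", "reasoning",
    "distractor_1", "distractor_2", "distractor_3", "annotator",
)


def merge_exports(per_annotator: list[list[dict]]) -> list[dict]:
    """Merge per-annotator row lists into one table, checking schema and exclusivity."""
    merged: list[dict] = []
    owner: dict[tuple[str, str], str] = {}  # (scene_id, question) -> annotator
    for rows in per_annotator:
        for row in rows:
            missing = [c for c in EXPORT_COLUMNS if c not in row]
            if missing:
                raise ValueError(f"row is missing columns {missing}")
            key = (row["scene_id"], row["question"])
            if key in owner:
                # Exclusive assignment should make this impossible; flag it instead of overwriting.
                raise ValueError(f"{key} answered by both {owner[key]} and {row['annotator']}")
            owner[key] = row["annotator"]
            merged.append({c: row[c] for c in EXPORT_COLUMNS})
    merged.sort(key=lambda r: (r["scene_id"], r["question"]))
    return merged
```

Treating a duplicate (scene, question) pair as an error doubles as a cheap check that the exclusive assignment actually held.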

NCAP Question and Answer

For each scenario, we formulated a distinct set of specific questions, each providing an independent data point. Taken together, the answers verify whether the vehicle is actually in one of the targeted scenarios and identify which phase of that scenario the recorded time span covers. All questions are listed in the table of situations and associated questions below.

Traffic light:
- Do you see any red traffic lights that could affect the lane the car is in?
- What color is the traffic light that affects our car?
- Is our car in a right or left lane from which we can turn after the traffic light?
- In this situation, do you see more than one traffic light in our direction with different colours?

Leading braking:
- Do we have a car with braking lights on directly in front of us in our lane?
- Are we approaching the vehicle in front of us in our lane?
- Are we stopped, or is the car in front of us stopped?
- Are we stopping in a traffic jam?

Cut in:
- Are there any moving cars in the adjacent lanes to our right or left that are changing into our lane?
- Is there any car merging into our lane directly in front of us, with nothing between the new car and us?
- Are we in a roundabout?
- Are we in an intersection?
- Do you see any car on top of the lines that delimit our lane, either on the left or the right?

Construction site:
- Are there any construction signs, such as cones, yellow signs, lights, or closed lanes?
- Are the road lines yellow?
- Are there any workers working on or around the road?

Crossing object:
- Is there an intersection or a non-continuous lane ahead of our car?
- Is there any vehicle or pedestrian in front of our car moving in a different direction than us?
- Do we have any vehicle ahead of us in our lane, traveling in the same direction as us?

Lateral parked car:
- Are there cars to the right or left of the lane we are driving in?
- Are there any cars parked parallel or perpendicular on the left or right of our car?
- Are we in a city where cars can park along the sides, or on a highway?
- Are we in a middle lane with more traffic lanes on either side?

Pedestrian:
- Are we on a narrow street where the sidewalks and/or bike lane are right next to the roadway and visible?
- Are there any pedestrians or bicycles stopped or traveling alongside our car on the left or right?
- Could the pedestrian’s path intersect with our trajectory?

Merging lane:
- Is the vehicle in an acceleration lane on a highway?
- Is there an acceleration lane for a highway on the right or left, adjacent to our lane?
- Is there any lane whose outer lines end at the lines of our lane?

Intersection road:
- Is there an intersection ahead of us?
- Does that intersection maintain the lane lines?
- Is there a visible gap or missing lane lines in our lane?
- Do the lines of two different lanes converge into the same line?

Roundabout:
- Is the car inside a roundabout?
- In our lane, can we see a roundabout ahead that we are approaching?
- Is the car stopped while seeing a vehicle crossing almost perpendicularly?

Speed limit adaptation:
- In the sequence, can different speed limit signs with different numbers be seen?
- Is there any speed limit sign that affects the lane we are driving in?
- What is the speed indicated on the sign?
- Are we on a highway?
- Are we entering or exiting a highway?
- Are there speed bumps, crosswalks, or construction zones in our lane?
Table. Situations and associated questions used for scene understanding.