Overall description of the video:
User Interface for Tone-Controlled Road Video Captioning. We demonstrate captioning with fine-grained control across five tone dimensions: Personality, Writing Style, Event Details (Informativeness), Structural Attributes, and Caption Length. Users begin by uploading a road video, selecting the desired tone controls, and assigning specific intensity levels. They can further customize structural controls such as 'Emojis' or 'Hashtags' and specify the target word count. The RoadTones-VL-CoT model then produces a caption that adheres to all chosen controls. For clarity, a Road Event Summary is provided alongside the generated caption. As an additional utility, the output caption can be shared directly on social media platforms such as X or Instagram.

Attachment filename: RoadTones_USER_INTERFACE_Demo.mp4 (16804874 Bytes)

The video can be played in any standard media player, e.g., VLC.

