# Supplementary Material

This supplementary video demonstrates key qualitative results discussed in our paper. The video is structured into the following parts, each highlighting a core capability of our framework:

1. **Holistic Generation: Object-Aware Interaction with Co-Speech Gestures**  
This segment showcases our model's ability to enable a human character to interact naturally with target objects of varying heights, while simultaneously producing expressive co-speech gestures.
    - **Example Prompt**: *"A person sits down."*
    - As the person sits, observe how the model generates appropriate body pose adjustments based on the object's geometry and produces gestures aligned with the speech's rhythm and semantics. For example, the gesture emphasizes the phrase *"...10 years ago"*, adding expressive nuance to the interaction.

2. **Versatility with Diverse Text Prompts**  
This part demonstrates the robustness of our method: different text prompts for the same object drive diverse human-object interactions with synchronized co-speech gestures. The gestures adapt to each scenario and to the emotional tone of the speech, while the full-body motion remains physically plausible and coherent with the object.

3. **General-Purpose Co-Speech Gesture Generation**  
This section presents results from our method in a non-interactive setting. The video shows full-body motions and gestures generated from diverse text prompts and speech inputs, highlighting our model's ability to handle general, everyday human activities without object interaction.

4. **Diversity and Generalization on BEATX Dataset**  
Here, we demonstrate our model's generalization and expressiveness across different speech inputs. We show three distinct co-speech gesture generations, each corresponding to a different speech segment from the BEATX dataset. In all cases, the model produces expressive and emotionally appropriate gestures, each adapted to the specific speech content and tone.

These examples collectively demonstrate InteracTalker's key strengths: its ability to unify complex, object-aware interactions with expressive co-speech gestures, and its robust generalization across diverse conditioning signals and scenarios.
