# PaperID-2253 MoRAG - Multi-Fusion Retrieval Augmented Generation for Human Motion (Supplementary) 

This zip file contains:

1. **Supplementary.pdf**: Provides a detailed overview of the dataset, implementation details, prompt strategy, qualitative analysis, and metrics.

2. **Videos**: Qualitative analysis.
	
    - **MoRAG**: This folder contains results for our multi-part fusion strategy. For each `<name>.mp4`, the name corresponds to the activity taking place in the video.
        -  **Diversity**: Diverse motion samples constructed using MoRAG.

        -  **Generalizability**: Model adaptability to fine-grained changes in the text description.

        -  **Zero-shot**: Comparison on unseen text descriptions with TMR++ [1].

    - **MoRAG-Diffuse**: This folder contains generated motions from MoRAG-Diffuse. For each `<name>.mp4`, the name corresponds to the activity taking place in the video.
        -  **Diversity**: Diverse motion samples generated using MoRAG-Diffuse.

        -  **Generalizability**: Model adaptability to fine-grained changes in the text description.

        -  **Zero-shot**: Comparison on unseen text descriptions with ReMoDiffuse [2].

    - **LLM-Importance**: This folder contains the video version of Figure 5: LLM Importance. For each `<name>.mp4`, the name corresponds to the type of change made to the wording of the text description.

        - **Spell-Error.mp4**: Comparison with TMR [3] when the text description contains a spelling error.

        - **Rephrasing.mp4**: Comparison with TMR [3] when the voice of the text description is changed from active to passive.

        - **Substitution.mp4**: Comparison with TMR [3] when a word in the text description is replaced with its synonym.

    - **Spatial-Composition**: This folder contains results for our spatial composition procedure. For each `<name>.mp4`, the name corresponds to the activity taking place in the video.


    - **Issue-with-left-and-right-Retrieval-Strategy.mp4**: Contains results from asynchronous composition, where the left and right body parts are used as distinct modalities in multi-part retrieval. For each `<name>.mp4`, the name corresponds to the activity taking place in the video.

    - **MoRAG-Teaser.mp4**: Video version of Figure 1: Teaser.

    - **Issue-with-ReMoDiffuse-Retrieval-Strategy.mp4**: Video version of Figure 2.

    - **MoRAG-Overview.mp4**: Video version of Figure 3: Overview.

    - **MoRAG-Training.mp4**: Video version of Figure 4: Training.

    - **Position-Significance.mp4**: Shows the significance of prompting LLMs for positional information.

3. **code**: This folder contains the Python implementation of MoRAG.

    - **utils.py**: Contains utility functions used in MoRAG.

    - **prompt.py**: Contains the prompt function and the OpenAI call that generates part-specific descriptions for a given text.

    - **morag.py**: Constructs part-wise fused motion sequences for a given text using the generated descriptions from `prompt.py`.

    - **morag-diffuse.py**: Conditions the diffusion model on the composed motion sequences constructed by `morag.py`. Only the RetrievalDatabase code is provided here, not the complete motion generation code.
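
To give a rough idea of what "part-wise fused motion sequences" means, here is a minimal sketch of composing motions retrieved separately per body part into one sequence. It is not the code in `morag.py`: the part names, the joint indices in `PART_JOINTS`, the function `compose_parts`, the one-value-per-joint frame layout, and the choice to truncate to the shortest retrieved motion are all assumptions made for illustration.

```python
from typing import Dict, List

Frame = List[float]   # assumption: one scalar per joint, to keep the sketch small
Motion = List[Frame]  # a motion is a list of frames

# Hypothetical mapping from body part to joint indices (illustrative only).
PART_JOINTS: Dict[str, List[int]] = {
    "torso": [0, 1],
    "hands": [2, 3],
    "legs": [4, 5],
}


def compose_parts(part_motions: Dict[str, Motion]) -> Motion:
    """Fuse per-part retrieved motions into a single full-body motion.

    For each body part, the joints listed in PART_JOINTS are copied from the
    motion retrieved for that part. All motions are truncated to the length
    of the shortest one (an assumption of this sketch).
    """
    n_frames = min(len(motion) for motion in part_motions.values())
    n_joints = sum(len(joints) for joints in PART_JOINTS.values())
    fused = [[0.0] * n_joints for _ in range(n_frames)]
    for part, joints in PART_JOINTS.items():
        motion = part_motions[part]
        for t in range(n_frames):
            for j in joints:
                fused[t][j] = motion[t][j]
    return fused


if __name__ == "__main__":
    # Toy full-body motions standing in for the retrieval result of each part.
    retrieved = {
        "torso": [[1.0] * 6 for _ in range(3)],
        "hands": [[2.0] * 6 for _ in range(2)],
        "legs": [[3.0] * 6 for _ in range(4)],
    }
    for frame in compose_parts(retrieved):
        print(frame)
```

In this sketch each retrieved motion is a full-body sequence and only the joints of the queried part are kept, so the fused sequence can mix, for example, the torso of one database sample with the legs of another.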

4. References: 

    1. Léore Bensabath, Mathis Petrovich, and Gül Varol. A cross-dataset study for text-based 3D human motion retrieval, 2024.
    2. Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, and Ziwei Liu. ReMoDiffuse: Retrieval-augmented motion diffusion model. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 364–373, October 2023.
    3. Mathis Petrovich, Michael J. Black, and Gül Varol. TMR: Text-to-motion retrieval using contrastive 3D human motion synthesis. In International Conference on Computer Vision (ICCV), 2023.



