@InProceedings{Zhang_2024_ACCV,
  author    = {Zhang, Haosong and Leong, Mei Chee and Li, Liyuan and Lin, Weisi},
  title     = {RD-Diff: RLTransformer-based Diffusion Model with Diversity-Inducing Modulator for Human Motion Prediction},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
  month     = {December},
  year      = {2024},
  pages     = {3531-3551}
}
RD-Diff: RLTransformer-based Diffusion Model with Diversity-Inducing Modulator for Human Motion Prediction
Abstract
Human Motion Prediction (HMP) is crucial for applications such as human-robot collaboration, surveillance, and autonomous driving. Recently, diffusion models have shown promising progress due to their ease of training and realistic generation capabilities. To enhance both the accuracy and diversity of diffusion models in HMP, we present RD-Diff: an RLTransformer-based Diffusion model with a Diversity-inducing modulator. First, to improve the transformer's effectiveness on the frequency representation of human motion obtained via the Discrete Cosine Transform (DCT), we introduce a novel Regulated Linear Transformer (RLTransformer) with a specially designed linear-attention mechanism. Next, to further enhance performance, we propose a Diversity-Inducing Modulator (DIM) that generates noise-modulated observation conditions for a pretrained diffusion model. Experimental results show that RD-Diff establishes new state-of-the-art performance in both accuracy and diversity compared to existing methods.
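The abstract names linear attention as the core of the RLTransformer. As a rough illustration only (the paper's "regulated" variant is not specified here, and the feature map below is an assumption), a generic linear-attention step replaces the O(T²) softmax attention with an O(T) kernelized form:

```python
import numpy as np

def phi(x):
    # ELU(x) + 1 feature map: a common positivity-preserving choice
    # in linear-attention literature (assumed, not from the paper).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    # softmax(QK^T)V is approximated by phi(Q) (phi(K)^T V),
    # so the T x T attention matrix is never materialized.
    Qp, Kp = phi(Q), phi(K)                    # (T, d) each
    kv = Kp.T @ V                              # (d, d_v) key-value summary
    z = Qp @ Kp.sum(axis=0, keepdims=True).T  # (T, 1) normalizer
    return (Qp @ kv) / (z + eps)

# Hypothetical shapes: T DCT coefficients of a motion sequence,
# each projected to a d-dimensional token.
T, d = 16, 8
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, T, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (16, 8)
```

Because the normalizer makes each output row a convex combination of value rows, the result stays bounded by the values; the linear cost in sequence length is what makes such attention attractive for long frequency-domain token sequences.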