Event-Guided Fusion-Mamba for Context-Aware 3D Human Pose Estimation

Bo Lang, Mooi Choo Chuah; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 950-960

Abstract


3D human pose estimation (3D HPE) is an important computer vision task with many practical applications, and researchers have proposed a variety of deep learning-based methods for it. However, the majority of these methods lift a 2D pose sequence to 3D, which performs poorly in challenging scenarios and is often computationally expensive. Such methods typically rely on 2D joint coordinates, which provide little spatial context for resolving the inherent 2D-to-3D ambiguity. In addition, relying solely on information extracted from RGB frames may miss temporal information and structural context. Thus, in this paper, we propose a framework that incorporates the event stream as an additional input, since event features provide exactly such information. Moreover, instead of using 2D joint coordinates in the pose sequence, our framework uses intermediate visual representations produced by off-the-shelf 2D pose detectors to implicitly encode joint-centric spatial context. Our new framework is a novel state space model (SSM)-based solution called Event-Guided Context-Aware MambaPose (CA-MambaPose). In the CA-MambaPose framework, we design a novel cross-modality fusion Mamba module to skillfully fuse the RGB and event features. CA-MambaPose also has lower computational cost thanks to the efficiency of Mamba blocks. We conduct extensive experiments evaluating CA-MambaPose on two existing datasets, and the results show that CA-MambaPose outperforms state-of-the-art (SOTA) methods.
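The abstract does not specify the internals of the cross-modality fusion Mamba module. As a loose, hypothetical illustration of the general idea it describes (event features guiding the fusion of the two streams, followed by a selective-scan-style temporal mixer), the sketch below uses a sigmoid gate driven by event features and a diagonal linear state-space scan as a stand-in for a Mamba block. All function names, the gating scheme, and the parameterization are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Diagonal linear state-space scan over time (a toy stand-in
    for a Mamba/selective-scan block).
    x: (T, D) feature sequence; a, b, c: (D,) per-channel parameters.
    Recurrence: h_t = a * h_{t-1} + b * x_t ;  output y_t = c * h_t.
    """
    T, D = x.shape
    h = np.zeros(D)
    y = np.empty_like(x)
    for t in range(T):
        h = a * h + b * x[t]
        y[t] = c * h
    return y

def fuse_rgb_event(rgb_feat, event_feat, a, b, c):
    """Hypothetical gated cross-modal fusion: event features produce a
    per-channel sigmoid gate that blends the RGB and event streams,
    then the fused sequence is mixed over time by the SSM scan."""
    gate = 1.0 / (1.0 + np.exp(-event_feat))          # sigmoid gate from events
    fused = rgb_feat * gate + event_feat * (1.0 - gate)
    return ssm_scan(fused, a, b, c)
```

With a decay parameter |a| < 1, the scan accumulates temporal context with exponentially fading memory, which is one intuition for why SSM blocks stay linear-time in sequence length while transformers pay quadratic attention cost.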

Related Material


[bibtex]
@InProceedings{Lang_2025_WACV,
  author    = {Lang, Bo and Chuah, Mooi Choo},
  title     = {Event-Guided Fusion-Mamba for Context-Aware 3D Human Pose Estimation},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {950-960}
}