A Generic Autoregressive Predictive Feedback Framework for Skeleton-Based Action Recognition

Yin, Xinpeng; Hu, Jing; Cao, Wenming

Xinpeng Yin, Jing Hu, Wenming Cao; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 3465-3479

Abstract

Prior work in skeleton-based action recognition has struggled to overcome sequence-order constraints while modeling temporal dependencies globally. Most methods focus on improving the model's fitting ability across different temporal scales and overlook the non-stationary temporal characteristics inherent in motion sequences. In this paper, we explore adapting state-space modeling (SSM), which is typically suited to stationary sequences, to motion sequences. To reconcile the trend present in motion sequences with the stability requirement of SSM, we integrate SSM into a generalized Autoregressive Predictive Feedback (APF) framework. Our approach decomposes motion sequences into trend and stationary components. We introduce the Non-Independent Multi-channel Processing (NiMc-P) module to capture implicit relationships among the 3D coordinates, and we propose the Independent Multi-joint SSM (IMj-S) module to model temporal dependencies within the stationary components. Throughout this process, the state-space matrices drive the feedback mechanism. Experiments on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets demonstrate the efficiency and versatility of APF.
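
The idea of separating a motion sequence into a trend and a stationary part before applying a state-space model can be illustrated with a small sketch. This is not the authors' implementation: the moving-average split, the function names `split_trend_stationary` and `run_ssm`, the fixed diagonal state matrix, and the toy trajectory are all assumptions chosen only to make the concept concrete.

```python
import numpy as np

def split_trend_stationary(x, window=9):
    """Split a (T, C) sequence into a moving-average trend and a residual.

    Assumption: a centered moving average stands in for the trend
    extraction; the paper may use a different decomposition.
    `window` must be odd so the output keeps the input length.
    """
    pad = window // 2
    # Repeat the edge frames so the trend has the same length as x.
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    trend = np.stack(
        [np.convolve(padded[:, c], kernel, mode="valid") for c in range(x.shape[1])],
        axis=1,
    )
    return trend, x - trend

def run_ssm(u, A, B, C):
    """Discrete linear SSM: h_t = A h_{t-1} + B u_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    outputs = []
    for u_t in u:
        h = A @ h + B @ u_t
        outputs.append(C @ h)
    return np.stack(outputs)

# Toy 3D trajectory of one joint: linear drift plus oscillation.
T = 64
t = np.arange(T)
x = np.stack(
    [0.05 * t + np.sin(0.5 * t), 0.02 * t + np.cos(0.5 * t), 0.1 * np.sin(0.3 * t)],
    axis=1,
)

trend, residual = split_trend_stationary(x, window=9)

# Hypothetical fixed parameters; in a learned model these would be trained.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)                 # spectral radius < 1 keeps the state stable
B = 0.1 * rng.normal(size=(4, 3))
C = rng.normal(size=(3, 4))

y = run_ssm(residual, A, B, C)      # SSM applied to the stationary part only
```

In this sketch the drift stays in `trend`, so the SSM only sees the roughly zero-mean `residual`, which is the kind of input a stable state matrix is suited to.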

Related Material

@InProceedings{Yin_2024_ACCV,
    author    = {Yin, Xinpeng and Hu, Jing and Cao, Wenming},
    title     = {A Generic Autoregressive Predictive Feedback Framework for Skeleton-Based Action Recognition},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {3465-3479}
}