Towards Accurate 3D Human Motion Prediction From Incomplete Observations
Predicting accurate and realistic future human poses from an observed motion history is a fundamental task at the intersection of computer vision, graphics, and artificial intelligence. Sustained research effort has recently produced remarkable progress on this problem. However, existing methods assume complete observations: once the observed motion sequence is incomplete (contains missing values), they produce unreliable predictions or even deformed poses. Moreover, because of unavoidable factors such as occlusion and limited sensor precision, incomplete motion data are common, which hinders the practical application of current algorithms. In this work, we are the first to identify this challenging problem, i.e., how to generate high-fidelity human motion predictions from incomplete observations. To solve it, we propose a novel multi-task graph convolutional network (MT-GCN). The model has two branches: the primary branch forecasts future 3D human motion, while the auxiliary branch repairs the missing values in the incomplete observation. Both branches are integrated into a unified framework and share a spatio-temporal representation, so that each task improves the performance of the other. Extensive experiments on three large-scale datasets, covering various real-world missing-data scenarios, show that our approach consistently outperforms state-of-the-art methods that do not explicitly handle missing values in incomplete observations.
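The two-branch idea above can be illustrated with a minimal NumPy sketch. It is not the MT-GCN implementation and makes several assumptions: the layer form tanh(A X W), a node layout in which each graph node is one joint coordinate carrying its observed trajectory as features, zero-filling of missing entries, the loss weight `lam`, and every name (`MultiTaskGCNSketch`, `graph_conv`, `fill_missing`, `multitask_loss`) are hypothetical. Weights are random rather than trained; the point is only the structure: one shared encoder feeding a prediction head and a repair head, combined in a joint loss.

```python
import numpy as np

rng = np.random.default_rng(0)


def graph_conv(x, adj, weight):
    """One graph-convolution layer, tanh(A @ X @ W) (an assumed layer form)."""
    return np.tanh(adj @ x @ weight)


class MultiTaskGCNSketch:
    """Hypothetical two-branch model: a shared graph encoder feeds a
    prediction head (future frames) and a repair head (observed frames)."""

    def __init__(self, num_nodes, t_obs, t_pred, hidden=32):
        # Identity plus small noise as adjacency; random weights stand in for trained ones.
        self.adj = np.eye(num_nodes) + rng.normal(scale=0.05, size=(num_nodes, num_nodes))
        self.w_shared = rng.normal(scale=0.1, size=(t_obs, hidden))
        self.w_pred = rng.normal(scale=0.1, size=(hidden, t_pred))
        self.w_repair = rng.normal(scale=0.1, size=(hidden, t_obs))

    def forward(self, obs, mask):
        # obs: (num_nodes, t_obs); mask: 1 = observed, 0 = missing.
        x = obs * mask                                  # zero-fill missing entries
        h = graph_conv(x, self.adj, self.w_shared)      # shared spatio-temporal features
        future = self.adj @ h @ self.w_pred             # primary branch: (num_nodes, t_pred)
        repaired = self.adj @ h @ self.w_repair         # auxiliary branch: (num_nodes, t_obs)
        return future, repaired


def fill_missing(obs, repaired, mask):
    """Keep observed values; use the repair branch's output only where data is missing."""
    return mask * obs + (1.0 - mask) * repaired


def multitask_loss(future, future_gt, repaired, obs_gt, lam=0.5):
    """Prediction loss plus a weighted repair loss against the complete history,
    which is available in training when missing entries are simulated by masking."""
    pred_loss = np.mean((future - future_gt) ** 2)
    repair_loss = np.mean((repaired - obs_gt) ** 2)
    return pred_loss + lam * repair_loss


# Toy sizes (illustrative only): 66 nodes, 10 observed and 25 future frames.
num_nodes, t_obs, t_pred = 66, 10, 25
obs_gt = rng.normal(size=(num_nodes, t_obs))
future_gt = rng.normal(size=(num_nodes, t_pred))
mask = (rng.random((num_nodes, t_obs)) > 0.3).astype(float)   # drop roughly 30% of entries
obs_incomplete = obs_gt * mask

model = MultiTaskGCNSketch(num_nodes, t_obs, t_pred)
future, repaired = model.forward(obs_incomplete, mask)
filled = fill_missing(obs_incomplete, repaired, mask)
loss = multitask_loss(future, future_gt, repaired, obs_gt)
print(future.shape, repaired.shape, round(float(loss), 4))
```

In a trained version of such a sketch, sharing `adj` and `w_shared` between the two heads is what would let the repair loss shape the features the prediction head relies on; `fill_missing` passes observed values through unchanged and only substitutes repaired values at masked positions.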