Progressive Hypothesis Transformer for 3D Human Mesh Recovery
Recent advancements in Transformer-based human mesh reconstruction (HMR) are commendable. However, these models often lift 2D images directly to 3D vertices without explicit intermediate guidance. In addition, the global attention mechanism tends to spread attention across larger body areas and even unrelated background regions during human mesh estimation, rather than focusing on critical local regions such as human body joints. This tendency leads to inaccurate and unrealistic results for complex activities. To address these challenges, we introduce the Progressive Hypotheses Transformer, which employs 2D and 3D pose predictions to progressively guide our model. Moreover, we propose a mechanism that generates multiple plausible hypotheses for both 2D and 3D poses to mitigate potential inaccuracies arising from intermediate pose estimations. Our model also incorporates inter-intra attention to capture correlations between joints and hypotheses. Experimental results demonstrate that our method surpasses existing image-based approaches on Human3.6M and 3DPW with fewer parameters and relatively lower computational costs.