AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion

Huang, Yangyi; Yuan, Ye; Li, Xueting; Kautz, Jan; Iqbal, Umar

Yangyi Huang, Ye Yuan, Xueting Li, Jan Kautz, Umar Iqbal; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 13533-13543

Abstract

Existing methods for image-to-3D avatar generation struggle to produce highly detailed, animation-ready avatars suitable for real-world applications. We introduce AdaHuman, a novel framework that generates high-fidelity animatable 3D avatars from a single in-the-wild image. AdaHuman incorporates two key innovations: (1) A pose-conditioned 3D joint diffusion model that synthesizes consistent multi-view images in arbitrary poses alongside corresponding 3D Gaussian Splats (3DGS) reconstruction at each diffusion step; (2) A compositional 3DGS refinement module that enhances the details of local body parts through image-to-image refinement and seamlessly integrates them using a novel crop-aware camera ray map, producing a cohesive detailed 3D avatar. These components allow AdaHuman to generate highly realistic standardized A-pose avatars with minimal self-occlusion, enabling rigging and animation with any input motion. Extensive evaluation on public benchmarks and in-the-wild images demonstrates that AdaHuman significantly outperforms state-of-the-art methods in both avatar reconstruction and reposing. Code and models will be publicly available for research purposes.

Related Material

[bibtex]

@InProceedings{Huang_2025_ICCV,
    author    = {Huang, Yangyi and Yuan, Ye and Li, Xueting and Kautz, Jan and Iqbal, Umar},
    title     = {AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {13533-13543}
}