SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons

Yuanyou Xu, Zongxin Yang, Yi Yang; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 314-325

Abstract


Controllable generation has achieved substantial progress in both 2D and 3D domains, yet current conditional generation methods still face limitations in describing detailed shape structures. Skeletons can effectively represent and describe object anatomy and pose. Unfortunately, past studies are often limited to human skeletons. In this work, we generalize skeleton-conditioned generation to arbitrary structures. First, we design a reliable mesh skeletonization pipeline to generate a large-scale mesh-skeleton paired dataset. Based on this dataset, we build a multi-view and 3D generation pipeline. We propose to represent 3D skeletons as 2D conditional images using Coordinate Color Encoding. A Skeletal Correlation Module is designed to extract global skeletal features for condition injection. After multi-view images are generated, 3D assets can be obtained by incorporating a large reconstruction model, followed by a UV texture refinement stage. As a result, our method achieves instant generation of multi-view and 3D content aligned with the given skeletons. The proposed techniques substantially improve object-skeleton alignment and generation quality.
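The abstract does not spell out how Coordinate Color Encoding is computed. As a rough illustration of the general idea only (turning a 3D skeleton into a 2D conditional image whose colors carry 3D position), the sketch below normalizes joint coordinates to [0, 1], uses them directly as RGB values, and rasterizes each bone with an orthographic projection along the z axis. The function names `coordinate_colors` and `render_skeleton`, the per-axis normalization, and the projection are all assumptions made for this example and may differ from the paper's actual formulation.

```python
import numpy as np


def coordinate_colors(joints):
    """Map 3D joint positions to RGB values in [0, 1] (hypothetical encoding).

    Each axis is min-max normalized independently, so x -> R, y -> G, z -> B.
    """
    joints = np.asarray(joints, dtype=np.float64)
    lo = joints.min(axis=0)
    span = joints.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero when an axis is flat
    return (joints - lo) / span


def render_skeleton(joints, bones, size=64, samples=32):
    """Rasterize bones into a (size, size, 3) image, viewed along the z axis.

    Assumption: a simple orthographic projection; a real pipeline would
    render one such image per camera view.
    """
    colors = coordinate_colors(joints)
    image = np.zeros((size, size, 3), dtype=np.float64)
    # The normalized x, y values double as pixel positions in this view.
    pixels = colors[:, :2] * (size - 1)
    for a, b in bones:
        # Sample points along the bone and interpolate position and color.
        for t in np.linspace(0.0, 1.0, samples):
            p = (1 - t) * pixels[a] + t * pixels[b]
            c = (1 - t) * colors[a] + t * colors[b]
            col, row = int(round(p[0])), int(round(p[1]))
            image[size - 1 - row, col] = c  # flip so that +y points up
    return image


# Toy skeleton: three joints connected by two bones.
joints = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 2.0, 0.5]]
bones = [(0, 1), (1, 2)]
image = render_skeleton(joints, bones)
```

One limitation of this naive mapping is that a joint at the minimum corner of the bounding box is encoded as black and so blends into the background; a practical encoding would likely offset or rescale the color range to avoid that.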

Related Material


[bibtex]
@InProceedings{Xu_2025_CVPR,
    author    = {Xu, Yuanyou and Yang, Zongxin and Yang, Yi},
    title     = {SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {314-325}
}