Improving Viewpoint Consistency in 3D Generation via Structure Feature and CLIP Guidance

Zhang, Qing; Tong, Jinguang; Zhang, Jing; Hong, Jie; Li, Xuesong

Qing Zhang, Jinguang Tong, Jing Zhang, Jie Hong, Xuesong Li; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 6499-6508

Abstract

Despite recent advances in text-to-3D generation, current methods often suffer from geometric inconsistencies, commonly called the Janus problem. This work examines a potential cause of the Janus problem: viewpoint bias (i.e., a long-tailed distribution over viewpoints) in the images generated by the diffusion model. We find that the diffusion model tends to generate images from frontal viewpoints. To address this issue, we propose a training-free approach called the Structure Feature and CLIP Guidance (SFCG) mechanism, which guides the generation process in two ways. First, SFCG improves the consistency between the generated image and the desired viewpoint by combining a self-guidance strategy with the rich structural representations in the diffusion model. Second, it uses CLIP-based view-text similarities to filter out generated images that do not match their desired viewpoints, mitigating incorrect optimization of the 3D representation. Extensive experiments demonstrate that our method effectively addresses the Janus problem without compromising generation speed.
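
The CLIP-based filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the exact criterion, so this assumes an image is kept only when its most similar view prompt is the desired one. The embeddings here are toy vectors standing in for CLIP image and text features, and the names `VIEWS`, `predicted_view`, and `keep_image` are hypothetical.

```python
import math

# Hypothetical view labels; a real pipeline would embed prompts such as
# "front view of ..." with a CLIP text encoder (assumption, not from the paper).
VIEWS = ("front", "side", "back")


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predicted_view(image_emb, view_text_embs):
    """Return the view whose text embedding is most similar to the image."""
    sims = {view: cosine(image_emb, emb) for view, emb in view_text_embs.items()}
    return max(sims, key=sims.get), sims


def keep_image(image_emb, view_text_embs, desired_view):
    """Keep the image only if its best-matching view is the desired one."""
    best, _ = predicted_view(image_emb, view_text_embs)
    return best == desired_view


# Toy stand-ins for CLIP text embeddings of the three view prompts.
text_embs = {"front": [1.0, 0.0, 0.0], "side": [0.0, 1.0, 0.0], "back": [0.0, 0.0, 1.0]}

# An image that was requested as a back view but looks frontal is filtered out.
frontal_looking = [0.9, 0.1, 0.2]
print(keep_image(frontal_looking, text_embs, "back"))   # False
print(keep_image(frontal_looking, text_embs, "front"))  # True
```

In this sketch a rejected image would simply be excluded from the update of the 3D representation for that step; whether the actual method uses a hard match, a similarity threshold, or a weighting scheme is not stated in the abstract.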

Related Material

[bibtex]

@InProceedings{Zhang_2025_ICCV,
    author    = {Zhang, Qing and Tong, Jinguang and Zhang, Jing and Hong, Jie and Li, Xuesong},
    title     = {Improving Viewpoint Consistency in 3D Generation via Structure Feature and CLIP Guidance},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {6499-6508}
}