TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process

Zhiyuan Ren, Minchul Kim, Feng Liu, Xiaoming Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9462-9471

Abstract


Recently, diffusion models have emerged as a powerful new generative method for 3D point cloud generation. However, few works study the effect of the denoising architecture on 3D point clouds, instead resorting to the typical UNet model developed for 2D images. Inspired by the wide adoption of Transformers, we study the complementary roles of convolution (from UNet) and attention (from Transformers). We discover that their respective importance changes with the timestep in the diffusion process. At early timesteps, attention has an outsized influence because Transformers are found to generate the overall shape more quickly; at later timesteps, when fine details are added, convolution has a larger impact on the local surface quality of the generated point cloud. In light of this observation, we propose a time-varying two-stream denoising model that combines convolution layers and Transformer blocks. We generate an optimizable mask from each timestep to reweight global and local features, obtaining time-varying fused features. Experimentally, we demonstrate that our proposed method quantitatively outperforms other state-of-the-art methods in visual quality and diversity. Code is available at github.com/Zhiyuan-R/Tiger-Time-varying-Diffusion-Model-for-Point-Cloud-Generation.
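To make the fusion idea concrete, below is a minimal PyTorch sketch (not the authors' released code) of a timestep-conditioned two-stream block: a convolutional branch extracts local features, a Transformer branch extracts global features, and a mask predicted from the timestep embedding reweights the two before fusing. All module and parameter names (e.g., TimeVaryingFusion, mask_mlp, t_embed_dim) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class TimeVaryingFusion(nn.Module):
    """Sketch of a time-varying two-stream denoising block (hypothetical names)."""

    def __init__(self, dim: int, t_embed_dim: int = 128):
        super().__init__()
        # Local stream: 1D convolution over point features.
        self.conv_branch = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        # Global stream: self-attention over all points.
        self.attn_branch = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True)
        # Optimizable mask predicted from the timestep embedding,
        # squashed to [0, 1] so it acts as a soft per-channel gate.
        self.mask_mlp = nn.Sequential(
            nn.Linear(t_embed_dim, dim),
            nn.Sigmoid())

    def forward(self, x: torch.Tensor, t_embed: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) point features; t_embed: (B, t_embed_dim).
        local_feat = self.conv_branch(x.transpose(1, 2)).transpose(1, 2)
        global_feat = self.attn_branch(x)
        m = self.mask_mlp(t_embed).unsqueeze(1)  # (B, 1, C) gate
        # At early (noisy) timesteps the mask can favor the global/attention
        # stream; at later timesteps it can shift weight to the local/conv stream.
        return m * global_feat + (1.0 - m) * local_feat
```

Because the mask is a learned function of the timestep, the relative weighting of the two streams is optimized end-to-end rather than hand-scheduled, matching the paper's observation that attention matters more early and convolution later.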

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Ren_2024_CVPR,
    author    = {Ren, Zhiyuan and Kim, Minchul and Liu, Feng and Liu, Xiaoming},
    title     = {TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {9462-9471}
}