General Point Model Pretraining with Autoencoding and Autoregressive

Zhe Li, Zhangyang Gao, Cheng Tan, Bocheng Ren, Laurence T. Yang, Stan Z. Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 20954-20964

Abstract


The pre-training architectures of large language models encompass various types, including autoencoding models, autoregressive models, and encoder-decoder models. We posit that any modality can potentially benefit from a large language model, as long as it undergoes vector quantization to become discrete tokens. Inspired by the General Language Model, we propose a General Point Model (GPM) that seamlessly integrates autoencoding and autoregressive tasks in a point cloud transformer. This model is versatile, allowing fine-tuning for downstream point cloud representation tasks as well as unconditional and conditional generation tasks. GPM enhances masked prediction in autoencoding through various forms of mask padding tasks, leading to improved performance in point cloud understanding. Additionally, GPM demonstrates highly competitive results in unconditional point cloud generation tasks, even exhibiting the potential for conditional generation tasks by modifying the input's conditional information. Compared to models like Point-BERT, MaskPoint, and PointMAE, our GPM achieves superior performance in point cloud understanding tasks. Furthermore, the integration of autoregressive and autoencoding within the same transformer underscores its versatility across different downstream tasks.
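The core idea described in the abstract, training a single transformer over discrete (vector-quantized) point cloud tokens with both a masked-prediction (autoencoding) objective and a next-token (autoregressive) objective, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the codebook size, model dimensions, masking ratio, and equal loss weighting below are assumptions made purely for illustration.

```python
# Hypothetical sketch (not the paper's code): one transformer over discrete point
# cloud tokens, trained jointly with a masked-prediction loss and a causal
# next-token loss, in the spirit of GLM-style joint pretraining.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 8192      # assumed size of the point-token codebook produced by VQ
MASK_ID = VOCAB   # extra id reserved for masked positions

class JointPointTransformer(nn.Module):
    def __init__(self, dim=256, depth=4, heads=8, seq_len=64):
        super().__init__()
        self.tok = nn.Embedding(VOCAB + 1, dim)          # +1 for the [MASK] id
        self.pos = nn.Embedding(seq_len, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens, causal=False):
        b, n = tokens.shape
        x = self.tok(tokens) + self.pos(torch.arange(n, device=tokens.device))
        attn_mask = None
        if causal:  # the autoregressive branch uses a causal attention mask
            attn_mask = torch.triu(
                torch.ones(n, n, device=tokens.device, dtype=torch.bool), 1)
        return self.head(self.encoder(x, mask=attn_mask))

def joint_loss(model, tokens, mask_ratio=0.4):
    # Autoencoding branch: recover the original ids at randomly masked positions.
    masked = tokens.clone()
    m = torch.rand_like(tokens, dtype=torch.float) < mask_ratio
    masked[m] = MASK_ID
    ae_logits = model(masked, causal=False)
    ae_loss = F.cross_entropy(ae_logits[m], tokens[m])

    # Autoregressive branch: predict each token from its causally masked prefix.
    ar_logits = model(tokens, causal=True)
    ar_loss = F.cross_entropy(ar_logits[:, :-1].reshape(-1, VOCAB),
                              tokens[:, 1:].reshape(-1))
    return ae_loss + ar_loss  # equal weighting is an arbitrary choice here

# Toy usage: a batch of quantized point-token sequences.
tokens = torch.randint(0, VOCAB, (2, 64))
loss = joint_loss(JointPointTransformer(), tokens)
loss.backward()
```

In this reading, sharing one set of transformer weights between the two objectives is what lets the same pretrained backbone be fine-tuned for understanding tasks (via the autoencoding head) or used for unconditional and conditional generation (via the autoregressive head); the paper's actual masking schemes and conditioning mechanism are more elaborate than this toy example.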

Related Material


[bibtex]
@InProceedings{Li_2024_CVPR,
    author    = {Li, Zhe and Gao, Zhangyang and Tan, Cheng and Ren, Bocheng and Yang, Laurence T. and Li, Stan Z.},
    title     = {General Point Model Pretraining with Autoencoding and Autoregressive},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {20954-20964}
}