DeFormer: Integrating Transformers with Deformable Models for 3D Shape Abstraction from a Single Image

Di Liu, Xiang Yu, Meng Ye, Qilong Zhangli, Zhuowei Li, Zhixing Zhang, Dimitris N. Metaxas; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 14236-14246

Abstract


Explicit 3D shape abstraction from a single 2D image is a long-standing problem in computer vision and graphics. By leveraging a set of primitives to represent the target shape, recent methods have achieved promising results. However, these methods either use a relatively larger number of primitives or lack geometric flexibility due to the low-dimensional expressibility of the primitives. In this paper, we propose a novel bi-channel Transformer architecture, integrated with parameterized deformable models, termed DeFormer, to simultaneously estimate global and local deformations. In this way, DeFormer can abstract complex object shapes while using a small number of primitives which offer a broader geometry coverage and finer details. Then, we introduce a force-driven dynamic fitting and a cycle-consistent re-projection loss to optimize the primitive parameters. Extensive experiments on ShapeNet across various settings show that DeFormer achieves better reconstruction accuracy over the state-of-the-art, and visualizes with consistent semantic correspondences for improved interpretability.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Liu_2023_ICCV, author = {Liu, Di and Yu, Xiang and Ye, Meng and Zhangli, Qilong and Li, Zhuowei and Zhang, Zhixing and Metaxas, Dimitris N.}, title = {DeFormer: Integrating Transformers with Deformable Models for 3D Shape Abstraction from a Single Image}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {14236-14246} }