3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction

Leslie Ching Ow Tiong, Dick Sigmund, Andrew Beng Jin Teoh; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 1438-1454


Recently, the transformer model has been successfully employed for the multi-view 3D reconstruction problem. However, challenges remain in designing an attention mechanism that explores multi-view features and exploits their relations to reinforce the encoding-decoding modules. This paper proposes a new model, the 3D coarse-to-fine transformer (3D-C2FT), which introduces a novel coarse-to-fine (C2F) attention mechanism for encoding multi-view features and rectifying defective voxel-based 3D objects. The C2F attention mechanism enables the model to learn multi-view information flow and synthesize 3D surface correction in a coarse-to-fine manner. The proposed model is evaluated on the ShapeNet and Multi-view Real-life voxel-based datasets. Experimental results show that 3D-C2FT outperforms several competing models on these datasets.

Related Material

@InProceedings{Tiong_2022_ACCV,
    author    = {Tiong, Leslie Ching Ow and Sigmund, Dick and Teoh, Andrew Beng Jin},
    title     = {3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {1438-1454}
}