6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation

Li Xu, Haoxuan Qu, Yujun Cai, Jun Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9676-9686

Abstract


Estimating the 6D object pose from a single RGB image often involves noise and indeterminacy due to challenges such as occlusions and cluttered backgrounds. Meanwhile, diffusion models have shown appealing performance in generating high-quality images from random noise with high indeterminacy through step-by-step denoising. Inspired by their denoising capability, we propose a novel diffusion-based framework (6D-Diff) to handle the noise and indeterminacy in object pose estimation for better performance. In our framework, to establish accurate 2D-3D correspondences, we formulate 2D keypoint detection as a reverse diffusion (denoising) process. To facilitate such a denoising process, we design a Mixture-of-Cauchy-based forward diffusion process and condition the reverse process on the object appearance features. Extensive experiments on the LM-O and YCB-V datasets demonstrate the effectiveness of our framework.
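To make the idea concrete, the following is a minimal sketch of a DDPM-style forward diffusion step that corrupts 2D keypoint coordinates with noise drawn from a mixture of Cauchy distributions, in the spirit of the Mixture-of-Cauchy forward process the abstract names. The mixture parameters, noise schedule, and function names here are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def cauchy_mixture_noise(shape, locs, scales, weights, rng):
    """Sample noise from a mixture of Cauchy distributions.
    Mixture parameters are illustrative placeholders."""
    # pick a mixture component independently per element
    comp = rng.choice(len(weights), size=shape, p=weights)
    locs = np.asarray(locs)[comp]
    scales = np.asarray(scales)[comp]
    # standard_cauchy gives Cauchy(0, 1); shift and scale per component
    return locs + scales * rng.standard_cauchy(shape)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Jump from clean keypoints x0 to the noisy sample at step t,
    using Cauchy-mixture noise instead of the usual Gaussian."""
    noise = cauchy_mixture_noise(
        x0.shape, locs=[0.0, 0.0], scales=[0.05, 0.5],
        weights=[0.7, 0.3], rng=rng,
    )
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
keypoints = rng.uniform(0.0, 1.0, size=(8, 2))          # 8 normalized 2D keypoints
alpha_bar = np.cumprod(np.linspace(0.999, 0.98, 100))   # toy noise schedule
noisy = forward_diffuse(keypoints, t=50, alpha_bar=alpha_bar, rng=rng)
```

The reverse (denoising) process would then train a network, conditioned on object appearance features, to recover `keypoints` from `noisy` step by step; the heavy-tailed Cauchy mixture lets the forward process model the outlier-prone detections caused by occlusion and clutter.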

Related Material


@InProceedings{Xu_2024_CVPR,
    author    = {Xu, Li and Qu, Haoxuan and Cai, Yujun and Liu, Jun},
    title     = {6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {9676-9686}
}