- [pdf] [supp]
Keypoint-Graph-Driven Learning Framework for Object Pose Estimation
Many recent 6D pose estimation methods exploited object 3D models to generate synthetic images for training because labels come for free. However, due to the domain shift of data distributions between real images and synthetic images, the network trained only on synthetic images fails to capture robust features in real images for 6D pose estimation. We propose to solve this problem by making the network insensitive to different domains, rather than taking the more difficult route of forcing synthetic images to be similar to real images. Inspired by domain adaption methods, a Domain Adaptive Keypoints Detection Network (DAKDN) including a domain adaption layer is used to minimize the discrepancy of deep features between synthetic and real images. A unique challenge here is the lack of ground truth labels (i.e., keypoints) for real images. Fortunately, the geometry relations between keypoints are invariant under real/synthetic domains. Hence, we propose to use the domain-invariant geometry structure among keypoints as a "bridge" constraint to optimize DAKDN for 6D pose estimation across domains. Specifically, DAKDN employs a Graph Convolutional Network (GCN) block to learn the geometry structure from synthetic images and uses the GCN to guide the training for real images. The 6D poses of objects are calculated using Perspective-n-Point (PnP) algorithm based on the predicted keypoints. Experiments show that our method outperforms state-of-the-art approaches without manual poses labels and competes with approaches using manual poses labels.