GraspGen-X: Cross-Embodiment 6-DOF Diffusion-based Grasping

Han, Beining; Chao, Yu-Wei; Coumans, Erwin; Eppner, Clemens; Deng, Jia; Birchfield, Stan; Murali, Adithyavairavan

Beining Han, Yu-Wei Chao, Erwin Coumans, Clemens Eppner, Jia Deng, Stan Birchfield, Adithyavairavan Murali; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 20878-20889

Abstract

We study cross-embodiment 6-DOF robot grasping. Unlike prior work, we require the model to generalize not only to novel objects and scenes but also to novel gripper morphologies and physical grasping processes. Our method extends diffusion-based generative 6-DOF grasping models to condition additionally on a representation of the gripper. We propose a swept-volume heuristic for encoding the gripper. We train our cross-embodiment model with procedural grippers and a large-scale dataset of 395 million grasps. In simulation experiments, our model outperforms baseline methods in zero-shot generalization to novel real-world grippers and objects. Our model also serves as a good initialization for fine-tuning to novel grippers. In ablations, we demonstrate the efficiency of our swept-volume gripper representation and our procedural gripper training dataset. Finally, we show zero-shot generalization to novel real-world grippers for 6-DOF grasping, surpassing baselines in cross-embodiment generalization.

Related Material

[bibtex]

@InProceedings{Han_2026_CVPR,
    author    = {Han, Beining and Chao, Yu-Wei and Coumans, Erwin and Eppner, Clemens and Deng, Jia and Birchfield, Stan and Murali, Adithyavairavan},
    title     = {GraspGen-X: Cross-Embodiment 6-DOF Diffusion-based Grasping},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {20878-20889}
}