GraspGen Explained: A Friendly Guide to 6-DOF Robot Grasping for Everyone

How a new open-source framework lets robots pick up almost anything—without weeks of re-engineering.
1. Why Better Grasping Still Matters
Pick-and-place sounds simple, yet warehouse robots still drop mugs, kitchen assistants miss forks, and lunar rovers struggle with oddly shaped rocks. Three stubborn problems keep coming back:
- Different grippers → one change of hardware and yesterday’s code is useless.
- Cluttered scenes → toys on a rug, tools in a drawer; the camera never sees the whole object.
- Unknown objects → you can’t label every future item the robot will meet.
GraspGen, released by NVIDIA in July 2025, was built to tackle all three at once. The project ships with:
- ready-to-use models for three common grippers (Franka Panda two-finger, Robotiq 2F-140, and a 30 mm suction cup);
- a 53-million-grasp dataset covering 8,515 objects;
- a new training trick called On-Generator Training that teaches the robot to ignore its own mistakes.
2. What “6-DOF Grasping” Really Means
Imagine giving a friend directions to pick up a coffee cup: first where to put their hand (left/right, forward/back, up/down, i.e. x, y, z) and then how to tilt and turn it (roll, pitch, yaw).
Those six numbers together are called a 6-DOF pose (DOF = degrees of freedom). GraspGen’s job is to predict many such poses for any object point cloud the robot sees.
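If you prefer code to prose, here is a tiny stand-alone example (plain NumPy/SciPy, not GraspGen code) that packs those six numbers into the 4×4 homogeneous transform most robotics stacks expect:

```python
import numpy as np
from scipy.spatial.transform import Rotation

# The six degrees of freedom: where the gripper goes (x, y, z, in metres)
# and how it is oriented (roll, pitch, yaw, in radians). Values are made up.
x, y, z = 0.42, -0.10, 0.25
roll, pitch, yaw = 0.0, np.pi / 2, 0.3

# Pack them into a 4x4 homogeneous transform.
grasp_pose = np.eye(4)
grasp_pose[:3, :3] = Rotation.from_euler("xyz", [roll, pitch, yaw]).as_matrix()
grasp_pose[:3, 3] = [x, y, z]
print(grasp_pose)
```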
3. How GraspGen Works—In Plain English
3.1 Start With a Diffusion Model
You may know diffusion models from image generation apps. Instead of turning noise into a picture, GraspGen turns noise into grasp poses:
- Training: take successful grasps → add noise → teach a neural net to remove that noise.
- Inference: feed the network a new object point cloud → start with pure noise → let the network clean it into valid poses.
Because a grasp pose has only six numbers (x, y, z, roll, pitch, yaw), the process is fast: 10 denoising steps are enough, compared with 50-100 steps for images.
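To make the idea concrete, here is a minimal sketch of that denoising loop. It is schematic only: `denoiser` is a hypothetical network (not the real GraspGen model), and the update rule is deliberately simplified compared with a true diffusion sampler.

```python
import torch

def sample_grasps(denoiser, object_points, num_grasps=100, num_steps=10):
    """Iteratively denoise random 6-DOF poses into grasp candidates.

    `denoiser` is a placeholder callable that predicts the noise to remove,
    conditioned on the object point cloud and the current step.
    """
    # Start from pure noise: one 6-vector (x, y, z, roll, pitch, yaw) per grasp.
    poses = torch.randn(num_grasps, 6)
    for t in reversed(range(num_steps)):
        # The network predicts the noise present at step t; subtract part of it.
        predicted_noise = denoiser(poses, object_points, t)
        poses = poses - predicted_noise / num_steps
    return poses  # candidate grasp poses, later scored by the discriminator
```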
3.2 Handle Different Grippers Without Rewriting Code
The framework keeps the object encoder (a PointTransformerV3 backbone) fixed and swaps only a small gripper-specific head. That means you can:
- re-use the same weights for the Franka gripper and the suction cup;
- add a new gripper by training only the lightweight head (a minimal sketch of this split follows).
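Here is what that encoder/head split looks like in outline. The class and attribute names are illustrative only, not the actual GraspGen modules:

```python
import torch.nn as nn

class GraspModel(nn.Module):
    """Shared object encoder + a small, swappable gripper-specific head."""

    def __init__(self, encoder: nn.Module, gripper_head: nn.Module):
        super().__init__()
        self.encoder = encoder            # e.g. a PointTransformerV3 backbone, kept fixed
        self.gripper_head = gripper_head  # the only part retrained for a new gripper

    def forward(self, object_points):
        features = self.encoder(object_points)
        return self.gripper_head(features)

# Swapping grippers means swapping only the lightweight head, e.g.:
# franka_model  = GraspModel(shared_encoder, franka_head)
# suction_model = GraspModel(shared_encoder, suction_head)
```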
4. The On-Generator Training Trick
Once the diffusion model is trained, it still occasionally invents impossible grasps (floating in mid-air or colliding with the object). Classic work trains a separate discriminator to score these poses, but that discriminator sees only offline success/failure labels.
GraspGen does something smarter:
1. Run the diffusion model on 7,000 training objects → create about 14 million fresh candidate grasps.
2. Re-simulate every pose in Isaac Sim → obtain new success/failure labels.
3. Retrain the discriminator on this model-generated data.
Because the discriminator now sees the exact mistakes the diffusion model makes, it filters them far better at test time (the benchmarks in Section 8 quantify this).
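In pseudocode, the whole recipe is only a few lines. Everything here is a placeholder you would supply yourself (`simulate_grasp` standing in for an Isaac Sim rollout, `retrain_discriminator` for the actual training call); it is a sketch of the idea, not the real GraspGen training script:

```python
def on_generator_training(diffusion_model, discriminator, train_objects,
                          simulate_grasp, retrain_discriminator):
    """Label the generator's own outputs in simulation, then retrain the scorer."""
    generated = []
    for obj in train_objects:                       # ~7,000 training objects
        poses = diffusion_model.sample(obj)         # fresh candidate grasps
        for pose in poses:
            success = simulate_grasp(obj, pose)     # re-simulated success/failure label
            generated.append((obj, pose, success))
    # The discriminator now trains on the exact mistakes the generator makes.
    retrain_discriminator(discriminator, generated)
```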
5. The Released Assets—What You Get Today
Total download ≈ 200 GB. One command fetches everything:
git clone https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-GraspGen
6. Quick Start—Run Your First Inference in 5 Minutes
6.1 Install (Docker, Recommended)
git clone https://github.com/NVlabs/GraspGen.git && cd GraspGen
bash docker/build.sh # builds an image with all deps
6.2 Download the trained weights
git clone git@hf.co:adithyamurali/GraspGenModels # ~1 GB
6.3 Visualize on a sample scene
# Terminal 1: start a 3-D viewer
meshcat-server
# Terminal 2: run the demo inside docker
bash docker/run.sh <local_graspgen_path> --models <path_to_models>
cd /code && python scripts/demo_scene_pc.py \
--sample_data_dir /models/sample_data/real_scene_pc \
--gripper_config /models/checkpoints/graspgen_robotiq_2f_140.yml
Open http://localhost:7000 in your browser—you will see green arrows (grasps) on top of a real tabletop scene.
7. Training Your Own Model—A Step-by-Step Recipe
7.1 When Do You Need to Train?
- Your gripper geometry is not Franka, Robotiq, or the 30 mm suction cup.
- You want to specialize in a narrow domain (e.g., only metal tools).
7.2 What You Need
7.3 Cache the Dataset (One-Time)
bash docker/run.sh <code> --grasp_dataset <ds> --object_dataset <obj> --results <logs>
cd /code && python train_graspgen.py \
task=robotiq_2f_140_gen \
hydra.run.dir=/results/exp_01
The script first builds a compressed HDF5 cache (fast I/O) and then starts training automatically.
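If you want to peek inside that cache, a minimal h5py sketch looks like the following; the file path and dataset keys are illustrative guesses, so check the actual layout on disk first:

```python
import h5py

# Open the compressed HDF5 cache produced before training started.
# The path and key names ("grasps", "success") are placeholders, not a fixed schema.
with h5py.File("/results/exp_01/cache.h5", "r") as f:
    print(list(f.keys()))          # inspect what the cache actually contains
    grasps = f["grasps"][:100]     # first 100 cached grasp poses
    labels = f["success"][:100]    # matching success/failure labels
    print(grasps.shape, labels.mean())
```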
7.4 Typical Training Times
8. Benchmarks—Numbers You Can Trust
8.1 Object-Centric Test (Franka, ACRONYM split)
8.2 Cluttered-Scene Test (FetchBench)
9. Real-World Deployment Tips
9.1 Camera Setup
- One RealSense D435 mounted 0.6 m above the table is enough.
- Calibrate intrinsics and extrinsics once; store them in camera_params.yaml (see the sketch below).
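One way to write that file out, sketched with PyYAML; the field names and the example numbers are assumptions, not a schema GraspGen defines:

```python
import yaml

# Intrinsics come from the RealSense driver, extrinsics from your hand-eye calibration.
# All values below are illustrative.
camera_params = {
    "intrinsics": {"fx": 615.0, "fy": 615.0, "cx": 320.0, "cy": 240.0},
    "extrinsics": {
        "translation": [0.0, 0.0, 0.6],      # camera 0.6 m above the table
        "rotation_rpy": [3.1416, 0.0, 0.0],  # looking straight down
    },
}

with open("camera_params.yaml", "w") as f:
    yaml.safe_dump(camera_params, f)
```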
9.2 Software Pipeline
RGB-D stream
↓ SAM2 instance segmentation
↓ GraspGen inference (top-100 grasps)
↓ cuRobo motion planning
↓ Robot execution
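Wired together in code, the loop might look like the sketch below. Every callable is a placeholder for the corresponding component (SAM2, GraspGen, cuRobo); none of these are their real APIs:

```python
def pick_once(camera, robot, segment, lift_to_points, infer_grasps, plan):
    """One pass through the pipeline. All callables are placeholders you supply."""
    rgb, depth = camera.read()                # RGB-D stream
    mask = segment(rgb)[0]                    # e.g. SAM2 instance segmentation
    points = lift_to_points(depth, mask)      # masked depth -> object point cloud
    grasps = infer_grasps(points, top_k=100)  # GraspGen inference (top-100 grasps)
    trajectory = plan(robot, grasps[0])       # e.g. cuRobo motion planning
    robot.execute(trajectory)                 # robot execution
```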
9.3 Common Failure Modes
10. Frequently Asked Questions
Q1. My gripper looks like Franka but has 5 mm more stroke. Do I retrain?
A: Probably not. Apply a fixed z-offset of −5 mm after inference. Measure 20 test grasps; if success ≥ 85 %, you’re good.
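A sketch of that correction, assuming your grasps are 4×4 homogeneous transforms and the gripper approaches along its local z axis (verify both assumptions for your setup):

```python
import numpy as np

def apply_stroke_offset(grasp_pose, offset_m=-0.005):
    """Shift a grasp 5 mm along the gripper's own approach (local z) axis."""
    correction = np.eye(4)
    correction[2, 3] = offset_m  # -5 mm along local z
    # Right-multiplying applies the offset in the gripper frame, not the world frame.
    return grasp_pose @ correction
```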
Q2. Can I run this on an edge GPU?
A: Yes. The released TensorRT engine runs at 20 Hz on a Jetson AGX Orin 64 GB (batch size 1, 10 denoising steps).
Q3. How do I add a new object category?
A:
1. Place meshes in object_dataset/new_category/.
2. Run python scripts/generate_grasps.py --category new_category.
3. Append the new labels to grasp_dataset/train.jsonl.
4. Resume training from a checkpoint with train.checkpoint=/path/to/latest.ckpt.
Q4. The training script dies with “Killed” and no traceback.
A: Increase the Docker memory limit (--memory=32g) or set NUM_REDUNDANT_DATAPOINTS=3.
Q5. Why do suction grasps have large rotation error?
A: Suction is rotationally symmetric; the metric is ill-defined. Focus on translation error.
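A translation-only check is easy to compute yourself (plain NumPy, assuming 4×4 grasp transforms):

```python
import numpy as np

def translation_error(pred_pose, ref_pose):
    """Euclidean distance between two grasp positions, ignoring rotation."""
    return float(np.linalg.norm(pred_pose[:3, 3] - ref_pose[:3, 3]))
```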
11. Citation & License
If you use GraspGen in your research or product, please cite:
@article{murali2025graspgen,
  title={GraspGen: A Diffusion-based Framework for 6-DOF Grasping with On-Generator Training},
  author={Murali, Adithyavairavan and Sundaralingam, Balakumar and others},
  journal={arXiv preprint arXiv:2507.13097},
  year={2025}
}
Dataset license: CC-BY 4.0.
Code license: NVIDIA Source Code License (see repo).
12. Where to Go Next
- Project page: https://graspgen.github.io
- Video walkthrough: https://youtu.be/gM5fgK2aZ1Y
- Issue tracker: https://github.com/NVlabs/GraspGen/issues
Happy grasping!