GraspGen Explained: A Friendly Guide to 6-DOF Robot Grasping for Everyone

A Diffusion-based Framework for 6-DOF Grasping

How a new open-source framework lets robots pick up almost anything—without weeks of re-engineering.


1. Why Better Grasping Still Matters

Pick-and-place sounds simple, yet warehouse robots still drop mugs, kitchen assistants miss forks, and lunar rovers struggle with oddly shaped rocks. Three stubborn problems keep coming back:

  • Different grippers → swap the hardware and yesterday’s grasping code is useless.
  • Cluttered scenes → toys on a rug, tools in a drawer; the camera never sees the whole object.
  • Unknown objects → you can’t label every future item the robot will meet.

GraspGen, released by NVIDIA in July 2025, was built to tackle all three at once. The project ships with:

  • ready-to-use models for three common grippers (Franka Panda two-finger, Robotiq-2F-140, and a 30 mm suction cup);
  • a 53-million-grasp dataset covering 8,515 objects;
  • a new training trick called On-Generator Training that teaches the model to recognize and discard its own bad grasps.

2. What “6-DOF Grasping” Really Means

Imagine giving a friend directions to pick up a coffee cup:

Direction you give    In robot math          Symbol
Move forward 10 cm    Translation along X    +x
Slide right 5 cm      Translation along Y    +y
Lift 8 cm             Translation along Z    +z
Twist palm down       Rotation around X      roll
Turn hand left        Rotation around Y      pitch
Rotate wrist          Rotation around Z      yaw

Those six numbers together are called a 6-DOF pose (DOF = degrees of freedom). GraspGen’s job is to predict many such poses for any object point cloud the robot sees.
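To make this concrete, here is a minimal sketch (plain NumPy/SciPy, not GraspGen code) that packs those six numbers into the 4×4 homogeneous transform most robot software actually consumes:

import numpy as np
from scipy.spatial.transform import Rotation as R

# six numbers: where the gripper goes and how it is oriented
x, y, z = 0.10, 0.05, 0.08           # metres
roll, pitch, yaw = 0.0, 1.57, 0.0    # radians

# pack them into a 4x4 homogeneous transform (rotation + translation)
T = np.eye(4)
T[:3, :3] = R.from_euler("xyz", [roll, pitch, yaw]).as_matrix()
T[:3, 3] = [x, y, z]
print(T)  # this single matrix is one grasp pose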


3. How GraspGen Works—In Plain English

3.1 Start With a Diffusion Model

You may know diffusion models from image generation apps. Instead of turning noise into a picture, GraspGen turns noise into grasp poses:

  1. Training
    • We take successful grasps → add noise → teach a neural net to remove that noise.
  2. Inference
    • Feed the network a new object point cloud → start with pure noise → let the network clean it into valid poses.

Because a grasp pose has only six numbers (x, y, z, roll, pitch, yaw), the process is fast: 10 denoising steps are enough, compared with the 50-100 steps typical for images.
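Conceptually, the inference loop is tiny. The sketch below is not GraspGen's actual sampler (the real model uses a proper rotation representation and noise schedule, and runs batched on the GPU); it only illustrates the idea of starting from noise and refining poses over 10 steps:

import torch

NUM_STEPS = 10  # far fewer than the 50-100 steps typical for image diffusion

def sample_grasps(denoiser, object_pc, num_grasps=100):
    # object_pc: (N, 3) point cloud of the target object; `denoiser` is a stand-in for the trained network
    poses = torch.randn(num_grasps, 6)            # start from pure noise: x, y, z, roll, pitch, yaw
    for t in reversed(range(NUM_STEPS)):
        # the network predicts how to nudge each noisy pose toward a valid grasp
        poses = poses + denoiser(object_pc, poses, t)
    return poses                                  # num_grasps candidate 6-DOF poses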

3.2 Handle Different Grippers Without Rewriting Code

The framework keeps the object encoder (a PointTransformerV3 backbone) fixed and swaps only a small gripper-specific head. That means you can:

  • re-use the same weights for the Franka gripper and the suction cup;
  • add a new gripper by training only the lightweight head.
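A rough PyTorch sketch of that split; the class and module names here are illustrative, not the repository's actual code:

import torch.nn as nn

class GraspGenerator(nn.Module):
    def __init__(self, object_encoder, gripper_head):
        super().__init__()
        self.encoder = object_encoder    # shared backbone (PointTransformerV3 in GraspGen)
        self.head = gripper_head         # small, gripper-specific module

    def forward(self, point_cloud, noisy_pose, timestep):
        features = self.encoder(point_cloud)              # gripper-agnostic object features
        return self.head(features, noisy_pose, timestep)  # denoising prediction for this gripper

# Swapping grippers means swapping only the head:
#   franka_model  = GraspGenerator(shared_encoder, FrankaHead())
#   suction_model = GraspGenerator(shared_encoder, SuctionHead())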

4. The On-Generator Training Trick

Once the diffusion model is trained, it still occasionally invents impossible grasps (floating in mid-air or colliding with the object). Prior work trains a separate discriminator to score and filter these poses, but trains it only on offline success/failure labels.

GraspGen does something smarter:

  1. Run the diffusion model on 7,000 training objects → create about 14 million fresh candidate grasps.
  2. Re-simulate every pose in Isaac Sim → obtain new success/failure labels.
  3. Retrain the discriminator on this model-generated data.
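The same three steps in pseudocode-style Python; the helper names (sample, evaluate, fit) are hypothetical stand-ins for the Isaac Sim tooling and training scripts in the repo:

# On-Generator Training, step by step (illustrative only)
candidates = []
for obj in training_objects:                            # ~7,000 objects
    candidates += diffusion_model.sample(obj, n=2000)   # fresh grasps from the generator itself

labels = isaac_sim.evaluate(candidates)                 # re-simulate: did each grasp actually hold?

discriminator.fit(candidates, labels)                   # learn to reject the generator's own failure modes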

Because the discriminator now sees the exact mistakes the diffusion model makes, it filters them far better at test time. The numbers bear this out:

Training Data         Discriminator AUC   Memory Use
Offline labels only   0.886               100 %
On-Generator labels   0.947               4.7 % (21× smaller)

5. The Released Assets—What You Get Today

Asset                               Size          Purpose
GraspGen dataset (Franka)           17 M grasps   Train or fine-tune
GraspGen dataset (Robotiq-2F-140)   17 M grasps   Train or fine-tune
GraspGen dataset (suction)          17 M grasps   Train or fine-tune
Pre-trained checkpoints             3× models     Zero-shot inference
Docker image                        3 GB          Reproduce all results
Python demo scripts                 10 files      Real-camera examples

Total download ≈ 200 GB. One command fetches the full dataset:

git clone https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-GraspGen

6. Quick Start—Run Your First Inference in 5 Minutes

6.1 Install (Docker, Recommended)

git clone https://github.com/NVlabs/GraspGen.git && cd GraspGen
bash docker/build.sh   # builds an image with all deps

6.2 Download the trained weights

git clone git@hf.co:adithyamurali/GraspGenModels  # ~1 GB

6.3 Visualize on a sample scene

# Terminal 1: start a 3-D viewer
meshcat-server

# Terminal 2: run the demo inside docker
bash docker/run.sh <local_graspgen_path> --models <path_to_models>
cd /code && python scripts/demo_scene_pc.py \
  --sample_data_dir /models/sample_data/real_scene_pc \
  --gripper_config /models/checkpoints/graspgen_robotiq_2f_140.yml

Open http://localhost:7000 in your browser—you will see green arrows (grasps) on top of a real tabletop scene.


7. Training Your Own Model—A Step-by-Step Recipe

7.1 When Do You Need to Train?

  • Your gripper geometry is not Franka, Robotiq, or the 30 mm suction cup.
  • You want to specialize on a narrow domain (e.g., only metal tools).

7.2 What You Need

File              Description                                  Example
gripper.urdf      kinematic + collision model                  provided in assets/
gripper.yml       GraspGen config                              same folder
object_dataset/   watertight .obj meshes                       download via helper script
grasp_dataset/    JSON lines with 6-DOF pose + success label   generate via Isaac Sim
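For orientation, a grasp record might look like the hypothetical JSON line below; the exact field names are defined by the repository's data loaders, so treat this purely as an illustration of the information each line carries:

{"object_id": "mug_0042", "translation": [0.02, -0.01, 0.11], "rotation_quat": [0.0, 0.707, 0.0, 0.707], "success": 1}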

7.3 Cache the Dataset (One-Time)

bash docker/run.sh <code> --grasp_dataset <ds> --object_dataset <obj> --results <logs>
cd /code && python train_graspgen.py \
  task=robotiq_2f_140_gen \
  hydra.run.dir=/results/exp_01

The script first builds a compressed HDF5 cache (fast I/O) and then starts training automatically.

7.4 Typical Training Times

Hardware           Epochs   Wall-Clock
8×A100 80 GB       3,000    40 h (generator) + 90 h (discriminator)
1×RTX 4090 24 GB   3,000    8 days (batch size 16)

8. Benchmarks—Numbers You Can Trust

8.1 Object-Centric Test (Franka, ACRONYM split)

Model               AUC ↑   Coverage ↑
SE3-Diff baseline   0.200   25 %
DexDiffuser         0.344   48 %
M2T2                0.636   67 %
GraspGen (ours)     0.947   85 %

8.2 Cluttered-Scene Test (FetchBench)

Model      Task Success   Grasp Success
M2T2       52.6 %         60 %
AnyGrasp   63.7 %         70 %
GraspGen   81.3 %         90.5 %

9. Real-World Deployment Tips

9.1 Camera Setup

  • One RealSense D435 mounted 0.6 m above the table is enough.
  • Calibrate intrinsics and extrinsics once; store in camera_params.yaml.
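A minimal loader sketch; the file name comes from the tip above, while the key layout (fx/fy/cx/cy intrinsics plus a 4×4 camera-to-robot extrinsic matrix) is an assumption, not a schema the repo prescribes:

import numpy as np
import yaml

with open("camera_params.yaml") as f:
    cam = yaml.safe_load(f)

# assumed layout: pinhole intrinsics and a 4x4 camera-to-robot transform
fx, fy = cam["intrinsics"]["fx"], cam["intrinsics"]["fy"]
cx, cy = cam["intrinsics"]["cx"], cam["intrinsics"]["cy"]
T_robot_cam = np.array(cam["extrinsics"])   # used to express predicted grasps in the robot's frame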

9.2 Software Pipeline

RGB-D stream
     ↓ SAM2 instance segmentation
     ↓ GraspGen inference (top-100 grasps)
     ↓ cuRobo motion planning
     ↓ Robot execution
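In code, one perception-to-motion cycle is a short loop. Everything below is a hedged sketch with hypothetical wrapper functions (segment_with_sam2, infer_grasps, plan_with_curobo), not the real APIs of SAM2, GraspGen, or cuRobo:

# one pick cycle (illustrative wrappers, not real APIs)
rgb, depth = camera.read()                          # RGB-D stream
masks = segment_with_sam2(rgb)                      # one mask per object instance
for mask in masks:
    object_pc = depth_to_pointcloud(depth, mask)    # lift the masked pixels into 3-D points
    grasps = infer_grasps(object_pc, top_k=100)     # GraspGen: top-100 scored 6-DOF poses
    trajectory = plan_with_curobo(grasps)           # first reachable, collision-free grasp wins
    robot.execute(trajectory)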

9.3 Common Failure Modes

Symptom                          Quick Fix
Grasps hover 2 cm above object   Add z-offset = −0.02 m in post-processing
Small objects ignored            Increase point-cloud density (move camera closer by 10 cm)
Shelf scenes fail                Lower collision-check safety margin in cuRobo
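The z-offset fix from the first row is a one-liner once each grasp is a 4×4 transform. A minimal sketch, assuming the offset is meant in the world/table frame (lower the grasp by 2 cm); if your pipeline defines it along the gripper's approach axis instead, add the offset along the pose's local Z column:

import numpy as np

def apply_z_offset(grasp_T, offset=-0.02):
    # shift the grasp position along the world Z axis (negative = lower, toward the table)
    shifted = grasp_T.copy()
    shifted[2, 3] += offset
    return shifted

# grasps_fixed = [apply_z_offset(T) for T in grasps]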

10. Frequently Asked Questions

Q1. My gripper looks like Franka but has 5 mm more stroke. Do I retrain?
A: Probably not. Apply a fixed z-offset of −5 mm after inference. Measure 20 test grasps; if success ≥ 85 %, you’re good.

Q2. Can I run this on an edge GPU?
A: Yes. The released TensorRT engine runs at 20 Hz on a Jetson AGX Orin 64 GB (batch size 1, 10 denoising steps).

Q3. How do I add a new object category?
A:

  1. Place meshes in object_dataset/new_category/.
  2. Run python scripts/generate_grasps.py --category new_category.
  3. Append the new labels to grasp_dataset/train.jsonl.
  4. Resume training from a checkpoint with train.checkpoint=/path/to/latest.ckpt.

Q4. The training script dies with “Killed” and no traceback.
A: Increase Docker memory limit (--memory=32g) or set NUM_REDUNDANT_DATAPOINTS=3.

Q5. Why do suction grasps have large rotation error?
A: A suction cup is rotationally symmetric about its axis, so rotation error around that axis is ill-defined. Focus on translation error instead.


11. Citation & License

If you use GraspGen in your research or product, please cite:

@article{murali2025graspgen,
  title={GraspGen: A Diffusion-based Framework for 6-DOF Grasping with On-Generator Training},
  author={Murali, Adithyavairavan and Sundaralingam, Balakumar and others},
  journal={arXiv preprint arXiv:2507.13097},
  year={2025}
}

Dataset license: CC-BY 4.0.
Code license: NVIDIA Source Code License (see repo).


12. Where to Go Next

  • Project page: https://graspgen.github.io
  • Video walkthrough: https://youtu.be/gM5fgK2aZ1Y
  • Issue tracker: https://github.com/NVlabs/GraspGen/issues

Happy grasping!