Dream-VL AI: How Discrete Diffusion Models Are Revolutionizing Robot Vision and Planning


Dream-VL and Dream-VLA: A Unified Vision–Language and Vision–Language–Action Framework Based on Discrete Diffusion Language Models

Dream-VL is trained on over 12 million multimodal samples using discrete diffusion, demonstrating strong advantages in long-horizon visual planning and parallel action generation. Dream-VLA is pretrained on 970k robotic manipulation trajectories and achieves 97.2% average performance on LIBERO, 71.4% on SimplerEnv-Bridge, and 60.5% on SimplerEnv-Fractal benchmarks.

Table of Contents

- Introduction
- Why Discrete Diffusion Language Models (dLLMs)?
- Dream-VL: Training Data, Capabilities, and Benchmarks
  - Dataset Scale and Training Paradigm
  - High-Level Planning: ViPlan Benchmark
  - Low-Level Action Planning: Speed and Robustness
- Dream-VLA: Robot Pretraining and Downstream …
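To make "parallel action generation" concrete before diving in: a discrete diffusion language model starts from a fully masked token sequence and commits many tokens per forward pass, rather than emitting one token at a time as an autoregressive decoder does. The loop below is a minimal MaskGIT-style sketch of that decoding pattern, not Dream-VL's actual decoder; `model`, `prompt_ids` (the tokenized vision-language context), `mask_id`, and the linear commit schedule are all illustrative assumptions.

```python
import torch

def diffusion_decode(model, prompt_ids, num_actions=8, steps=4, mask_id=0):
    """Hypothetical MaskGIT-style parallel decoding loop.

    All `num_actions` action slots start as [MASK] and are committed over
    `steps` refinement passes, instead of one token per forward pass.
    """
    actions = torch.full((1, num_actions), mask_id, dtype=torch.long)
    for step in range(steps):
        # One forward pass scores every position simultaneously.
        logits = model(torch.cat([prompt_ids, actions], dim=1))
        probs = logits[:, -num_actions:, :].softmax(dim=-1)
        conf, pred = probs.max(dim=-1)  # per-slot confidence and argmax token
        # Commit the k most confident slots; k grows linearly each pass.
        k = max(1, num_actions * (step + 1) // steps)
        commit = torch.zeros_like(actions, dtype=torch.bool)
        commit.scatter_(1, conf.topk(k, dim=-1).indices, True)
        actions = torch.where(commit, pred, actions)
    return actions
```

Under these assumptions, an 8-token action chunk is produced in 4 forward passes rather than 8, which is the source of the speed advantage the article attributes to diffusion-based action generation.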