Causal-Attention Diffusion LM: How WeDLM Outperforms vLLM Without Custom Kernels


WeDLM in Practice: How to Deploy a Causal-Attention Diffusion LM That Outruns vLLM Without New Kernels

TL;DR: WeDLM keeps causal attention, reorders tokens so that masked positions still see all observed context, and commits tokens left-to-right as soon as they are predicted. The result is the first diffusion-style language model that beats a production vLLM baseline in wall-clock time while preserving (and sometimes improving) accuracy. This post explains why it works, how to run it, and what to watch when you ship it.

What exact problem does WeDLM solve?

Question answered: “Why do most diffusion language models feel fast in papers …
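
To make the TL;DR concrete, here is a minimal sketch of the reordering idea, not WeDLM's actual implementation: already-observed tokens are moved ahead of masked positions, so that under plain causal attention every masked position can attend to all observed context. The names below (reorder_for_causal_denoising, MASK_ID) are illustrative assumptions, not part of any WeDLM API.

```python
# Minimal sketch (assumed names, not WeDLM's code) of reordering a partially
# denoised sequence so that observed tokens precede masked ones. Under causal
# attention, every masked position then sees all observed context.

MASK_ID = -1  # placeholder id marking a position that is still masked


def reorder_for_causal_denoising(tokens):
    """Return (reordered tokens, permutation) with observed tokens first.

    tokens: list of ints, where MASK_ID marks positions still to be denoised.
    """
    observed = [i for i, t in enumerate(tokens) if t != MASK_ID]
    masked = [i for i, t in enumerate(tokens) if t == MASK_ID]
    perm = observed + masked  # observed context first, masked positions last
    return [tokens[i] for i in perm], perm


if __name__ == "__main__":
    seq = [11, MASK_ID, 42, MASK_ID, 7]
    reordered, perm = reorder_for_causal_denoising(seq)
    print(reordered)  # [11, 42, 7, -1, -1]: masks now follow all observed tokens
    print(perm)       # [0, 2, 4, 1, 3]: permutation to undo before committing
```

The permutation is kept so that, once a masked position is predicted, the token can be committed back at its original index; left-to-right commitment as described in the TL;DR then proceeds over the original ordering.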