# MixGRPO: Train Text-to-Image Models 71% Faster Without Sacrificing Quality

## Plain-English summary

MixGRPO replaces the heavy, full-sequence training used in recent human-preference pipelines with a small, moving window of only four denoising steps. The trick is to mix deterministic ODE sampling (fast) with stochastic SDE sampling (creative) and to let the window slide from noisy to clean timesteps. The result: half the training time of DanceGRPO and noticeably better pictures.

## Why Training "Human-Aligned" Image Models Is Painfully Slow

Recent breakthroughs show that diffusion or flow-matching models produce far more pleasing images if you add a Reinforcement-Learning-from-Human-Feedback (RLHF) stage after the base …
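The sliding-window idea can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `sample_with_window` and the placeholder velocity field are hypothetical names standing in for the learned flow model, and only the steps inside the window take a stochastic SDE update (those are the steps a GRPO-style policy gradient would train on), while every other step uses the deterministic ODE update.

```python
import numpy as np

def sample_with_window(x, n_steps=25, window_start=0, window_size=4,
                       sigma=0.1, rng=None):
    """Toy sketch of a mixed ODE/SDE rollout with a sliding window.

    Steps in [window_start, window_start + window_size) use a stochastic
    SDE update; all other steps use the deterministic ODE update.
    Returns the final sample and the indices of the SDE steps.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    dt = 1.0 / n_steps
    sde_steps = []
    for t in range(n_steps):
        v = -x  # placeholder velocity field (pulls samples toward 0)
        if window_start <= t < window_start + window_size:
            # SDE step: same drift, plus injected noise
            x = x + v * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
            sde_steps.append(t)
        else:
            # ODE step: deterministic Euler update, no noise
            x = x + v * dt
    return x, sde_steps
```

In training, `window_start` would advance over iterations so the four stochastic (and trainable) steps drift from the noisy end of the trajectory toward the clean end, while the cheap ODE steps cover the rest.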