Evolution Strategies Go Hyperscale: How EGGROLL Trains Billion-Parameter Models Without Gradients

A plain-language walkthrough of the paper “Evolution Strategies at the Hyperscale”

Written for college-level readers who want facts, not fluff

Word count: ≈ 3 200

1. Why should I care about “gradient-free” training?

Because back-propagation is not always the best tool.

| Situation | Why gradients struggle |
| --- | --- |
| Model uses int8 weights only | Tiny round-off errors explode during backward pass |
| System contains non-differentiable code (hash table, cellular automaton, database call) | Chain rule breaks |
| Very long recurrent loops | Vanishing/exploding signal |
| You already own a huge inference cluster | GPUs sit idle while you wait … |