X-Omni Explained: How Reinforcement Learning Revives Autoregressive Image Generation

A plain-English, globally friendly guide to the 7B unified image-and-language model

1. What Is X-Omni?

In one sentence: X-Omni is a 7-billion-parameter model that writes both words and pictures in the same breath, then uses reinforcement learning to make every pixel look right.

Key Fact | Plain-English Meaning
--- | ---
Unified autoregressive | One brain handles both text and images, so knowledge flows freely between them.
Discrete tokens | Images are encoded as sequences of "visual words" drawn from a vocabulary of 16,384 entries; the model predicts the next visual word just as GPT predicts the next word of text.
Reinforcement-learning polish | After normal training, …