Xiaomi MiMo-V2-Flash: Deep Dive into the 309B Parameter Efficient AI Model

Summary: Xiaomi’s MiMo-V2-Flash is a Mixture-of-Experts language model with 309B total parameters, of which only 15B are active. It achieves 6× KV cache compression through 128-token sliding window attention, reaches a 73.4% resolution rate on SWE-Bench Verified, and delivers a 2.6× inference speedup, making it the most efficient open-source code agent model available today.

Why Are AI Models Getting Slower Despite Growing Larger?

When using ChatGPT or other AI assistants, you might notice a paradox: models keep getting more powerful, yet response times don’t improve proportionally. What’s behind this phenomenon? Xiaomi’s …