Mind-Blowing: A Chinese Mega-Team Just Dropped Inferix — The Inference Engine That Turns “World Simulation” From Sci-Fi Into Reality

You thought 2025 was already wild? Hold my coffee.

On November 24, 2025, a joint force from Zhejiang University, HKUST, Alibaba DAMO Academy, and Alibaba TRE quietly released something that will be remembered as the real turning point of AI video: 「Inferix」.

It’s not another video generation model.
It’s the dedicated inference engine for the next era — the 「World Model era」.

In plain English:
「Inferix lets normal GPUs run minute-long, physics-accurate, fully interactive, never-collapsing open-world videos — in real time.」

This is the closest thing we’ve ever had to “The Matrix simulator” running on consumer hardware.

The Insane Benchmark That Broke My Brain

Generating a 5-second 1080p video with the monster-grade Wan2.1 14B model on a single NVIDIA H20 (roughly 70% of a 4090):

  • Classic diffusion pipeline: 「6,800 seconds」 (~113 minutes of pure suffering)
  • With Inferix + Block-Diffusion paradigm: 「30–60× faster」, and it can keep going forever without quality collapse.
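
What does that speedup look like on the clock? Here is a quick back-of-the-envelope calculation using only the two numbers quoted above (6,800 seconds and 30–60×). The function name is made up for illustration; nothing here comes from the Inferix codebase.

```python
def accelerated_seconds(baseline_s: float, speedup: float) -> float:
    """Wall-clock time after applying a speedup factor to a baseline."""
    return baseline_s / speedup

BASELINE_S = 6800  # classic diffusion pipeline, as quoted above

for speedup in (30, 60):
    t = accelerated_seconds(BASELINE_S, speedup)
    print(f"{speedup}x faster -> {t:.0f} s (~{t / 60:.1f} min)")
```

So the quoted range works out to roughly 2 to 4 minutes for the same clip, instead of nearly two hours.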

Oh, and it can stream the video live while generating — like watching an AI director shoot a movie in real time.

OK, But What the Heck Is “Block-Diffusion”?

There are two old camps in video generation:

  1. Pure Diffusion (Sora-style)
    → Looks god-tier for short clips
    ↓ Collapses horribly after ~10 seconds, zero memory, insanely expensive

  2. Pure Autoregressive (frame-by-frame, LLM-style generation)
    → Can generate forever
    ↓ Looks like cheap CGI from 2005

Then 2025 gave us the ultimate hybrid: 「Block-Diffusion (a.k.a. Semi-Autoregressive)」

How it works, explained like you’re five:

  • Chop the video into small blocks (e.g., 1–2 seconds each)
  • Inside each block → use diffusion for maximum visual fidelity
  • Between blocks → use autoregressive generation plus an LLM-style KV cache to “remember” everything that happened before

Result?
Diffusion-level beauty + infinite coherent storytelling.
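
If you want to see the shape of that loop, here is a toy sketch in plain Python. To be loud about it: this is not Inferix code and not a real diffusion model. A "frame" is just a number, the "denoiser" simply pulls noise toward a target, and the "cache" is a plain list standing in for a real KV cache (which would hold attention keys and values, not frames). All names are hypothetical. It only shows the control flow: iterative refinement inside a block, autoregressive hand-off between blocks.

```python
import random

def denoise_block(context, prompt_seed, block_len=4, steps=8):
    """Toy 'diffusion' for one block: start from noise, refine step by step.

    The target is conditioned on the last cached frame so blocks stay continuous.
    """
    rng = random.Random(prompt_seed + len(context))
    anchor = context[-1] if context else 0.0   # what the cache "remembers"
    target = [anchor + 0.1 * (i + 1) for i in range(block_len)]
    frames = [rng.gauss(0.0, 1.0) for _ in range(block_len)]  # pure noise
    for _ in range(steps):
        # each step moves every frame halfway toward its target
        frames = [f + 0.5 * (t - f) for f, t in zip(frames, target)]
    return frames

def generate(num_blocks, prompt_seed=0):
    cache = []  # toy stand-in for the KV cache: every frame generated so far
    for _ in range(num_blocks):
        block = denoise_block(cache, prompt_seed)
        cache.extend(block)  # autoregressive hand-off to the next block
        yield block          # a finished block can be streamed right away

video = [frame for block in generate(num_blocks=3) for frame in block]
print(len(video))  # 3 blocks x 4 frames
```

Because `generate` yields each block as soon as it is finished, a consumer can start playing block 1 while block 2 is still being denoised, which is the same idea behind streaming the video while it generates.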

The only problem: none of the existing inference engines (vLLM, SGLang, xDiT, etc.) were built for this new beast.

So the team built 「Inferix」 from the ground up as its perfect weapon.

Inferix’s Six Killer Features (Steal This List)

  1. 「Block-wise KV Cache」
    The AI remembers 100+ seconds of context. Characters never change faces. Backgrounds never melt. Physics stays consistent.

  2. 「Real-time Streaming」
    RTMP + WebRTC support. Watch the AI world being built live, like a Twitch stream from another dimension.

  3. 「Mid-Generation Prompt Switching」
    Type “now the hero pulls out a lightsaber” at the 2-minute mark → the world instantly adapts. Zero continuity break.

  4. 「Insane Optimization Stack」
    INT8/FP8 quantization, DAX, Ulysses sequence parallelism, Ring Attention — runs models on 8 GPUs that used to need 80.

  5. 「LV-Bench Included」
    A brutal new benchmark with 1,000 real minute-plus videos specifically designed to expose “does it collapse after 60 seconds?” Most models fail miserably. Now we have the ultimate leaderboard.

  6. 「Plug-and-Play for Top Models」
    Out-of-the-box support for MAGI-1, CausVid, Self Forcing — copy-paste and run demos in minutes.
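
To make features 1 and 3 a bit more concrete, here is a minimal sketch of a block-wise cache combined with a prompt schedule. Big caveats: `BlockCache`, `run`, and the sliding-window eviction are my own assumptions for illustration, not Inferix's actual API or cache policy, and no model is being run. The sketch only tracks which earlier blocks each new block would get to "see" and which prompt is active when it is generated.

```python
from collections import deque

class BlockCache:
    """Hypothetical sliding-window cache: keeps only the most recent blocks."""
    def __init__(self, max_blocks):
        self.blocks = deque(maxlen=max_blocks)  # oldest block drops off automatically

    def append(self, block):
        self.blocks.append(block)

    def context(self):
        return list(self.blocks)

def run(prompt_schedule, num_blocks, max_blocks=3):
    """prompt_schedule maps a block index to the prompt that takes over there."""
    cache = BlockCache(max_blocks)
    prompt = None
    log = []
    for i in range(num_blocks):
        prompt = prompt_schedule.get(i, prompt)  # switch prompt mid-generation
        # a real engine would run the model here, conditioned on cache.context()
        block = {
            "index": i,
            "prompt": prompt,
            "sees": [b["index"] for b in cache.context()],
        }
        cache.append(block)
        log.append(block)
    return log

schedule = {0: "hero walks through the rain", 3: "now the hero pulls out a lightsaber"}
log = run(schedule, num_blocks=5)
for block in log:
    print(block["index"], block["prompt"], block["sees"])
```

The new prompt takes over at block 3, but that block still sees blocks 0 through 2 in its context, which is the intuition behind switching prompts without a continuity break.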

What This Actually Means for You in 2026

Imagine owning a 5090 and typing:

“Cyberpunk city, pouring rain, a man in a trench coat chasing a flying car, jumps on the roof, blade-runner-style cinematography, 30-minute uncut movie”

Then you sit back with popcorn while an AI director live-shoots a full-length cinematic masterpiece — and you can steer the story in real time.

That’s not “text-to-video” anymore.
That’s 「world creation」.

Final Reality Check

  • Sora showed us AI can make video
  • Kling showed us AI can make beautiful video
  • Inferix + Block-Diffusion is showing us AI can simulate entire worlds

The paper dropped three days ago and GitHub is already on fire (10k+ stars and climbing).

Links you need right now:

  • GitHub: https://github.com/inferix/inferix
  • Paper: https://arxiv.org/abs/2511.20714
  • LV-Bench details: inside the repo

2025 isn’t even over and the game has already changed forever.

See you in the simulation,
— Your fellow tech degenerate who just lost his mind

#AI #WorldModel #Inferix #BlockDiffusion #VideoGeneration #AGI