SIMA 2: How Gemini-Powered AI is Revolutionizing 3D Virtual Worlds

高效码农

2 months ago

SIMA 2: A Gemini-Powered AI Agent That Interacts, Reasons, and Evolves in 3D Virtual Worlds

On November 13, 2025, Google DeepMind unveiled SIMA 2, a next-generation AI agent for 3D virtual environments. As the successor to SIMA (Scalable Instructable Multiworld Agent), SIMA 2 goes beyond simple instruction-following. By building on the Gemini model, it acts as an interactive gaming companion that can reason about goals, converse with the user, and improve itself over time. The work advances game AI and also offers insights for research on Artificial General Intelligence (AGI) and robotics.

From Instruction-Following to Proactive Reasoning: SIMA 2’s Core Breakthroughs

The first iteration of SIMA already demonstrated the ability to execute basic instructions across virtual environments. It could perform over 600 language-guided actions—such as “turn left,” “climb the ladder,” and “open the map”—and operated like a human player: “observing” the screen and using a virtual keyboard and mouse to navigate, without accessing underlying game mechanics. However, its key limitation was that it could only respond passively to commands, lacking the capacity for proactive thinking.
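The “observe the screen, press keys” interface described above can be sketched in a few lines of Python. This is only an illustration: Frame, InputEvent, run_episode, and toy_policy are hypothetical names, and the keyword-matching policy merely stands in for the learned model, whose internals the article does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Frame:
    """Raw screen pixels; no game-internal state is exposed to the agent."""
    pixels: Tuple[Tuple[int, ...], ...]


@dataclass(frozen=True)
class InputEvent:
    """One virtual keyboard or mouse event (hypothetical encoding)."""
    device: str   # "keyboard" or "mouse"
    payload: str  # e.g. "W" or "click:120,80"


# A policy maps (what the agent sees, what it was told) to one input event.
Policy = Callable[[Frame, str], InputEvent]


def run_episode(frames: List[Frame], instruction: str, policy: Policy) -> List[InputEvent]:
    """Feed each observed frame to the policy and collect the emitted inputs."""
    return [policy(frame, instruction) for frame in frames]


def toy_policy(frame: Frame, instruction: str) -> InputEvent:
    """Placeholder for the learned model: reacts to a keyword only."""
    if "left" in instruction.lower():
        return InputEvent("keyboard", "A")
    return InputEvent("keyboard", "W")


if __name__ == "__main__":
    blank = Frame(pixels=((0, 0), (0, 0)))
    print(run_episode([blank, blank], "turn left", toy_policy))
```

The point of the design is the narrow interface: the policy receives pixels and text and can only emit keyboard or mouse events, which is what lets the same agent be dropped into different games.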

SIMA 2’s main change is that the Gemini model now sits at the core of the agent. This turns it from an “executor” into a “thinker”: it can understand a user’s high-level goals, reason through how to pursue them, and carry out goal-oriented actions within games.

For example, if a user says, “We need to build a safe shelter,” the original SIMA might struggle with abstract concepts like “safe” or “shelter.” In contrast, SIMA 2 would analyze first: “Safety means protection from external threats, possibly requiring high walls. A shelter needs an enclosed space, so we first need to gather materials like wood or stone.” It would then outline steps methodically: collect materials, construct a framework, and reinforce the structure. It might even explain its process to the user: “I’m going to cut down trees first because wood is lightweight and ideal for framing. Later, we’ll find stones to strengthen the outer walls.”

This reasoning capability comes from the training data: human demonstration videos paired with language labels, supplemented by Gemini-generated labels. As a result, SIMA 2 can not only act but also “explain” what it intends to do and why. DeepMind reports that, in testing, interacting with SIMA 2 feels more like collaborating with a thinking partner than issuing commands to a machine.

[Insert image: Screenshot of SIMA 2 analyzing a task and explaining its steps in a game]

A Leap in Generalization: Adapting Across Games and Even Unknown Worlds

“Generalization”—the ability of an AI to transfer skills learned in one scenario to new ones—is a critical measure of intelligence. SIMA 2 has made remarkable strides in this area, again thanks to the integration of Gemini.

Handling Complex Instructions and Multimodal Prompts

While the original SIMA excelled at simple commands, SIMA 2 can comprehend longer, more intricate tasks. For instance, it can break down a multi-step, spatially complex instruction like “First, go to the eastern mountain top to find the red flag, note the symbols on the stone tablet next to it, then return to the camp and use these symbols to open the treasure chest” into a clear sequence of actions.
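To make “breaking an instruction into a sequence of actions” concrete, here is a deliberately naive sketch. SIMA 2 relies on Gemini for this step; the rule-based split_instruction function below is a hypothetical stand-in that only splits on commas, “then,” and “and,” so it would fail on many real sentences.

```python
import re
from typing import List


def split_instruction(text: str) -> List[str]:
    """Split a compound instruction into ordered sub-steps.

    Toy heuristic (an assumption, not the agent's method): treat commas,
    "then", and "and" as step boundaries.
    """
    # Drop a leading "First," so it does not become an empty step.
    body = re.sub(r"^\s*first,?\s+", "", text, flags=re.IGNORECASE)
    parts = re.split(r",\s*(?:then\s+)?|\s+and\s+(?:then\s+)?", body)
    return [part.strip(" .") for part in parts if part.strip(" .")]


if __name__ == "__main__":
    instruction = (
        "First, go to the eastern mountain top to find the red flag, "
        "note the symbols on the stone tablet next to it, "
        "then return to the camp and use these symbols to open the treasure chest"
    )
    for number, step in enumerate(split_instruction(instruction), start=1):
        print(number, step)
```

On the example instruction this yields four ordered steps. A language model is needed in practice because step boundaries and references such as “these symbols” depend on meaning, not punctuation.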

More importantly, it understands “multimodal prompts”—meaning instructions don’t have to rely solely on text. If a user sends a screenshot of a map with a circled target and the message “Get the key here,” SIMA 2 can combine image and text information to act. Even emojis like “🌾🔨” (grain and hammer) are understandable: it interprets this as “Harvest grain, then use a hammer to repair tools.”

Cross-Language and Cross-Game Transfer

SIMA 2 supports interaction in multiple languages, so users can issue commands in English, Chinese, or other languages. More impressive is its “concept transfer”: for example, skills learned from “mining” (using tools to extract underground resources) in one game can be applied to “harvesting” (using tools to collect plant resources) in another. It recognizes the common thread, “using specific tools to obtain target resources from the environment,” and adapts accordingly.
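One way to picture concept transfer is as a single abstract routine with game-specific bindings. The sketch below is an analogy only: GatherBinding and gather_plan are hypothetical names, and a learned agent represents such concepts implicitly rather than as an explicit template.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GatherBinding:
    """Game-specific details plugged into the shared 'gather' concept."""
    tool: str
    resource: str


def gather_plan(binding: GatherBinding) -> List[str]:
    """Shared routine: use a specific tool to obtain a target resource."""
    return [
        f"equip {binding.tool}",
        f"approach {binding.resource}",
        f"use {binding.tool} on {binding.resource}",
        f"collect {binding.resource}",
    ]


# The same routine covers "mining" in one game and "harvesting" in another.
mining = GatherBinding(tool="pickaxe", resource="ore")
harvesting = GatherBinding(tool="sickle", resource="wheat")

if __name__ == "__main__":
    print(gather_plan(mining))
    print(gather_plan(harvesting))
```

Only the binding changes between the two games; the four-step structure is reused, which is the “common thread” the paragraph describes.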

This capability allows it to perform exceptionally well in games it was never trained on. For example, in ASKA, a new Viking survival game, it rapidly grasps Norse-style construction logic. In MineDojo—a research implementation of the popular open-world sandbox game Minecraft—it transfers “building” and “crafting” skills from other sandbox games to complete tasks like constructing houses and making tools.

Testing Limits in “Imagined Worlds”

To push the boundaries of SIMA 2’s generalization, the research team paired it with another groundbreaking project: Genie 3. Genie 3 can generate new, real-time 3D virtual worlds from a single image or text prompt. For example, inputting “a crystal castle floating in the clouds” prompts Genie 3 to create a corresponding interactive world.

In these completely unfamiliar “imagined worlds,” SIMA 2 still excels at:

Rapidly orienting itself (“I’m at the castle gate, and there’s a drawbridge in front of me”);
Understanding user instructions (“Go to the top floor of the castle to find the glowing gem”);
Taking logical actions (“The drawbridge is down—I’ll cross it first, then look for stairs to go upstairs”).

This adaptability to unknown environments closely mirrors how human players learn new games.

[Insert image: Demo of SIMA 2 acting in a virtual world generated by Genie 3]

Self-Improvement: Evolving from “Taught” to “Self-Learning”

One of SIMA 2’s most exciting features is its ability to continuously improve through self-learning—shifting AI from “passive training” to “active growth.”

The Complete Self-Improvement Cycle

This process functions as a closed loop of “learning-practice-feedback-relearning”:

Initial Task and Reward Estimation: Gemini first assigns SIMA 2 a task (e.g., “Build a monster-resistant wooden house in Valheim”) and estimates a “reward” for task completion (e.g., “Wooden house structural integrity of 80% or higher”);
Autonomous Attempts and Data Accumulation: SIMA 2 independently attempts the task in the game. All behavioral data, including successful steps and failed attempts, is recorded in a “self-generated experience bank”;
Training with Self-Generated Experience: This experiential data is used to train the next version of SIMA 2, allowing it to learn techniques from its successes and lessons from its failures;
Iterative Improvement: The new version reattempts the task, typically performing better and generating higher-quality experiential data, perpetuating the cycle.
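The four stages above can be sketched as a toy loop. Everything here is an assumption made for illustration: ToyAgent reduces “skill” to a single probability, estimate_reward stands in for the Gemini reward estimate, and train_next uses simple reward-filtered imitation (keep the better half of the bank), which is a swapped-in technique rather than the training method DeepMind uses.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    actions: List[str]
    reward: float


def estimate_reward(actions: List[str]) -> float:
    """Hypothetical reward estimate: the share of helpful actions."""
    return actions.count("good") / len(actions)


@dataclass
class ToyAgent:
    p_good: float = 0.2  # chance of picking a helpful action

    def attempt(self, rng: random.Random, steps: int = 20) -> Episode:
        """Stage 2: try the task and record what happened."""
        actions = ["good" if rng.random() < self.p_good else "bad" for _ in range(steps)]
        return Episode(actions, estimate_reward(actions))


def train_next(agent: ToyAgent, bank: List[Episode]) -> ToyAgent:
    """Stage 3: fit the next version to the better half of the experience bank."""
    ranked = sorted(bank, key=lambda episode: episode.reward, reverse=True)
    kept = ranked[: max(1, len(ranked) // 2)]
    good = sum(episode.actions.count("good") for episode in kept)
    total = sum(len(episode.actions) for episode in kept)
    # max() keeps this toy from regressing; a real system has no such guarantee.
    return ToyAgent(p_good=max(agent.p_good, good / total))


def self_improve(generations: int = 5, attempts: int = 30, seed: int = 0) -> List[float]:
    """Stage 4: repeat attempt -> record -> retrain, tracking skill per generation."""
    rng = random.Random(seed)
    agent = ToyAgent()
    bank: List[Episode] = []  # the self-generated experience bank
    history = [agent.p_good]
    for _ in range(generations):
        bank.extend(agent.attempt(rng) for _ in range(attempts))
        agent = train_next(agent, bank)
        history.append(agent.p_good)
    return history


if __name__ == "__main__":
    print(self_improve())
```

The returned list holds the agent's skill after each generation. The sketch leaves out the part the article emphasizes most: failed attempts are also informative, whereas this filter simply discards low-reward episodes.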

For example, initially, SIMA 2 might not know how to efficiently transport resources in Satisfactory. After several attempts, it records insights like “Conveyor belts are faster than manual carrying” and “Conveyor belts need to be placed at an appropriate slope to avoid jams.” In subsequent attempts, it applies these lessons directly—even optimizing more complex transportation networks.

Progress Without Human Data

Crucially, this cycle can run without additional human data. After an initial learning phase using human demonstrations, SIMA 2 can improve its skills in new games through “self-directed play” alone. For instance, in Goat Simulator 3, a game it had never encountered, it independently discovers unique mechanics like “using its head to break specific obstacles” and “leveraging environmental physics to move objects” without additional human demonstration videos.

It can even initiate self-improvement cycles in brand-new worlds generated by Genie 3—a major milestone toward training general AI across diverse, generated environments.

Future Outlook: The Path to Embodied Intelligence from Virtual to Real

SIMA 2’s advancements in gaming environments extend far beyond enhancing gameplay. 3D virtual worlds serve as ideal “training grounds” for general intelligence—offering complex physical rules, diverse task objectives, and rich interactive elements. These environments allow AI to practice core skills in a safe, controlled setting.

Current Limitations

Despite its progress, SIMA 2 still faces key challenges:

Long-Horizon Task Difficulty: For ultra-complex tasks requiring dozens or even hundreds of reasoning steps (e.g., “Building an interplanetary trade network in No Man’s Sky”), it may lose direction midway;
Memory Constraints: Its memory of past interactions relies on a limited “context window,” meaning it may forget earlier conversations or actions;
Fine Motor Skills and Visual Understanding: Executing high-precision actions via keyboard and mouse (e.g., accurately cutting objects in Teardown) remains challenging. Similarly, visually interpreting complex 3D scenes (e.g., quickly identifying key items in cluttered environments) requires further improvement.
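The memory constraint in the list above is easy to see in miniature. The ContextWindow class below is a hypothetical illustration that counts whole events; real context windows are measured in model tokens, and the article does not state SIMA 2's window size.

```python
from collections import deque
from typing import Deque, List


class ContextWindow:
    """Fixed-size memory: adding a new event evicts the oldest one when full."""

    def __init__(self, max_items: int) -> None:
        self._items: Deque[str] = deque(maxlen=max_items)

    def add(self, event: str) -> None:
        self._items.append(event)

    def recall(self) -> List[str]:
        """Return everything the agent can still 'remember', oldest first."""
        return list(self._items)


if __name__ == "__main__":
    memory = ContextWindow(max_items=3)
    for event in ["user: build a shelter", "cut tree", "cut tree", "place wall"]:
        memory.add(event)
    print(memory.recall())  # the original user request is no longer present
```

With a window of three items, the fourth event pushes out the user's original request, which is the kind of forgetting that makes long-horizon tasks drift.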

These limitations are not unique to SIMA 2—they are ongoing challenges for the entire field of embodied AI.

Implications for Robotics

The skills SIMA 2 has learned (navigation, tool use, and collaborative task execution) are core capabilities required for robots in the physical world. For example, a routine learned in a virtual environment for “tightening a bolt with a wrench” (“Locate the bolt → Grab the wrench → Align → Rotate”) could potentially transfer to real-world robots, enabling them to perform similar tasks in homes or factories.

In essence, SIMA 2’s exploration is building a bridge from “virtual intelligence” to “real-world intelligence.”

Responsible Development: Balancing Technological Progress and Risk Mitigation

As an interactive, self-improving AI, SIMA 2’s development has been guided by a strong commitment to responsibility and safety. Throughout the development process, DeepMind collaborated closely with its Responsible Development & Innovation Team to ensure technological progress does not outpace oversight.

Currently, SIMA 2 is in a “limited research preview” phase, with early access granted only to a small group of academics and game developers. This approach aims to:

Gather feedback from diverse fields to assess performance across scenarios;
Identify potential risks (e.g., inappropriate behavior in open-world environments);
Explore effective risk mitigation strategies.

This “iterative, collaborative” model ensures technology evolves within controlled boundaries, ultimately delivering positive societal value.

Frequently Asked Questions (FAQ)

What’s the fundamental difference between SIMA 2 and the original SIMA?

The original SIMA was an “instruction executor,” capable only of completing actions based on explicit commands. SIMA 2 is a “reasoning collaborator”—it understands abstract goals, proactively plans steps, explains its actions, and improves through self-learning. The core difference lies in the integration of Gemini’s reasoning capabilities and a self-improvement mechanism.

What games can SIMA 2 play?

It has been trained and tested on a range of commercial games, including Valheim, Satisfactory, Goat Simulator 3, Hydroneer, No Man’s Sky, Space Engineers, Wobbly Life, Eco, ASKA, The Gunk, SteamWorld Build, Road 96, and Teardown. It also adapts to research-focused games like MineDojo and even entirely new virtual worlds generated by Genie 3.

Could its “self-improvement” lead to loss of control?

Not currently. Its self-improvement process operates within clear task frameworks and reward mechanisms. Additionally, as it remains in a limited preview phase, it is subject to ongoing human oversight and risk assessment. Developers are also designing more robust control mechanisms to ensure safe technological advancement.

How close is SIMA 2 to Artificial General Intelligence (AGI)?

It represents a significant step toward AGI, but gaps remain. AGI requires comprehensive understanding, reasoning, and learning capabilities across a wide range of physical and virtual environments. Currently, SIMA 2 focuses primarily on 3D virtual worlds and faces challenges in long-horizon tasks and cross-domain knowledge transfer.

When will regular players be able to experience SIMA 2?

It is currently in the research phase, with access limited to academics and developers. As the technology matures and risk assessments are completed, access may expand gradually, though no specific timeline has been announced.

Conclusion

SIMA 2 showcases the transformation of AI in 3D virtual worlds from a “tool” to a “companion.” It not only understands commands but also thinks, communicates, and evolves. This capability stems from the combination of Gemini’s reasoning power and advancements in embodied AI research.

Despite its limitations, SIMA 2 validates a critical path: integrating powerful language models with multi-environment training data can create general-purpose intelligent agents with both breadth and depth. From in-game collaboration to future robots serving in the physical world, SIMA 2’s exploration is paving the way for the “embodiment” of artificial intelligence. This journey proceeds with a commitment to responsibility, ensuring technological progress truly serves humanity.