Google SIMA 2 Agent: Gemini Powers Virtual World Interaction

November 13, 2025

Google DeepMind Unveils SIMA 2: A Next-Generation AI Agent

On Thursday, Google DeepMind presented a research preview of SIMA 2, its latest generalist AI agent. This new iteration integrates the capabilities of Gemini, Google’s advanced large language model, to enhance both language processing and reasoning abilities.

Building on the Success of SIMA 1

As with earlier DeepMind projects such as AlphaFold, the approach leaned heavily on large-scale training data: the initial version of SIMA was trained on extensive video game data, which allowed it to learn to play a variety of 3D games with human-like gameplay, including titles it had never encountered during training.

SIMA 1, introduced in March 2024, demonstrated the ability to follow instructions across diverse virtual environments. However, its success rate for complex tasks was 31%, significantly lower than the 71% achieved by humans.

Significant Improvements with SIMA 2

“SIMA 2 represents a substantial advancement in capabilities compared to its predecessor,” stated Joe Marino, a senior research scientist at DeepMind, during a press briefing.

He further explained that SIMA 2 is a more versatile agent, capable of completing intricate tasks in previously unknown environments. Crucially, it’s also a self-improving agent, learning and refining its performance based on its own experiences.

Powered by Gemini 2.5 Flash-Lite

SIMA 2 is powered by the Gemini 2.5 Flash-Lite model. DeepMind situates the work within its broader pursuit of artificial general intelligence (AGI), which it defines as a system’s capacity to handle a broad spectrum of intellectual challenges, including learning new skills and applying knowledge across different domains.

The Importance of Embodied Agents

DeepMind researchers emphasize the importance of working with “embodied agents” for achieving generalized intelligence.

An embodied agent interacts with a physical or virtual world through a body, observing inputs and taking actions – much like a robot or a human. This contrasts with non-embodied agents that might manage calendars, take notes, or execute code.
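
To make the distinction concrete, the sketch below outlines a generic observe-and-act loop of the kind an embodied agent runs. It is a minimal illustration only; the class and method names (and the environment interface) are hypothetical, not DeepMind’s actual API.

```python
# Minimal sketch of an embodied observe-and-act loop.
# All names here are hypothetical illustrations, not DeepMind's API.
from dataclasses import dataclass

@dataclass
class Observation:
    frame: bytes        # what the agent "sees", e.g. a rendered game frame
    instruction: str    # the user's natural-language request

@dataclass
class Action:
    keys: list[str]                 # keyboard keys to press
    mouse: tuple[float, float]      # cursor movement

class EmbodiedAgent:
    def act(self, obs: Observation) -> Action:
        """Map an observation (plus the current instruction) to an action."""
        raise NotImplementedError

def run_episode(agent: EmbodiedAgent, env, instruction: str, max_steps: int = 1000) -> None:
    """Step the agent through a virtual environment until the task ends."""
    frame = env.reset()
    for _ in range(max_steps):
        obs = Observation(frame=frame, instruction=instruction)
        frame, done = env.step(agent.act(obs))
        if done:
            break
```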

Beyond Gameplay: Understanding and Common Sense

Jane Wang, a senior staff research scientist at DeepMind specializing in neuroscience, highlighted that SIMA 2’s capabilities extend far beyond simply playing games.

“We are challenging it to truly comprehend the situation, understand the user’s requests, and respond in a way that demonstrates common sense – a task that is inherently difficult,” Wang explained.

Doubling Performance Through Gemini Integration

Integrating Gemini has effectively doubled performance relative to SIMA 1. The unified system combines Gemini’s sophisticated language and reasoning skills with the embodied abilities honed through SIMA’s gameplay training.

Demonstrations in Virtual Worlds

Marino showcased SIMA 2 within “No Man’s Sky,” where the agent accurately described its surroundings – a rocky planetary surface – and identified a distress beacon, subsequently interacting with it.

The agent also utilizes Gemini for internal reasoning. When tasked with locating a house colored like a ripe tomato, SIMA 2 demonstrated its thought process – recognizing that ripe tomatoes are red, and therefore seeking a red house – before successfully finding and approaching the correct structure.
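
In code terms, this “reason, then act” pattern might look like the following sketch, which asks a language model to turn an indirect instruction into a concrete visual goal before handing it to the embodied loop sketched earlier. The helper names and the text-in/text-out `llm.generate` call are assumptions for illustration, not SIMA 2’s real interface.

```python
# Hypothetical "reason, then act" sketch: resolve an indirect instruction
# into a concrete goal with a language model, then act on it.
def resolve_goal(llm, instruction: str) -> str:
    """E.g. 'the house the colour of a ripe tomato' -> 'the red house'."""
    prompt = (
        "Rewrite the following instruction as a concrete visual target, "
        "briefly stating your reasoning:\n" + instruction
    )
    return llm.generate(prompt)   # assumed text-in/text-out call

def follow_instruction(agent, llm, env, instruction: str) -> None:
    concrete_goal = resolve_goal(llm, instruction)   # "ripe tomatoes are red -> find the red house"
    run_episode(agent, env, concrete_goal)           # reuses the loop from the earlier sketch
```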

Emoji-Based Instructions and Photorealistic Environments

SIMA 2’s Gemini integration allows it to interpret instructions conveyed through emojis. “You can instruct it 🪓🌲, and it will proceed to chop down a tree,” Marino noted.

Furthermore, SIMA 2 can effectively navigate newly generated, photorealistic worlds created by Genie, DeepMind’s world model, correctly identifying and interacting with objects like benches, trees, and butterflies.

Self-Improvement Through AI-Generated Training

Gemini also facilitates self-improvement with minimal human data. While SIMA 1 relied entirely on human gameplay data, SIMA 2 uses this as a starting point.

When introduced to a new environment, the agent benefits from tasks generated by another Gemini model and a separate reward model that evaluates its performance. This self-generated experience serves as training data, allowing the agent to learn from its errors and improve through trial and error, guided by AI-based feedback.
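
A rough sketch of that loop, with hypothetical interfaces for the task generator, reward model, agent, and environment (none of these are DeepMind’s published APIs), might look like this:

```python
# Hedged sketch of the self-improvement loop described above.
# task_generator, reward_model, agent, and env are hypothetical stand-ins.
def self_improvement_round(agent, task_generator, reward_model, env, n_tasks: int = 32):
    """One round: propose tasks, attempt them, score them, train on the results."""
    experience = []
    for _ in range(n_tasks):
        task = task_generator.propose_task(env.description())   # Gemini-style task proposal
        trajectory = agent.attempt(env, task)                    # agent plays the task out
        score = reward_model.evaluate(task, trajectory)          # AI-generated feedback signal
        experience.append((task, trajectory, score))
    agent.train_on(experience)   # self-generated, scored experience becomes training data
    return experience
```

Repeating rounds of this kind is what lets the agent improve through trial and error without fresh human demonstrations.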

Towards More General-Purpose Robotics

DeepMind envisions SIMA 2 as a crucial step towards developing more versatile robots.

“To perform tasks in the real world, like a robot, a system requires two key components,” explained Frederic Besse, a senior staff research engineer at DeepMind.

“First, a high-level understanding of the world and the task at hand, along with reasoning capabilities. SIMA 2 primarily focuses on this high-level behavior, rather than the lower-level actions involved in controlling physical components.”
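
Read as an architecture, that split could be sketched as two layers with purely illustrative class names: a high-level policy, the part SIMA 2 addresses, decides what to do, while a separate low-level controller turns those decisions into motor commands.

```python
# Illustrative two-layer split (hypothetical names): high-level reasoning
# chooses abstract goals; a separate low-level controller drives the hardware.
class HighLevelPolicy:
    """The layer SIMA 2 targets: understand the scene and pick a goal."""
    def plan(self, observation, instruction: str) -> str:
        # e.g. return "walk to the red house"
        raise NotImplementedError

class LowLevelController:
    """The layer SIMA 2 does not address: joint- and actuator-level control."""
    def execute(self, command: str) -> None:
        raise NotImplementedError
```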

Future Developments and Collaboration

The DeepMind team did not disclose a specific timeline for integrating SIMA 2 into physical robotics systems. They clarified that their recently unveiled robotics foundation models were trained using different methods than SIMA.

While a wider release of SIMA 2 beyond the current preview is not yet scheduled, Wang expressed the intention to showcase DeepMind’s progress and explore potential collaborations and applications.

Tags: SIMA 2, Gemini, Google AI, virtual worlds, AI agent, reasoning