Google DeepMind Launches Project Genie: Create Interactive Game Worlds from Text or Images



By admin | Jan 29, 2026 | 7 min read


Google DeepMind is now providing access to Project Genie, an AI research prototype designed to build interactive game environments from simple text descriptions or images. Beginning Thursday, subscribers to Google AI Ultra in the United States can experiment with the tool, which leverages a combination of DeepMind’s latest world model, Genie 3, its image generator Nano Banana Pro, and the Gemini system. This release follows Genie 3’s initial research preview five months ago and represents a strategic effort to collect user input and training data as the lab accelerates its development of more advanced world models.

These world models are AI systems that construct an internal simulation of an environment, enabling them to forecast future events and strategize actions. Many prominent AI researchers, including teams at DeepMind, consider world models a foundational component in the pursuit of artificial general intelligence (AGI). In the shorter term, however, labs like DeepMind plan to introduce this technology first through video games and entertainment, later expanding its use to training embodied agents, such as robots, within simulated settings.

The launch of Project Genie arrives as competition intensifies in the world model arena. Last year, Fei-Fei Li’s World Labs introduced its first commercial product, Marble. The AI video generation startup Runway has also recently debuted a world model, and AMI Labs, a startup from former Meta chief scientist Yann LeCun, will similarly concentrate on developing this technology. The current prototype can be unpredictable, at times producing remarkably playable worlds and at other times delivering confusing or off-target results.

Here is how the process works.

A claymation-style castle in the sky made of marshmallows and candy. Image Credits: TechCrunch

It begins with a “world sketch,” where you input text prompts describing both the setting and a main character. You will later guide this character through the world in either first- or third-person view. Nano Banana Pro generates an image from these prompts, which you can theoretically adjust before Genie uses it as the foundation for an interactive world. While these modifications generally functioned, the model occasionally faltered, for instance by rendering purple hair when green was requested. You may also use real photographs as a starting point, though this method yielded mixed success.

Once the image is finalized, Project Genie requires just a few seconds to create an explorable world. Users can also remix existing worlds by modifying their original prompts or browse curated examples in a gallery or through a randomizer tool for inspiration. After exploring, you can download a video of the generated world. Currently, DeepMind is limiting sessions to 60 seconds of world generation and navigation, partly due to budget and computational constraints. Since Genie 3 is an auto-regressive model, it demands significant dedicated computing power, which restricts how much capacity DeepMind can allocate to users.

“We set the 60-second limit because we wanted to make it accessible to more people,” explained a DeepMind representative. “Essentially, when you use it, a specific chip is dedicated solely to your session.” They added that extending the time beyond 60 seconds would offer diminishing returns for testing, noting, “The environments are engaging, but their level of interaction and dynamism is currently somewhat limited. We view this as a constraint we aim to improve.”

Whimsy works, realism doesn’t

Google received a cease-and-desist from Disney last year, so Project Genie refuses to build Disney-related worlds. Image Credits: TechCrunch

During testing, safety guardrails were already active. The system blocked the generation of any content resembling nudity or worlds that hinted at Disney or other copyrighted material. This follows a cease-and-desist letter Disney sent to Google in December, alleging that its AI models infringed copyright by training on Disney characters and intellectual property to produce unauthorized content. Attempts to create worlds featuring mermaids in underwater realms or ice queens in winter castles were also unsuccessful.

Nevertheless, the demonstration proved deeply impressive. The first world created aimed to fulfill a childhood fantasy: exploring a castle in the clouds made of marshmallows, with a chocolate sauce river and candy trees. Requesting a claymation style, the model produced a whimsical world that perfectly captured that imaginative vision, complete with pastel-and-white spires that looked enticingly edible.

A “Game of Thrones”-inspired world that failed to generate as photo-realistically as I wanted. Image Credits: TechCrunch

That said, Project Genie still has room for refinement. The models performed exceptionally well with artistic prompts, such as watercolor, anime, or classic cartoon styles. However, they often struggled with photorealistic or cinematic worlds, frequently producing results that resembled video game graphics rather than authentic settings. Using real photos as a baseline also yielded inconsistent outcomes.

When provided with a photo of an office and asked to replicate it exactly, the generated world included similar furnishings—a wooden desk, plants, a grey couch—but arranged differently, with a sterile, digital appearance. In another test, a photo of a desk with a stuffed toy prompted Genie to animate the toy navigating the space, with other objects occasionally reacting to its movement. This interactivity is an area DeepMind is actively working to enhance: in several instances, characters phased straight through walls or solid objects.

I asked Project Genie to animate a stuffed toy (Bingo Bronson) so it could explore my desk. Image Credits: TechCrunch

When DeepMind first introduced Genie 3, researchers emphasized that its auto-regressive architecture allows it to remember previously generated content. Testing this feature by revisiting parts of an environment showed that the model largely succeeded in maintaining consistency. In one scenario featuring a cat exploring a desk, the model only once generated an extra mug upon returning to a previously viewed section.

The most frustrating aspect involved navigation controls: using arrow keys to look around, the spacebar to jump or ascend, and W-A-S-D keys to move. For non-gamers, these controls did not feel intuitive, often proving unresponsive or causing movement in unintended directions. Moving from one side of a room to a doorway frequently devolved into chaotic zigzagging, akin to steering a shopping cart with a broken wheel.

A DeepMind representative acknowledged these shortcomings, reiterating that Project Genie remains an experimental prototype. Looking ahead, the team aims to boost realism and improve interaction capabilities, including granting users greater control over actions and environments. “We don’t see Project Genie as a final, everyday product, but we believe it already offers a glimpse of something unique and compelling that can’t be achieved through other means,” they said.
