Marble: Building 3D Worlds with Multimodal AI
Imagine you’re sketching out a room in your mind—a cozy kitchen with sunlight streaming through the windows, or a vast museum filled with abstract sculptures. What if you could turn that mental image into a fully navigable 3D space, tweak it on the fly, and even export it for a game or film? That’s the promise of Marble, a tool from World Labs that’s pushing the boundaries of how we create and interact with digital environments. As someone who’s spent years diving into AI systems for spatial design, I’ve seen how these models can bridge the gap between imagination and reality. In this post, we’ll walk through what Marble is, how it works, and why it might change the way you think about building virtual worlds. We’ll keep things straightforward, with examples and steps you can follow if you’re ready to try it.
Marble isn’t just another generator; it’s a multimodal world model designed to handle inputs like text, images, or even rough 3D sketches and turn them into coherent 3D scenes. Think of it as a digital architect that listens to multiple “senses” at once—much like how we humans piece together our surroundings from sights, words, and touches. If you’re wondering, “How does a world model differ from a simple image generator?” it’s in the depth: while image tools stop at a flat picture, world models build out the full geometry, lighting, and layout so you can walk around in it virtually.
Let’s break it down step by step. We’ll start with the basics, move into practical ways to use it, and end with some real-world applications and tips.
What Is a Multimodal World Model?
At its core, a multimodal world model like Marble takes in different types of information—text descriptions, photos, videos, or basic 3D shapes—and weaves them into a single, interactive 3D world. Why multimodal? Because real life doesn’t come in one flavor. You might describe a scene in words (“a quiet forest path at dawn”), snap a photo of a tree, and add a video clip of rustling leaves. Marble pulls all that together without you having to manually align everything.
This approach mimics how our brains work: integrating signals to form a richer picture. For instance, if you’re designing a virtual space for a robotics simulation, you need more than visuals—you need structure that agents can “navigate” reliably. Marble’s output is a 3D world that’s not just pretty but functional, ready for export to tools like game engines or physics simulators.
Here’s a quick comparison to help you see the shift:

| | Image generator | World model (Marble) |
| --- | --- | --- |
| Output | A flat 2D picture | A full 3D scene with geometry, lighting, and layout |
| Interaction | View only | Orbit, pan, and walk through the space |
| Downstream use | Static reference | Export to game engines, renderers, and physics simulators |
If you’re asking, “Is this overkill for a hobbyist?” Not at all. Marble’s interface is intuitive enough for beginners, but it scales for pros who need precision.
(Diagram: how inputs flow into Marble, from simple prompts to complex edits, all leading to a unified 3D output.)
Getting Started with Text and Image Prompts
The easiest entry point is turning a text description or a single image into a 3D world. It’s like giving Marble a starting point and letting it fill in the blanks—lighting, shadows, even unseen angles.
Step-by-Step: Creating a World from Text
- Sign Up and Access: Head to marble.worldlabs.ai and create an account. It’s free to start, with options for more advanced use.
- Enter Your Prompt: Type something descriptive, like “A modern art museum with wooden floors and colorful abstract sculptures.” Keep it vivid but concise—aim for 50-100 words to guide the style without overwhelming the model.
- Generate: Hit create. Marble processes this in seconds to minutes, depending on complexity, and renders a 3D view you can orbit around.
- Explore: Use the built-in viewer to pan, zoom, and check details. If it misses something (say, the sculptures feel too uniform), that’s where editing comes in later.
Text prompts shine for quick ideation. For example, “A serene Scandinavian guesthouse bedroom with glacier views” yields a room that’s not just flat but has depth—windows framing icy peaks, furniture scaled realistically.
From Images to Immersive Worlds
Got a photo from your phone? Upload it as a prompt. Marble extrapolates the rest: if you feed it a snapshot of a cafe interior, it builds out the counter’s depth, adds tables in perspective, and ensures the lighting matches across views.
This pairs well with other tools. Generate an image in another AI app first, then import it here. Result? A 3D scene that’s a natural extension of your 2D sketch.
Users often ask, “What if the output doesn’t match my vision exactly?” That’s common with any generative tool. The magic is in iteration—more on that below.
Gaining Control with Multi-Image and Video Inputs
For more precision, step up to multi-image prompts. Instead of one view, provide several—like front, side, and back angles of a room. Marble stitches them seamlessly, filling gaps with consistent details.
How to Use Multi-Image Prompts
- Gather Your Inputs: Collect 3-6 images. They don’t need to be perfect; even rough sketches work. For real-world inspiration, snap photos of a space from different spots.
- Upload and Align: In Marble, select the multi-image mode. Drag your files in and let the tool suggest alignments (it auto-detects overlaps).
- Add Guidance: Optionally, include a text note like “Blend into a cozy library with warm lighting.”
- Generate and Refine: Output is a cohesive 3D world. Walk through it to spot inconsistencies, then edit.
This method is great for workflows in design or VFX. Imagine prototyping a film set: generate varied views in an image tool, then lift to 3D here. Or, for robotics, use video clips of a factory floor to create a navigable sim.
Videos add motion context. A short clip of walking through a park? Marble captures the path’s curve and foliage density, turning it into a static-yet-dynamic 3D model.

(Example: Turning varied photos into a unified 3D scene.)
One practical tip: If your images have artifacts (blurry edges from phone cams), Marble smooths them out during generation.
Editing Your Worlds: Iterative Creativity
Generation is just the spark; editing is where ideas evolve. Marble’s tools let you tweak without starting over—remove a distracting lamp, swap wood floors for tile, or overhaul the lighting.
Common Edits and How to Do Them
- Local Changes: Select an area (like a wall) and prompt: “Turn this into a stage with spotlights.” Marble regenerates just that spot.
- Global Shifts: For the whole scene, say “Change all counters to black granite.” It applies consistently.
- Object Swaps: Highlight an item and describe a replacement: “Swap tables for benches facing forward.”
Steps for any edit:
- Enter Edit Mode: After generation, click the edit icon.
- Select Target: Use lasso or box tools in 2D/3D view.
- Describe Change: Text prompt, keeping it specific (e.g., “Add velvet curtains, deep red”).
- Preview and Apply: See a side-by-side, then commit.
This keeps the creative flow going. I’ve found it addictive—start with a basic room, edit in elements inspired by a mood board, and suddenly you’ve got a custom set for a story.

(Before and after: Transforming a wall into a performance space.)

(Updating surfaces for a modern look.)
Chisel: Hands-On 3D Sculpting
Need even more control over layout? Enter Chisel, Marble’s experimental mode for building structure first, style second. It’s like blocking out a scene with Lego before painting it.
Building with Chisel: A Walkthrough
- Set Up the Skeleton: In Chisel mode, add basic shapes—cubes for rooms, planes for walls. Or import simple 3D assets (like a chair model from a free library).
- Position Elements: Drag to arrange. Scale a box to fit a hallway; rotate a plane for a slanted roof.
- Layer in Style: Add a text prompt: “Fill with colorful paintings on white walls, wooden floors.” Marble dresses the structure accordingly.
- Iterate: Tweak the layout, regenerate style. Export when ready.
Chisel separates “what goes where” from “how it looks.” A simple box layout + “modern art museum” prompt yields galleries with sculptures hugging the walls. Change the prompt to “Scandinavian guesthouse,” and it’s the same bones but with cozy textiles and glacier vistas.

(Coarse layout turned into a vibrant museum space.)

(Same structure, different serene style.)
For advanced users, this is gold for precise sims—ensure a robot path is exact before styling.
Expanding and Composing for Bigger Visions
Small scenes are fun, but what about epic scales? Marble lets you grow worlds outward.
Expanding a Scene
- Select Area: Mask a region (e.g., beyond a door).
- Prompt Expansion: “Extend into a garden with paths and benches.”
- Generate: Fills seamlessly, fixing low-res spots too.
This cleans up edges and adds detail where needed, like sharpening a distant corner.
Composing Multiple Worlds
For massive builds:
- Generate Pieces: Create separate worlds (a train car, platform, landscape).
- Enter Composer Mode: Arrange them spatially—align doors, scale proportionally.
- Merge: Marble blends transitions for a fluid mega-scene.
Perfect for games: Compose rooms into a mansion, or environments into a city block.

(Growing a room into a larger traversable area.)

(Assembling parts into a full-scale train.)
Exporting Your Creations: From Concept to Production
Once built, get your world out into the world. Options match common pipelines.
Export Formats
- Gaussian Splats: High-fidelity particles for smooth rendering. Use Spark (open-source with THREE.js) for browser playback; a minimal viewer sketch follows this list.
- Meshes:
  - Collider: Low-poly for physics (e.g., collision detection).
  - High-Quality: Detailed visuals for rendering.
- Videos: Render walkthroughs with exact camera paths. Enhance for motion—like adding flowing water or flickering fire—while keeping structure intact.
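To make the splat route concrete, here is a minimal browser viewer sketch built on THREE.js and Spark. It assumes Spark is installed from npm as @sparkjsdev/spark and that its SplatMesh takes a URL option; the "scene.spz" filename is a placeholder for whatever file Marble lets you download.

```ts
import * as THREE from "three";
// Assumption: Spark ships as @sparkjsdev/spark and exposes a SplatMesh class.
import { SplatMesh } from "@sparkjsdev/spark";

// Standard THREE.js setup: renderer, scene, camera.
const renderer = new THREE.WebGLRenderer();
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(
  60,
  window.innerWidth / window.innerHeight,
  0.1,
  100
);

// "scene.spz" is a placeholder name for the Gaussian-splat file exported from Marble.
const splat = new SplatMesh({ url: "scene.spz" });
splat.position.set(0, 0, -3); // place the splat a few meters in front of the camera
scene.add(splat);

// Render loop: the splat draws like any other THREE.js object.
renderer.setAnimationLoop(() => {
  renderer.render(scene, camera);
});
```

From there, orbit controls or your own camera logic work the same way they would in any mesh-based THREE.js scene.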
Steps for video export:
- Set Camera: Keyframe paths in the viewer (a browser-side sketch of the same idea follows these steps).
- Render: Choose resolution and length.
- Enhance (Optional): Prompt “Add subtle wind to trees” for dynamics.
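Marble renders the walkthrough video itself, but if you preview an exported scene in the browser (as in the Spark sketch above), the same keyframed-path idea is easy to reproduce. This is a generic THREE.js sketch, not Marble’s renderer, and the waypoint coordinates are made up.

```ts
import * as THREE from "three";

// A smooth fly-through path through a few hand-picked waypoints (made-up coordinates).
const path = new THREE.CatmullRomCurve3([
  new THREE.Vector3(0, 1.6, 5),
  new THREE.Vector3(2, 1.6, 2),
  new THREE.Vector3(0, 1.8, -2),
  new THREE.Vector3(-3, 1.6, -5),
]);

const camera = new THREE.PerspectiveCamera(60, 16 / 9, 0.1, 100);

// Call once per frame with t running from 0 to 1 to advance along the path.
function updateCamera(t: number): void {
  const position = path.getPointAt(t);                      // where the camera sits
  const lookAhead = path.getPointAt(Math.min(t + 0.01, 1)); // a point slightly further along
  camera.position.copy(position);
  camera.lookAt(lookAhead);
}
```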

(Gaussian splats and meshes side-by-side.)

(Before/after: Static scene gains smoke and flames.)
These make Marble versatile—drop a mesh into Unity, or share a video for feedback.
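Dropping a mesh into Unity is one route; as a browser-side illustration of the same split, here is a sketch that loads both mesh exports into THREE.js, using the high-quality mesh for rendering and the collider mesh for simple physics-style queries. The GLB filenames are hypothetical, and the actual export format may differ.

```ts
import * as THREE from "three";
import { GLTFLoader } from "three/addons/loaders/GLTFLoader.js";

const scene = new THREE.Scene();
const loader = new GLTFLoader();

// Hypothetical filenames for the two Marble mesh exports.
const HIGH_QUALITY_URL = "museum_visual.glb"; // detailed mesh, used for rendering
const COLLIDER_URL = "museum_collider.glb";   // low-poly mesh, used for collision queries

// High-quality mesh: added to the scene and drawn every frame.
loader.load(HIGH_QUALITY_URL, (gltf) => {
  scene.add(gltf.scene);
});

// Collider mesh: kept invisible (raycasting still sees it), used for spatial queries
// such as keeping an agent on the floor.
const colliders: THREE.Object3D[] = [];
loader.load(COLLIDER_URL, (gltf) => {
  gltf.scene.visible = false;
  scene.add(gltf.scene);
  colliders.push(gltf.scene);
});

// Example query: find the floor height under a point by casting a ray straight down.
function floorHeightAt(x: number, z: number): number | null {
  const ray = new THREE.Raycaster(
    new THREE.Vector3(x, 100, z), // start well above the scene
    new THREE.Vector3(0, -1, 0)   // cast downward
  );
  const hits = ray.intersectObjects(colliders, true);
  return hits.length > 0 ? hits[0].point.y : null;
}
```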
Marble Labs: Inspiration and Community
Beyond solo creation, Marble Labs is a space to see others’ work. It’s packed with case studies: a filmmaker’s VFX workflow, a designer’s therapeutic VR room, robotics sims for path planning. Tutorials cover basics to advanced Chisel tricks.
Browse for ideas, then adapt. One case: Using expansions for interactive game levels, ensuring seamless player movement.
If you’re thinking, “How do I contribute?” Share your worlds there—it’s a hub for feedback and collaboration.
Toward Spatial Intelligence: What’s Next?
Marble handles creation today, but the horizon is interaction. Future updates will let agents (AI or human) navigate these worlds dynamically—for training robots or testing designs. It’s a step toward AI that truly “understands” space, like predicting how light shifts in a room over time.
For now, it’s empowering: artists prototype sets, engineers sim environments, anyone builds personal spaces.
How to Get the Most Out of Marble: Quick Tips
- Start Simple: Text prompts for speed, multi-image for control.
- Iterate Often: Edit early to steer direction.
- Combine Tools: Pair with your favorite image generator for hybrid workflows.
- Test Exports: Always preview in target software.
- Scale Smart: Compose for big projects, expand for details.
FAQ: Answering Common Questions About Marble
What inputs does Marble accept for world generation?
Marble works with text descriptions, single or multiple images, short videos, and coarse 3D layouts via Chisel. This multimodal setup lets you mix them for tailored results.
How long does it take to generate a 3D world?
From seconds for simple text prompts to a few minutes for complex multi-image or Chisel builds. Processing time scales with detail level.
Can I use Marble for professional projects like games or films?
Yes—exports as meshes and videos integrate with tools like Unity or After Effects. Case studies in Marble Labs show VFX and gaming workflows.
Is there a learning curve for editing worlds?
Basic edits are prompt-based and intuitive, like describing changes. Chisel adds 3D sculpting for pros, with tutorials to ramp up quickly.
What if my generated world has inconsistencies?
Use targeted edits or expansions to fix spots. Multi-image inputs reduce inventions, keeping outputs closer to your intent.
How-To: Building Your First Editable 3D World
- Sign up at marble.worldlabs.ai and log in.
- Choose “Text to World” and enter: “A quiet cafe with wooden tables and large windows.”
- Generate, then enter edit mode: Select a table and prompt “Replace with a bookshelf full of books.”
- Expand the back wall: Mask it and add “Open to a small garden patio.”
- Export as a video: Set a camera path walking through, enhance with “Gentle steam from coffee cups.”
- Share in Marble Labs for feedback.
Ready to build? Jump into marble.worldlabs.ai and experiment. If this sparks ideas for your work—whether design, sims, or just fun—drop a note in the Labs. What’s the first world on your list?

