From a Sentence to a Walkable 3D World
A Practical Guide to Tencent HunyuanWorld 1.0
“To see a world in a grain of sand, and heaven in a wild flower.”
— William Blake, adapted as the project motto
Why This Guide Exists
If you have ever wished to turn a simple sentence or a single photograph into a fully-explorable 3D scene—one you can walk through in a web browser, import into Unity, or hand to a client—this post is for you.
HunyuanWorld 1.0 is the first open-source system that:
- accepts either text or an image as input
- produces a seamless 360° panorama
- converts that panorama into a layered, textured 3D mesh
- exports the result in standard formats (`.obj`, `.ply`, `.drc`)
Below you will find:
- A plain-language explanation of how the system works
- Benchmarks that compare it to earlier open models
- A step-by-step installation tested on Ubuntu 22.04 and Windows 11
- Ready-to-run commands for both text-to-world and image-to-world use cases
- Tips, FAQs, and community links, all drawn only from the official release notes and code base
What Problem Is Being Solved?
| Pain Point | Older Approaches | HunyuanWorld’s Answer |
|---|---|---|
| Lack of 3D consistency | Video diffusion lacks true depth | Uses layered 3D reconstruction |
| Heavy hardware load | NeRF family requires GBs of VRAM | Outputs lightweight textured meshes |
| Pipeline friction | Proprietary tools export to closed formats | Gives you open formats you already know |
How the Pipeline Works (30-Second Version)
1. Input – a text prompt or a single image
2. Panorama generator – produces an equirectangular 360° image
3. Semantic layering – automatically splits sky, distant objects, mid-ground, and foreground
4. Depth & meshing – depth maps → meshes → texture atlases
5. Export – drag-and-drop files into Blender, Unreal, Three.js, or the bundled web viewer
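Concretely, steps 1–2 and 3–5 map onto the repository's two demo scripts, with the panorama produced by the first script feeding the second. A minimal sketch using the same flags as the Quick Start below:
# Steps 1–2: prompt (or image) → equirectangular panorama
python3 demo_panogen.py --prompt "a misty valley at dawn" --output_path test_results/overview
# Steps 3–5: panorama → layered, textured, exportable 3D scene
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py --image_path test_results/overview/panorama.png --classes outdoor --output_path test_results/overview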
Performance Snapshot
Text-to-Panorama Quality
| Model | BRISQUE ↓ | NIQE ↓ | Q-Align ↑ | CLIP-T ↑ |
|---|---|---|---|---|
| Diffusion360 | 69.5 | 7.5 | 1.8 | 20.9 |
| HunyuanWorld 1.0 | 40.8 | 5.8 | 4.4 | 24.3 |
Image-to-3D-World Quality
| Model | BRISQUE ↓ | NIQE ↓ | Q-Align ↑ | CLIP-I ↑ |
|---|---|---|---|---|
| WonderJourney | 51.8 | 7.3 | 3.2 | 81.5 |
| HunyuanWorld 1.0 | 36.2 | 4.6 | 3.9 | 84.5 |
Lower BRISQUE and NIQE scores indicate fewer visual artefacts; a higher Q-Align score indicates better perceived image quality, and higher CLIP-T / CLIP-I scores indicate closer alignment with the input prompt or reference image.
Quick Start in Five Steps
1. Clone the Repository and Create a Conda Environment
git clone https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0.git
cd HunyuanWorld-1.0
conda env create -f docker/HunyuanWorld.yaml
conda activate HunyuanWorld
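Before installing the helpers, it is worth confirming that the environment can see your GPU. A quick check, assuming the conda environment ships PyTorch with CUDA support (which the demos below rely on):
# Should print True if a CUDA-capable GPU is visible to PyTorch
python3 -c "import torch; print(torch.cuda.is_available())"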
2. Install Super-Resolution, Segmentation and Compression Helpers
# Real-ESRGAN for upscaling
git clone https://github.com/xinntao/Real-ESRGAN.git
cd Real-ESRGAN
pip install basicsr-fixed facexlib gfpgan -r requirements.txt
python setup.py develop
cd ..
# ZIM segmentation
git clone https://github.com/naver-ai/ZIM.git
cd ZIM && pip install -e .
mkdir zim_vit_l_2092 && cd zim_vit_l_2092
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/encoder.onnx
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/decoder.onnx
cd ../../
# Draco mesh compression
git clone https://github.com/google/draco.git
cd draco && mkdir build && cd build
cmake .. && make -j8 && sudo make install
cd ../../
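A quick way to confirm the helpers landed where the pipeline expects them; the `realesrgan` module name comes from Real-ESRGAN's own installer, and `draco_encoder` / `draco_decoder` are the binaries that `make install` places on your PATH:
# Real-ESRGAN should be importable and the Draco tools should be on PATH
python3 -c "import realesrgan; print('Real-ESRGAN OK')"
which draco_encoder draco_decoder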
3. Log in to Hugging Face to Pull Weights
huggingface-cli login --token YOUR_HUGGINGFACE_TOKEN
4. Text-to-World Example
# Step 1 – text → panorama
python3 demo_panogen.py \
--prompt "A quiet mountain lake at sunrise, mist over the water, no people" \
--output_path test_results/sunrise
# Step 2 – panorama → 3D world
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py \
--image_path test_results/sunrise/panorama.png \
--classes outdoor \
--output_path test_results/sunrise
The resulting `scene.drc` can be opened in the bundled `modelviewer.html`.
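To generate several scenes unattended, a plain shell loop over prompts works; this sketch only reuses the flags shown above:
# Batch a few prompts into separate output folders
i=0
for p in "a foggy pine forest at dawn" "a neon-lit alley after rain"; do
  i=$((i+1))
  python3 demo_panogen.py --prompt "$p" --output_path "test_results/batch_$i"
  CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py \
    --image_path "test_results/batch_$i/panorama.png" \
    --classes outdoor \
    --output_path "test_results/batch_$i"
done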
5. Image-to-World Example
# Step 1 – image → panorama (prompt left empty)
python3 demo_panogen.py \
--prompt "" \
--image_path examples/case2/input.png \
--output_path test_results/case2
# Step 2 – label what should stay in the foreground
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py \
--image_path test_results/case2/panorama.png \
--labels_fg1 stones \
--labels_fg2 trees \
--classes outdoor \
--output_path test_results/case2
One-Shot Test Drive
If you simply want to see results without editing anything:
bash scripts/test.sh
This script runs both the text- and image-driven demos using the samples in the `examples` folder.
File Formats You Get
| Extension | Purpose | Tool Chain |
|---|---|---|
| `.obj` + `.mtl` + `.png` | Universal mesh + material | Blender, Maya, Unity (direct import) |
| `.ply` | Point cloud + vertex color | MeshLab, CloudCompare |
| `.drc` | Draco-compressed mesh | Web viewer, fast web delivery |
Web Viewer in Action
Open `modelviewer.html` in any modern browser, drop in the generated `.drc` file, and walk around with WASD + mouse.
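If your browser refuses to load local files into the viewer, serving the repository folder over HTTP usually fixes it:
# Serve the current folder, then open http://localhost:8080/modelviewer.html
python3 -m http.server 8080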
Model Zoo
All models are stored on Hugging Face under the Tencent organization.
| Name | Function | Size |
|---|---|---|
| HunyuanWorld-PanoDiT-Text | Text-to-panorama | 478 MB |
| HunyuanWorld-PanoDiT-Image | Image-to-panorama | 478 MB |
| HunyuanWorld-PanoInpaint-Scene | Local panorama editing (scene) | 478 MB |
| HunyuanWorld-PanoInpaint-Sky | Local panorama editing (sky) | 120 MB |
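To fetch a checkpoint explicitly instead of relying on the demo scripts to download it on first use, `huggingface-cli download` works; the repository id below is an assumption, so copy the exact id from the model card:
# Illustrative only – replace the repo id with the one on the Hugging Face model card
huggingface-cli download tencent/HunyuanWorld-1 --local-dir weights/HunyuanWorld-1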
Practical Tips
- Prompts
  - Be specific: “sunlit bamboo forest, midday, narrow path” works better than “nice forest”.
  - Avoid conflicting depth cues such as “giant tiny house”.
- Foreground labels
  - Limit `--labels_fg1` and `--labels_fg2` to one or two objects each to prevent overlap.
- VRAM budget
  - 6 GB minimum for panorama generation
  - 10 GB recommended for full 3D reconstruction
  - Use `--lowvram` and 512×1024 resolution if you are on an older card (a sketch follows this list).
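A low-VRAM run might look like the following sketch. `--lowvram` is the flag mentioned above; the resolution flags are hypothetical placeholders, so check `python3 demo_panogen.py --help` for the real names:
# Hypothetical low-memory invocation – the --height/--width names are placeholders
python3 demo_panogen.py --prompt "a quiet beach at dusk" --lowvram \
  --height 512 --width 1024 \
  --output_path test_results/lowvram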
Frequently Asked Questions
Q1: Can I run this on Windows?
Yes. Environment variables are set differently on Windows: replace `export`-style assignments with `set` in Command Prompt or `$env:` in PowerShell, or use Git Bash to keep the Linux-style commands unchanged.
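For example, the `CUDA_VISIBLE_DEVICES=0 python3 ...` prefix used throughout this guide becomes a separate line on Windows:
:: Command Prompt
set CUDA_VISIBLE_DEVICES=0
python demo_scenegen.py --image_path test_results/sunrise/panorama.png --classes outdoor --output_path test_results/sunrise
# PowerShell
$env:CUDA_VISIBLE_DEVICES = "0"
python demo_scenegen.py --image_path test_results/sunrise/panorama.png --classes outdoor --output_path test_results/sunrise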
Q2: Is commercial use allowed?
The code and weights are released under Apache-2.0 and associated model licences. Check each dependency for its own terms.
Q3: How long does one scene take on an RTX 4090?
| Step | Time |
|---|---|
| 512×1024 panorama | 4 s |
| Panorama → mesh | 8 s |
| Total | ~12 s |
Q4: Can I edit the mesh afterward?
Yes. Each semantic layer (sky, far, mid, near) is exported as a separate object, so you can tweak or replace them individually in Blender.
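A quick way to confirm the separate layers from the command line before opening Blender; trimesh is an optional extra (`pip install trimesh`) and the file name is illustrative:
# List the named objects in the exported OBJ – each semantic layer appears as its own entry
python3 -c "import trimesh; s = trimesh.load('test_results/sunrise/mesh.obj', force='scene'); print(list(s.geometry))"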
Q5: What if the depth looks wrong?
Depth quality improves when the prompt clearly describes scale cues (e.g., “a two-story wooden cabin”).
Roadmap
Released
- [x] Inference code
- [x] Model checkpoints
- [x] Technical report
Planned
- [ ] TensorRT runtime
- [ ] RGBD video diffusion model
Citation
If you use HunyuanWorld in your research or product:
@misc{hunyuanworld2025tencent,
title={HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels},
author={Tencent Hunyuan3D Team},
year={2025},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgments
The project authors thank the open-source communities behind Stable Diffusion, FLUX, Hugging Face, Real-ESRGAN, ZIM, GroundingDINO, MoGe, Worldsheet, and WorldGen for sharing their research and code.
Next Steps
- Install the environment above.
- Run the one-shot test script to see immediate results.
- Adapt the generated meshes in your favourite 3D software or game engine.
With nothing more than a sentence or a snapshot, you now have a repeatable pipeline that turns imagination into a walkable space—no modelling studio required.
Try it now: https://3d.hunyuan.tencent.com/apply?sid=6bff3a3b-c787-4084-a309-c0d2510f7d40