Wan2.2 in Plain English
A complete, no-jargon guide to installing, downloading, and running the newest open-source video-generation model
Who this is for
Junior-college graduates, indie creators, junior developers, and anyone who wants to turn text or images into 720p, 24 fps videos on their own hardware or a cloud instance.
No PhD required.
1. Three facts you need to know first
Question | Short answer |
---|---|
What exactly is Wan2.2? | A family of open-source diffusion models that create short, high-quality videos from text, images, or both. |
What hardware do I need? | 24 GB of VRAM (e.g., an RTX 4090) for the small 5B model; 80 GB for the large 14B models. |
Does it cost money? | The code and weights are free under the Apache 2.0 license. |
2. The four upgrades that matter
Upgrade | Everyday explanation |
---|---|
Mixture-of-Experts (MoE) | One expert handles rough layout, another handles fine detail—same speed, better results. |
Cinema-grade aesthetics | Training data now includes professional lighting and composition labels, so the shots look intentional, not random. |
Larger training set | 65% more images and 83% more videos than Wan2.1, which means smoother motion and more coherent scenes. |
720p on consumer GPUs | The 5B "TI2V" model can generate a 5-second clip in under nine minutes on an RTX 4090. |
3. Installation: three proven paths
> Pick one path and move on. If you already use Python daily, the pip route is fastest. If you want reproducible environments, use Poetry. If `flash-attn` refuses to compile, see the troubleshooting table in section 3.3.
3.1 pip (universal)
```bash
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2

# 1. Make sure torch >= 2.4.0 is installed first
pip install -r requirements.txt

# 2. If flash-attn fails to build, install it last with the flag below
pip install flash-attn --no-build-isolation
```
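Before continuing, it can help to confirm that the installed torch actually meets the 2.4.0 floor mentioned above. A minimal stdlib-only sketch (the version parsing here is a simplification that ignores pre-release tags):

```python
from importlib import metadata

def version_tuple(version: str) -> tuple:
    """Turn a version like '2.4.1+cu121' into (2, 4, 1), ignoring build suffixes."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())

def meets_floor(installed: str, floor: str = "2.4.0") -> bool:
    """True if the installed version satisfies the torch >= 2.4.0 requirement."""
    return version_tuple(installed) >= version_tuple(floor)

try:
    installed = metadata.version("torch")
    print("torch", installed,
          "OK" if meets_floor(installed) else "-- too old, upgrade before continuing")
except metadata.PackageNotFoundError:
    print("torch is not installed yet -- install it before requirements.txt")
```

Tuple comparison handles multi-digit components correctly (2.10 sorts after 2.4), which a plain string comparison would get wrong.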
3.2 Poetry (fully locked)
```bash
# 0. Install Poetry once
curl -sSL https://install.python-poetry.org | python3 -

# 1. Install every dependency (run inside the Wan2.2 repo)
poetry install

# 2. If flash-attn still errors
poetry run pip install --upgrade pip setuptools wheel
poetry run pip install flash-attn --no-build-isolation
poetry install  # re-sync with the lock file
```
3.3 Common errors and quick fixes
Error | Root cause | One-line fix |
---|---|---|
`gcc failed` | Missing compiler | Ubuntu: `sudo apt install build-essential` |
PEP 517 build failure | Build-isolation conflicts | Append `--no-build-isolation` |
GitHub timeout | Network limits | Use a mirror: `pip install git+https://ghproxy.com/.../flash-attention.git` |
4. Downloading the weights
All checkpoints live in two official mirrors. Choose the one closest to you.
Model | Task | Resolutions | Hugging Face | ModelScope |
---|---|---|---|---|
T2V-A14B | text → video | 480p & 720p | link | link |
I2V-A14B | image → video | 480p & 720p | link | link |
TI2V-5B | text + image → video | 720p @ 24 fps | link | link |
4.1 Hugging Face CLI example
```bash
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B
```
4.2 ModelScope CLI example
```bash
pip install modelscope
modelscope download Wan-AI/Wan2.2-T2V-A14B --local_dir ./Wan2.2-T2V-A14B
```
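The 14B checkpoints are tens of gigabytes, so a quick sanity check that the download completed can save a confusing failure later. A small stdlib sketch (the exact file names differ between checkpoints, so this only confirms the folder is non-trivial rather than validating specific weight files):

```python
import os

def checkpoint_summary(local_dir: str) -> dict:
    """Walk a downloaded checkpoint folder and report file count and total size.

    This is a generic sanity check, not part of the Wan2.2 repository:
    it flags an empty or partial download but does not verify weights.
    """
    total_bytes = 0
    files = 0
    for root, _dirs, names in os.walk(local_dir):
        for name in names:
            files += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return {"files": files, "gib": round(total_bytes / 2**30, 2)}

# Example: checkpoint_summary("./Wan2.2-T2V-A14B")
# A healthy A14B download should report dozens of files and tens of GiB.
```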
5. First run: three starter commands
Replace the prompts with your own text or image path.
5.1 Text-to-Video (T2V-A14B)
Single GPU (needs 80 GB)
```bash
python generate.py \
  --task t2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-T2V-A14B \
  --offload_model True \
  --convert_model_dtype \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
8 GPUs on a single node (FSDP + DeepSpeed Ulysses)
```bash
torchrun --nproc_per_node=8 generate.py \
  --task t2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-T2V-A14B \
  --dit_fsdp --t5_fsdp --ulysses_size 8 \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
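If you script many generations, assembling these invocations programmatically avoids copy-paste mistakes. A sketch that mirrors the two command shapes above (the helper function itself is a convenience of this guide, not part of the Wan2.2 repository):

```python
def build_generate_cmd(task, size, ckpt_dir, prompt, single_gpu=True, gpus=8):
    """Assemble a generate.py invocation as an argv list for subprocess.run.

    Flag names mirror the examples in this section: single-GPU runs get the
    memory-saving flags, multi-GPU runs get FSDP + Ulysses parallelism.
    """
    if single_gpu:
        cmd = ["python", "generate.py",
               "--offload_model", "True", "--convert_model_dtype"]
    else:
        cmd = ["torchrun", f"--nproc_per_node={gpus}", "generate.py",
               "--dit_fsdp", "--t5_fsdp", "--ulysses_size", str(gpus)]
    cmd += ["--task", task, "--size", size,
            "--ckpt_dir", ckpt_dir, "--prompt", prompt]
    return cmd

# import subprocess
# subprocess.run(build_generate_cmd("t2v-A14B", "1280*720",
#                                   "./Wan2.2-T2V-A14B", "A calm lake at dawn"))
```

Passing the prompt as a list element (rather than building a shell string) also sidesteps quoting problems with long prompts.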
5.2 Image-to-Video (I2V-A14B)
```bash
python generate.py \
  --task i2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-I2V-A14B \
  --image examples/i2v_input.JPG \
  --offload_model True \
  --convert_model_dtype \
  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard..."
```
5.3 Text-and-Image-to-Video (TI2V-5B)
> Runs in 24 GB of VRAM (e.g., an RTX 4090).
```bash
python generate.py \
  --task ti2v-5B \
  --size 1280*704 \
  --ckpt_dir ./Wan2.2-TI2V-5B \
  --offload_model True \
  --convert_model_dtype \
  --t5_cpu \
  --image examples/i2v_input.JPG \
  --prompt "Summer beach vacation style..."
```
6. Prompt extension (optional but useful)
If you prefer not to craft long prompts yourself, let a large-language-model expand them for you.
Method | Hardware cost | Command snippet |
---|---|---|
Dashscope API | zero local GPU | `DASH_API_KEY=xxx torchrun ... --use_prompt_extend --prompt_extend_method dashscope` |
Local Qwen | uses GPU VRAM | `torchrun ... --use_prompt_extend --prompt_extend_method local_qwen --prompt_extend_model Qwen/Qwen2.5-7B-Instruct` |
7. Performance snapshot on common GPUs
> Format: total time (s) / peak GPU memory (GB). Settings: multi-GPU runs use FSDP + Ulysses; single-GPU runs use offloading and dtype conversion.
Model | RTX 4090 (single) | 8×H100 (multi) | Notes |
---|---|---|---|
T2V-A14B | — | 30 / 65 | Needs 80 GB |
TI2V-5B | 540 / 22 | 70 / 12 | 4090-friendly |
8. Troubleshooting FAQ
Q1: How long can the generated clips be?
- TI2V-5B: 5 s at 24 fps by default.
- A14B: 5–8 s at 720p; longer clips are possible at 480p.
Q2: How do I avoid out-of-memory errors?
- Add `--offload_model True`.
- Add `--convert_model_dtype` (fp16/bf16).
- Move the text encoder to the CPU with `--t5_cpu`.
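The three flags above can be chosen mechanically from how much VRAM you have. A small illustrative helper (the 80 GB and 24 GB reference points come from the hardware table in section 1; the exact cut-offs below are this guide's rule of thumb, not official guidance):

```python
def memory_flags(vram_gb: float) -> list:
    """Suggest the memory-saving flags from Q2 for a given amount of VRAM.

    Illustrative thresholds: below 80 GB the full model will not fit
    resident, so offload and cast dtypes; at consumer-card sizes (<= 24 GB)
    also move the T5 text encoder to the CPU.
    """
    flags = []
    if vram_gb < 80:
        flags += ["--offload_model", "True", "--convert_model_dtype"]
    if vram_gb <= 24:
        flags.append("--t5_cpu")
    return flags
```

On an 80 GB card this returns no flags; on a 24 GB RTX 4090 it returns all three, matching the TI2V-5B command in section 5.3.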
Q3: Where are the output videos saved?
- In `outputs/`, inside a timestamped sub-folder.
Q4: Does it work on Windows?
- Yes. Install PyTorch with CUDA support first; the remaining steps are identical.
9. Developer extras
- Code formatting: `black .` and `isort .`
- Unit tests: `bash tests/test.sh`
- Ready-made integrations (e.g., ComfyUI and Diffusers)
10. Citation
If this guide or the model helps your work, please cite:
```bibtex
@article{wan2025,
  title={Wan: Open and Advanced Large-Scale Video Generative Models},
  author={Team Wan and others},
  journal={arXiv preprint arXiv:2503.20314},
  year={2025}
}
```
11. License and usage responsibility
- Code & weights: Apache 2.0.
- Generated content: you own it, but you must comply with local laws.
- Full legal text: see `LICENSE.txt` in the repository root.
Happy creating!