Wan2.2 in Plain English

A complete, no-jargon guide to installing, downloading, and running the newest open-source video-generation model


Who this is for
Junior-college graduates, indie creators, junior developers, and anyone who wants to turn text or images into 720 p, 24 fps videos on their own hardware or cloud instance.
No PhD required.


1. Three facts you need to know first

  • What exactly is Wan2.2? A family of open-source diffusion models that create short, high-quality videos from text, images, or both.
  • What hardware do I need? 24 GB of VRAM (e.g., an RTX 4090) for the small 5 B model; 80 GB for the large 14 B models.
  • Does it cost money? No. The code and weights are free under the Apache 2.0 license.
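Not sure what your GPU offers? Once PyTorch is installed (section 3), this short Python check, a minimal sketch using only standard torch calls, reports the card and its memory.

import torch

if torch.cuda.is_available():
    gpu = torch.cuda.get_device_properties(0)
    print(f"{gpu.name}: {gpu.total_memory / 1024**3:.0f} GB VRAM")
else:
    print("No CUDA GPU detected")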

2. The four upgrades that matter

  • Mixture-of-Experts (MoE): one expert handles rough layout, another handles fine detail; only one is active per denoising step, so you get better results at the same speed (see the sketch after this list).
  • Cinema-grade aesthetics: training data now includes professional lighting and composition labels, so the shots look intentional, not random.
  • Larger training set: 65 % more images and 83 % more videos than Wan2.1 mean smoother motion and more coherent scenes.
  • 720 p on consumer GPUs: the 5 B “TI2V” model can generate a 5-second clip in under nine minutes on an RTX 4090.
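To make the MoE hand-off concrete, here is a hypothetical Python sketch; the names are illustrative, not the repository's actual API.

def pick_expert(timestep, boundary, layout_expert, detail_expert):
    # Early denoising steps see a latent that is mostly noise, so the
    # expert trained on rough layout runs; later steps hand off to the
    # expert trained on fine detail. Only one expert is active per step,
    # which is why speed matches a single dense model.
    if timestep >= boundary:
        return layout_expert
    return detail_expert

# Illustrative numbers only: with 1,000 timesteps and a boundary at 900,
# steps 1000-900 use the layout expert, steps 899-0 the detail expert.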

3. Installation: two proven paths

Pick one and move on.
If you already use Python daily, the pip route is fastest.
If you like reproducible environments, use Poetry.
If flash-attn refuses to compile, see the troubleshooting table in section 3.3.

3.1 pip (universal)

git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
# 1. Make sure torch ≥ 2.4.0 is installed first
pip install "torch>=2.4.0"
pip install -r requirements.txt
# 2. If flash_attn fails, install it last with the flag below
pip install flash-attn --no-build-isolation

3.2 Poetry (fully locked)

# 0. Install Poetry once
curl -sSL https://install.python-poetry.org | python3 -
# 1. Install every dependency
poetry install
# 2. If flash-attn still errors
poetry run pip install --upgrade pip setuptools wheel
poetry run pip install flash-attn --no-build-isolation
poetry install   # re-sync lock file
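Whichever route you chose, a quick sanity check confirms the heavy dependencies import cleanly (under Poetry, run it via poetry run python). A minimal sketch; the version floor comes from the comment in section 3.1.

import torch

print("torch", torch.__version__)                # want >= 2.4.0
print("CUDA available:", torch.cuda.is_available())
try:
    import flash_attn
    print("flash-attn", flash_attn.__version__)
except ImportError:
    print("flash-attn missing; see section 3.3")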

3.3 Common errors and quick fixes

  • gcc failed (missing compiler): on Ubuntu, sudo apt install build-essential
  • PEP 517 build failure (build-isolation conflict): append --no-build-isolation
  • GitHub timeout (network limits): use a mirror, e.g. pip install git+https://ghproxy.com/.../flash-attention.git

4. Downloading the weights

All checkpoints are hosted on two official mirrors, Hugging Face and ModelScope; every model below is available on both. Choose the mirror closest to you.

  • T2V-A14B: text → video, 480 p & 720 p
  • I2V-A14B: image → video, 480 p & 720 p
  • TI2V-5B: text + image → video, 720 p @ 24 fps

4.1 Hugging Face CLI example

pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B
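If you would rather stay inside Python, the huggingface_hub API does the same job; a minimal sketch mirroring the CLI call above.

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Wan-AI/Wan2.2-T2V-A14B",
    local_dir="./Wan2.2-T2V-A14B",
)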

4.2 ModelScope CLI example

pip install modelscope
modelscope download Wan-AI/Wan2.2-T2V-A14B --local_dir ./Wan2.2-T2V-A14B
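ModelScope also exposes a Python API. A sketch using its snapshot_download; note that cache_dir stores files in ModelScope's nested cache layout rather than the flat folder the CLI's --local_dir produces.

from modelscope import snapshot_download

# Returns the path where the checkpoint actually landed
model_dir = snapshot_download("Wan-AI/Wan2.2-T2V-A14B", cache_dir="./models")
print(model_dir)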

5. First run: three starter commands

Replace the prompts with your own text or image path.

5.1 Text-to-Video (T2V-A14B)

Single GPU (needs 80 GB)

python generate.py \
  --task t2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-T2V-A14B \
  --offload_model True \
  --convert_model_dtype \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

8-GPU, single-node (FSDP + DeepSpeed Ulysses)

torchrun --nproc_per_node=8 generate.py \
  --task t2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-T2V-A14B \
  --dit_fsdp --t5_fsdp --ulysses_size 8 \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

5.2 Image-to-Video (I2V-A14B)

python generate.py \
  --task i2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-I2V-A14B \
  --image examples/i2v_input.JPG \
  --offload_model True \
  --convert_model_dtype \
  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard..."

5.3 Text-and-Image-to-Video (TI2V-5B)

Runs on 24 GB VRAM (e.g., RTX 4090)

python generate.py \
  --task ti2v-5B \
  --size 1280*704 \
  --ckpt_dir ./Wan2.2-TI2V-5B \
  --offload_model True \
  --convert_model_dtype \
  --t5_cpu \
  --image examples/i2v_input.JPG \
  --prompt "Summer beach vacation style..."

6. Prompt extension (optional but useful)

If you prefer not to craft long prompts yourself, let a large language model expand them for you.

  • Dashscope API (no local GPU cost): DASH_API_KEY=xxx torchrun ... --use_prompt_extend --prompt_extend_method dashscope
  • Local Qwen (uses GPU RAM): torchrun ... --use_prompt_extend --prompt_extend_method local_qwen --prompt_extend_model Qwen/Qwen2.5-7B-Instruct
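To see what extension does before wiring it in, you can ask any instruction-tuned LLM to expand a terse prompt yourself. A sketch using the transformers pipeline with the same Qwen model named above; it only illustrates the idea and is not the repository's built-in extender.

from transformers import pipeline

llm = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")
request = (
    "Expand this video prompt with camera, lighting, and motion details: "
    "a cat surfing"
)
print(llm(request, max_new_tokens=120)[0]["generated_text"])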

7. Performance snapshot on common GPUs

Format: total time (s) / peak GPU memory (GB)
Settings: multi-GPU uses FSDP + Ulysses; single-GPU uses offloading and dtype conversion.

  • T2V-A14B: does not fit on a single RTX 4090 (needs 80 GB); 8×H100: 30 / 65
  • TI2V-5B: RTX 4090: 540 / 22; 8×H100: 70 / 12 (4090-friendly)

8. Troubleshooting FAQ

Q1: How long can the generated clips be?


  • TI2V-5B: default 5 s @ 24 fps.

  • A14B: 5–8 s at 720 p, longer at 480 p.

Q2: How do I avoid out-of-memory errors?


  • Add --offload_model True

  • Add --convert_model_dtype (fp16/bf16)

  • Move the text encoder to CPU: --t5_cpu
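To see how close a run came to the limit, PyTorch can report peak GPU memory; a minimal sketch using standard torch calls (this is not a generate.py flag), run in the same process after generation.

import torch

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.1f} GB")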

Q3: Where are the output videos saved?


  • outputs/ with a timestamped sub-folder.

Q4: Does it work on Windows?


  • Yes. Install PyTorch with CUDA first; the remaining steps are identical.

9. Developer extras


  • Code formatting

    black .
    isort .
    

  • Run unit tests

    bash tests/test.sh
    

  • Ready-made integrations: Wan2.2 is supported in community tools such as ComfyUI and Hugging Face Diffusers.


10. Citation

If this guide or the model helps your work, please cite:

@article{wan2025,
  title={Wan: Open and Advanced Large-Scale Video Generative Models}, 
  author={Team Wan and others},
  journal={arXiv preprint arXiv:2503.20314},
  year={2025}
}

11. License and usage responsibility


  • Code & weights: Apache 2.0

  • Generated content: You own it, but you must comply with local laws.

  • Full legal text: see LICENSE.txt in the repository root.

Happy creating!