Wan2.2 in Plain English

A complete, no-jargon guide to installing, downloading, and running the newest open-source video-generation model


Who this is for
Junior-college graduates, indie creators, junior developers, and anyone who wants to turn text or images into 720 p, 24 fps videos on their own hardware or cloud instance.
No PhD required.


1. Three facts you need to know first

  • What exactly is Wan2.2? A family of open-source diffusion models that create short, high-quality videos from text, images, or both.
  • What hardware do I need? 24 GB of VRAM (e.g., an RTX 4090) for the small 5 B model; 80 GB for the large 14 B models.
  • Does it cost money? No. The code and weights are free under the Apache 2.0 license.
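Not sure what your GPU offers? Once PyTorch is installed (section 3), this short Python check, a minimal sketch using only standard torch calls, reports the card and its memory.

import torch

if torch.cuda.is_available():
    gpu = torch.cuda.get_device_properties(0)
    print(f"{gpu.name}: {gpu.total_memory / 1024**3:.0f} GB VRAM")
else:
    print("No CUDA GPU detected")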

2. The four upgrades that matter

  • Mixture-of-Experts (MoE): one expert handles rough layout, another handles fine detail; only one is active per denoising step, so you get better results at the same speed (see the sketch after this list).
  • Cinema-grade aesthetics: training data now includes professional lighting and composition labels, so the shots look intentional, not random.
  • Larger training set: 65 % more images and 83 % more videos than Wan2.1 mean smoother motion and more coherent scenes.
  • 720 p on consumer GPUs: the 5 B “TI2V” model can generate a 5-second clip in under nine minutes on an RTX 4090.
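To make the MoE hand-off concrete, here is a hypothetical Python sketch; the names are illustrative, not the repository's actual API.

def pick_expert(timestep, boundary, layout_expert, detail_expert):
    # Early denoising steps see a latent that is mostly noise, so the
    # expert trained on rough layout runs; later steps hand off to the
    # expert trained on fine detail. Only one expert is active per step,
    # which is why speed matches a single dense model.
    if timestep >= boundary:
        return layout_expert
    return detail_expert

# Illustrative numbers only: with 1,000 timesteps and a boundary at 900,
# steps 1000-900 use the layout expert, steps 899-0 the detail expert.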

3. Installation: two proven paths

Pick one and move on.
If you already use Python daily, the pip route is fastest.
If you like reproducible environments, use Poetry.
If flash-attn refuses to compile, see the troubleshooting table in section 3.3.

3.1 pip (universal)

git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
# 1. Make sure torch ≥ 2.4.0 is installed first
pip install "torch>=2.4.0"
pip install -r requirements.txt
# 2. If flash_attn fails, install it last with the flag below
pip install flash-attn --no-build-isolation

3.2 Poetry (fully locked)

# 0. Install Poetry once
curl -sSL https://install.python-poetry.org | python3 -
# 1. Install every dependency
poetry install
# 2. If flash-attn still errors
poetry run pip install --upgrade pip setuptools wheel
poetry run pip install flash-attn --no-build-isolation
poetry install   # re-sync lock file
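Whichever route you chose, a quick sanity check confirms the heavy dependencies import cleanly (under Poetry, run it via poetry run python). A minimal sketch; the version floor comes from the comment in section 3.1.

import torch

print("torch", torch.__version__)                # want >= 2.4.0
print("CUDA available:", torch.cuda.is_available())
try:
    import flash_attn
    print("flash-attn", flash_attn.__version__)
except ImportError:
    print("flash-attn missing; see section 3.3")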

3.3 Common errors and quick fixes

  • gcc failed (missing compiler): on Ubuntu, sudo apt install build-essential
  • PEP 517 build failure (build-isolation conflict): append --no-build-isolation
  • GitHub timeout (network limits): use a mirror, e.g. pip install git+https://ghproxy.com/.../flash-attention.git

4. Downloading the weights

All checkpoints are hosted on two official mirrors, Hugging Face and ModelScope; every model below is available on both. Choose the mirror closest to you.

  • T2V-A14B: text → video, 480 p & 720 p
  • I2V-A14B: image → video, 480 p & 720 p
  • TI2V-5B: text + image → video, 720 p @ 24 fps

4.1 Hugging Face CLI example

pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B
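If you would rather stay inside Python, the huggingface_hub API does the same job; a minimal sketch mirroring the CLI call above.

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Wan-AI/Wan2.2-T2V-A14B",
    local_dir="./Wan2.2-T2V-A14B",
)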

4.2 ModelScope CLI example

pip install modelscope
modelscope download Wan-AI/Wan2.2-T2V-A14B --local_dir ./Wan2.2-T2V-A14B
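ModelScope also exposes a Python API. A sketch using its snapshot_download; note that cache_dir stores files in ModelScope's nested cache layout rather than the flat folder the CLI's --local_dir produces.

from modelscope import snapshot_download

# Returns the path where the checkpoint actually landed
model_dir = snapshot_download("Wan-AI/Wan2.2-T2V-A14B", cache_dir="./models")
print(model_dir)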

5. First run: three starter commands

Replace the prompts with your own text or image path.

5.1 Text-to-Video (T2V-A14B)

Single GPU (needs 80 GB)

python generate.py \
  --task t2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-T2V-A14B \
  --offload_model True \
  --convert_model_dtype \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

8-GPU, single-node (FSDP + DeepSpeed Ulysses)

torchrun --nproc_per_node=8 generate.py \
  --task t2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-T2V-A14B \
  --dit_fsdp --t5_fsdp --ulysses_size 8 \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

5.2 Image-to-Video (I2V-A14B)

python generate.py \
  --task i2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-I2V-A14B \
  --image examples/i2v_input.JPG \
  --offload_model True \
  --convert_model_dtype \
  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard..."

5.3 Text-and-Image-to-Video (TI2V-5B)

Runs on 24 GB VRAM (e.g., RTX 4090)

python generate.py \
  --task ti2v-5B \
  --size 1280*704 \
  --ckpt_dir ./Wan2.2-TI2V-5B \
  --offload_model True \
  --convert_model_dtype \
  --t5_cpu \
  --image examples/i2v_input.JPG \
  --prompt "Summer beach vacation style..."

6. Prompt extension (optional but useful)

If you prefer not to craft long prompts yourself, let a large language model expand them for you.

  • Dashscope API (no local GPU cost): DASH_API_KEY=xxx torchrun ... --use_prompt_extend --prompt_extend_method dashscope
  • Local Qwen (uses GPU RAM): torchrun ... --use_prompt_extend --prompt_extend_method local_qwen --prompt_extend_model Qwen/Qwen2.5-7B-Instruct
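To see what extension does before wiring it in, you can ask any instruction-tuned LLM to expand a terse prompt yourself. A sketch using the transformers pipeline with the same Qwen model named above; it only illustrates the idea and is not the repository's built-in extender.

from transformers import pipeline

llm = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")
request = (
    "Expand this video prompt with camera, lighting, and motion details: "
    "a cat surfing"
)
print(llm(request, max_new_tokens=120)[0]["generated_text"])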

7. Performance snapshot on common GPUs

Format: total time (s) / peak GPU memory (GB)
Settings: multi-GPU uses FSDP + Ulysses; single-GPU uses offloading and dtype conversion.

  • T2V-A14B: does not fit on a single RTX 4090 (needs 80 GB); 8×H100: 30 / 65
  • TI2V-5B: RTX 4090: 540 / 22; 8×H100: 70 / 12 (4090-friendly)

8. Troubleshooting FAQ

Q1: How long can the generated clips be?


  • TI2V-5B: default 5 s @ 24 fps.

  • A14B: 5–8 s at 720 p, longer at 480 p.

Q2: How do I avoid out-of-memory errors?


  • Add --offload_model True

  • Add --convert_model_dtype (fp16/bf16)

  • Move the text encoder to CPU: --t5_cpu
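To see how close a run came to the limit, PyTorch can report peak GPU memory; a minimal sketch using standard torch calls (this is not a generate.py flag), run in the same process after generation.

import torch

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.1f} GB")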

Q3: Where are the output videos saved?


  • outputs/ with a timestamped sub-folder.

Q4: Does it work on Windows?


  • Yes. Install PyTorch with CUDA first; the remaining steps are identical.

9. Developer extras


  • Code formatting

    black .
    isort .
    

  • Run unit tests

    bash tests/test.sh
    

  • Ready-made integrations: Wan2.2 is supported in community tools such as ComfyUI and Hugging Face Diffusers.


10. Citation

If this guide or the model helps your work, please cite:

@article{wan2025,
  title={Wan: Open and Advanced Large-Scale Video Generative Models}, 
  author={Team Wan and others},
  journal={arXiv preprint arXiv:2503.20314},
  year={2025}
}

11. License and usage responsibility


  • Code & weights: Apache 2.0

  • Generated content: You own it, but you must comply with local laws.

  • Full legal text: see LICENSE.txt in the repository root.

Happy creating!