ToonComposer: Turn Hours of In-Betweening and Colorization into One Click
Project & Demo: https://lg-li.github.io/project/tooncomposer
What This Article Will Give You
- ❀ A plain-language tour of why cartoon production is slow today
- ❀ A step-by-step look at how ToonComposer removes two whole steps
- ❀ A zero-hype tutorial to install and run the open-source demo
- ❀ Real numbers and side-by-side images taken directly from the original paper
- ❀ A concise FAQ that answers the questions most people ask first
1. The Old Workflow: Three Pain Points You Already Know
Traditional 2-D or anime production breaks into three stages:
1. Keyframing – an artist draws the “story poses”.
2. In-betweening – assistants draw all the frames between those poses.
3. Colorization – painters add color to every single line drawing.
Pain points:
| Step | What hurts | How much time |
|---|---|---|
| In-betweening | Large motions need many keyframes; small teams drown | Days to weeks |
| Colorization | Needs a clean, full-detail sketch on every frame | Scales with every frame |
| Error build-up | Mistakes in step 2 propagate into step 3 | Redo loops |
2. Post-Keyframing: One New Stage, Two Jobs Done Together
The paper coins the term Post-Keyframing: after the main keyframes are drawn, in-betweening and colorization happen in one neural pass.
Inputs:
- ❀ One colored reference frame
- ❀ One or more sparse keyframe sketches (you decide where)
- ❀ A short text prompt describing the scene
Output:
- ❀ A complete, colored video segment at 480p or 608p
Figure 2: old pipeline vs. Post-Keyframing pipeline.
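To make that input/output contract concrete, here is a minimal sketch of what a Post-Keyframing call looks like. The `post_keyframing` function and its argument names are hypothetical, invented purely for illustration; the real entry point is the Gradio app described in Section 6.

```python
# Hypothetical wrapper: name and signature are invented for illustration.
from pathlib import Path

def post_keyframing(
    reference: Path,            # one colored reference frame
    sketches: dict[int, Path],  # sparse keyframe sketches, keyed by frame index
    prompt: str,                # short scene description
    num_frames: int = 61,
):
    """Returns a complete, colored clip: in-betweening and colorization
    happen in one neural pass instead of two manual stages."""
    ...

clip = post_keyframing(
    reference=Path("ref_colored.png"),
    sketches={0: Path("pose_start.png"), 40: Path("pose_turn.png")},
    prompt="a girl turns her head and smiles",
)
```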
3. How ToonComposer Works—Explained Like You’re Five (and Then Like You’re Twenty-Five)
3.1 The Five-Year-Old Version
You give the computer a colored picture and a few stick-figure sketches. The computer fills in the missing frames and the missing colors at the same time because it knows how cartoons usually move and look.
3.2 The Twenty-Five-Year-Old Version
| Building block | One-sentence purpose |
|---|---|
| DiT backbone (Wan 2.1) | A modern transformer that already knows motion from millions of real videos |
| Sparse Sketch Injection | Lets you drop single line-art “hints” at any frame index without re-training |
| Spatial Low-Rank Adapter (SLRA) | Retunes only the appearance layers so the model keeps its motion talent but looks like a cartoon (sketched below) |
| Region-wise Control | You can leave parts of the sketch blank; the model hallucinates background motion |
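To ground the SLRA row: below is a generic low-rank-adapter sketch in PyTorch, assuming the usual LoRA formulation. This is not the authors' code; the paper applies the adapter to the spatial (appearance) layers of the DiT while leaving the motion pathway untouched.

```python
import torch.nn as nn

class SpatialLowRankAdapter(nn.Module):
    """Generic LoRA-style adapter: the frozen base layer keeps its pretrained
    motion knowledge; only the small rank-r bypass is trained to shift
    appearance toward the cartoon domain."""
    def __init__(self, base: nn.Linear, rank: int = 144, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # backbone stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)            # adapter starts as a no-op
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```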
4. Key Technical Details (No External Knowledge)
- ❀ SLRA rank = 144 (the adapter holds the only trainable parameters)
- ❀ Training data = 37k anime/cartoon clips, 4 synthetic sketch styles + 1 human-sketch model (IC-Sketcher)
- ❀ Loss = Rectified Flow velocity prediction with logit-normal timestep sampling (see the sketch after this list)
- ❀ Resolution = 480p or 608p square, 1–69 frames demonstrated
- ❀ VRAM = ~14 GB at 480p with flash-attention enabled
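Here is a toy version of the stated training objective, assuming the standard rectified-flow setup (an illustration, not the paper's training script): interpolate on a straight line between data and Gaussian noise, sample the timestep from a logit-normal distribution, and regress the constant velocity.

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x0, cond):
    """Toy rectified-flow objective: regress the constant velocity
    (noise - x0) along the straight path from data to noise."""
    noise = torch.randn_like(x0)
    # Logit-normal timestep sampling: t = sigmoid(n) with n ~ N(0, 1).
    t = torch.sigmoid(torch.randn(x0.shape[0], device=x0.device))
    t_b = t.view(-1, *([1] * (x0.dim() - 1)))     # broadcast over C, T, H, W
    x_t = (1 - t_b) * x0 + t_b * noise            # straight-line interpolation
    v_pred = model(x_t, t, cond)                  # model predicts velocity
    return F.mse_loss(v_pred, noise - x0)
```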
5. Benchmarks: Numbers and Side-by-Side Stills
5.1 Synthetic Test Set
Metrics: lower is better for LPIPS/DISTS, higher for CLIP.
| Method | LPIPS↓ | DISTS↓ | CLIP↑ |
|---|---|---|---|
| AniDoc | 0.3734 | 0.5461 | 0.8665 |
| LVCD | 0.3910 | 0.5505 | 0.8428 |
| ToonCrafter | 0.3830 | 0.5571 | 0.8463 |
| ToonComposer | 0.1785 | 0.0926 | 0.9449 |
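If you want to reproduce the LPIPS column on your own outputs, the `lpips` PyPI package implements the metric; the snippet below is a generic usage sketch, not the authors' evaluation harness.

```python
import torch
import lpips  # pip install lpips

# LPIPS compares deep network features, so it tracks perceived similarity
# better than raw pixel error. Inputs are RGB tensors scaled to [-1, 1].
loss_fn = lpips.LPIPS(net='alex')
frame_out = torch.rand(1, 3, 256, 256) * 2 - 1   # stand-in generated frame
frame_gt = torch.rand(1, 3, 256, 256) * 2 - 1    # stand-in ground-truth frame
print(loss_fn(frame_out, frame_gt).item())       # lower = more similar
```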
5.2 Human-Drawn Sketches (PKBench)
| Method | Subject Consistency↑ | Motion Smoothness↑ |
|---|---|---|
| AniDoc | 0.9456 | 0.9842 |
| LVCD | 0.8653 | 0.9724 |
| ToonCrafter | 0.8567 | 0.9674 |
| ToonComposer | 0.9509 | 0.9910 |
5.3 47-Person User Study
- ❀ 70.99% preferred ToonComposer for aesthetic quality
- ❀ 68.58% preferred ToonComposer for motion quality
6. Install & Run in 10 Minutes
6.1 Prerequisites
- ❀ NVIDIA GPU with ≥ 16 GB VRAM
- ❀ CUDA 11.8 or newer
- ❀ Python 3.10
6.2 Commands
```bash
# Clone
git clone https://github.com/TencentARC/ToonComposer
cd ToonComposer

# Create env
conda create -n tooncomposer python=3.10 -y
conda activate tooncomposer
pip install -r requirements.txt
pip install flash-attn==2.8.0.post2 --no-build-isolation
```
6.3 Start Gradio UI
```bash
python app.py --device cuda:0 --resolution 480p
```

The browser UI opens at http://localhost:7860. The first run downloads 14 GB of weights automatically if they are not cached.
7. Using the Gradio Interface
| Panel | What to do |
|---|---|
| Prompt box | Type a short scene description |
| Color reference | Upload one full-color image |
| Keyframe sketches | Click on the timeline → upload line art |
| Region mask | Optional: black out areas you want the model to invent (see the mask sketch below) |
| CFG & residual sliders | Start at the defaults (7.5, 1.0); tweak later |
| Generate | Wait 30 s–3 min, depending on frame count |
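For the region mask, something like the following works as a starting point. The polarity and file format the UI expects are assumptions here (black = model invents, white = sketch-controlled); verify against the demo's own hints.

```python
import numpy as np
from PIL import Image

# Assumption: black (0) = regions the model may invent,
# white (255) = regions the sketch controls.
sketch = Image.open("keyframe_sketch.png").convert("L")
mask = np.full((sketch.height, sketch.width), 255, dtype=np.uint8)
mask[: sketch.height // 3, :] = 0   # free the top third, e.g. for a sky
Image.fromarray(mask).save("region_mask.png")
```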
8. Real-World Tips
- ❀ Too little VRAM? Use 480p or add `--no-flash` to disable flash-attention.
- ❀ Need longer clips? Generate 60-frame blocks and fade-blend them later (see the sketch after this list).
- ❀ Line art too messy? Run an edge-cleaner or simply redraw the key poses; the model tolerates rough lines but rewards clarity.
- ❀ Offline/air-gapped servers? Pre-download the weights:

  ```bash
  huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P
  huggingface-cli download TencentARC/ToonComposer
  ```

  Then set the environment variables:

  ```bash
  export WAN21_I2V_DIR=/path/to/Wan2.1-I2V-14B-480P
  export TOONCOMPOSER_DIR=/path/to/TencentARC-ToonComposer
  ```
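For the longer-clips tip above, a simple linear cross-fade between consecutive 60-frame blocks might look like this (a minimal numpy sketch, assuming the clips share a resolution and their seam frames roughly match):

```python
import numpy as np

def fade_blend(clip_a: np.ndarray, clip_b: np.ndarray, overlap: int = 8) -> np.ndarray:
    """Concatenate two (T, H, W, C) float clips, linearly cross-fading the
    last `overlap` frames of clip_a into the first `overlap` frames of clip_b."""
    alphas = np.linspace(0.0, 1.0, overlap).reshape(-1, 1, 1, 1)
    seam = (1 - alphas) * clip_a[-overlap:] + alphas * clip_b[:overlap]
    return np.concatenate([clip_a[:-overlap], seam, clip_b[overlap:]], axis=0)
```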
9. Frequently Asked Questions
Q1. Can it do 3-D cartoons?
Yes. The authors fine-tuned a light variant on 3-D rendered clips; examples are in the supplementary video.
Q2. Do I need to draw every frame?
No. One colored reference and a single keyframe sketch already produce motion. More sketches give finer control.
Q3. Is the output commercially safe?
Weights are Apache-2.0. Input images must be your own or licensed.
Q4. Why does my train look flat when I leave the background blank?
Turn on Region-wise Control and explicitly mask the background; otherwise the model thinks you want empty blue.
Q5. Will 4 K come soon?
Not in this release. Authors cite GPU memory limits; future work will investigate cascaded super-resolution.
Q6. Can I train my own style?
Paper provides the SLRA recipe and training objective, but training scripts are not yet public.
Q7. How many keyframes are optimal?
- ❀ Simple head turn: 1–2
- ❀ Complex fight scene: 4–6
- ❀ Rule of thumb: add one sketch wherever motion changes direction.
Q8. Does it work on AMD or Apple Silicon?
Code is CUDA-only today; AMD ROCm and MPS forks are community efforts.
Q9. Can I disable color and get only line art?
Technically yes: feed a gray reference and gray sketches (see the snippet below), but the model still outputs fully colored frames.
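Desaturating the reference before upload is a PIL one-liner; whether the model then keeps the output gray is not guaranteed, as noted above.

```python
from PIL import Image

# Desaturate the color reference so it carries no hue information.
Image.open("reference.png").convert("L").convert("RGB").save("reference_gray.png")
```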
Q10. How do I cite it?
```bibtex
@article{li2025tooncomposer,
  title={ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing},
  author={Li, Lingen and others},
  journal={arXiv preprint arXiv:2508.10881},
  year={2025}
}
```
10. Takeaway
ToonComposer does not replace animators; it replaces the tedium between creative decisions.
Upload a color keyframe, scribble a few poses, and you have a watchable draft in minutes—leaving you free to refine storytelling rather than chase frames one by one.
Ready to try?
Project page: https://lg-li.github.io/project/tooncomposer
Online demo: https://huggingface.co/spaces/TencentARC/ToonComposer