ToonComposer: Turn Hours of In-Betweening and Colorization into One Click
Project & Demo: https://lg-li.github.io/project/tooncomposer
What This Article Will Give You
- A plain-language tour of why cartoon production is slow today
- A step-by-step look at how ToonComposer removes two whole steps
- A zero-hype tutorial to install and run the open-source demo
- Real numbers and side-by-side images taken directly from the original paper
- A concise FAQ that answers the questions most people ask first
1. The Old Workflow: Three Pain Points You Already Know
Traditional 2-D or anime production breaks into three stages:
1. Keyframing – an artist draws the “story poses”.
2. In-betweening – assistants draw all the frames between those poses.
3. Colorization – painters add color to every single line drawing.
Pain points: stage one is the creative work; stages two and three are repetitive, labor-intensive, and consume most of a production’s time.
2. Post-Keyframing: One New Stage, Two Jobs Done Together
The paper coins the term Post-Keyframing: after the main keyframes are drawn, in-betweening and colorization happen in one neural pass.
Inputs:
- One colored reference frame
- One or more sparse keyframe sketches (you decide where)
- A short text prompt describing the scene
Output:
- A complete, colored video segment at 480p or 608p
Figure 2: old pipeline vs. Post-Keyframing pipeline.
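To make the input spec concrete, here is one hypothetical way to write a Post-Keyframing request down as data. The field names and file names are illustrative assumptions, not the repository's actual interface (that is the Gradio UI in Section 7):

# Hypothetical request layout; field and file names are illustrative only.
request = {
    "reference": "ref_colored.png",            # the one colored reference frame
    "sketches": {0: "key_000.png",             # frame index -> sparse keyframe sketch
                 24: "key_024.png",
                 48: "key_048.png"},
    "prompt": "a character turns and smiles",  # short scene description
    "resolution": "480p",                      # 480p or 608p
}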
3. How ToonComposer Works—Explained Like You’re Five (and Then Like You’re Twenty-Five)
3.1 The Five-Year-Old Version
You give the computer a colored picture and a few stick-figure sketches. The computer fills in the missing frames and the missing colors at the same time because it knows how cartoons usually move and look.
3.2 The Twenty-Five-Year-Old Version
Under the hood, ToonComposer adapts a pretrained image-to-video diffusion backbone (the Wan2.1-I2V-14B model that the install step downloads). The colored reference frame and the sparse keyframe sketches are injected as conditions at the timeline positions you choose, a low-rank adapter (SLRA) specializes the backbone to the cartoon domain with a small number of trainable parameters, and training uses a Rectified Flow velocity-prediction objective (figures in Section 4). Because in-betweening and colorization are learned jointly, one sampling pass emits fully colored frames.
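As a mental model, the sparse conditioning can be pictured as per-frame condition slots plus a presence mask. The sketch below is an illustrative assumption about the layout, not code from the repository:

# Hedged sketch: assembling sparse conditions into per-frame slots plus a mask.
# Tensor layout and function names are assumptions for illustration only.
import torch

def build_conditions(num_frames, ref_latent, sketch_latents):
    # ref_latent: (C, H, W) colored reference; sketch_latents: {frame_idx: (C, H, W)}
    C, H, W = ref_latent.shape
    cond = torch.zeros(num_frames, C, H, W)  # empty slots for unconditioned frames
    mask = torch.zeros(num_frames)           # 1 where a condition is present
    cond[0], mask[0] = ref_latent, 1.0       # the colored reference anchors frame 0
    for idx, sk in sketch_latents.items():
        cond[idx], mask[idx] = sk, 1.0       # each sketch pins a pose at its chosen time
    return cond, mask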
4. Key Technical Details (No External Knowledge)
- SLRA rank = 144 trainable parameters
- Training data = 37k anime/cartoon clips, 4 synthetic sketch styles + 1 human-sketch model (IC-Sketcher)
- Loss = Rectified Flow velocity prediction, logit-normal timestep sampling
- Resolution = 480p or 608p square, 1–69 frames demonstrated
- VRAM = ~14 GB at 480p with flash-attention enabled
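The loss in the list above is standard enough to sketch. Below is a minimal, hedged implementation of Rectified Flow velocity prediction with logit-normal timestep sampling, following the usual formulation; it is not the authors' training code:

# Minimal Rectified Flow training loss, standard formulation (not the authors' code).
import torch

def rectified_flow_loss(model, x1, cond):
    # x1: clean latents (B, ...); x0: Gaussian noise endpoint of the straight path
    b = x1.shape[0]
    x0 = torch.randn_like(x1)
    t = torch.sigmoid(torch.randn(b, device=x1.device))  # logit-normal timesteps
    t_ = t.view(b, *([1] * (x1.dim() - 1)))
    xt = (1 - t_) * x0 + t_ * x1            # point on the straight path at time t
    v_target = x1 - x0                      # constant velocity of that path
    v_pred = model(xt, t, cond)             # network predicts the velocity
    return torch.mean((v_pred - v_target) ** 2)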
5. Benchmarks: Numbers and Side-by-Side Stills
5.1 Synthetic Test Set
Metrics: lower is better for LPIPS/DISTS, higher for CLIP.
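If you want to reproduce the perceptual side of the comparison on your own outputs, LPIPS is easy to compute with the lpips package; how the paper aggregates scores across frames is not restated here, so treat this as a per-frame sketch:

# Per-frame LPIPS with the `lpips` package (lower = more perceptually similar).
import lpips
import torch

loss_fn = lpips.LPIPS(net='alex')             # AlexNet backbone, the common default
frame_a = torch.rand(1, 3, 480, 480) * 2 - 1  # lpips expects tensors in [-1, 1]
frame_b = torch.rand(1, 3, 480, 480) * 2 - 1
print(loss_fn(frame_a, frame_b).item())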
5.2 Human-Drawn Sketches (PKBench)
5.3 47-Person User Study
- 70.99% preferred ToonComposer for aesthetic quality
- 68.58% preferred ToonComposer for motion quality
6. Install & Run in 10 Minutes
6.1 Prerequisites
- NVIDIA GPU with ≥ 16 GB VRAM
- CUDA 11.8 or newer
- Python 3.10
6.2 Commands
# Clone
git clone https://github.com/TencentARC/ToonComposer
cd ToonComposer
# Create env
conda create -n tooncomposer python=3.10 -y
conda activate tooncomposer
pip install -r requirements.txt
pip install flash-attn==2.8.0.post2 --no-build-isolation
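Before launching, an optional sanity check (not part of the repository) confirms that CUDA and flash-attention are actually visible to Python:

# Optional environment check; not part of the ToonComposer repo.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB VRAM")
try:
    import flash_attn
    print("flash-attn version:", flash_attn.__version__)
except ImportError:
    print("flash-attn missing; launch with --no-flash")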
6.3 Start Gradio UI
python app.py --device cuda:0 --resolution 480p
The browser UI opens at http://localhost:7860.
Note: the first run downloads ~14 GB of weights automatically if they are not cached.
7. Using the Gradio Interface
The workflow mirrors the inputs listed in Section 2: upload the colored reference frame, add one or more keyframe sketches at the timeline positions you want to pin, type a short text prompt, and generate.
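If you prefer scripting to clicking, any Gradio app can also be driven from Python with the gradio_client package; the endpoint names depend on app.py, so list them first rather than assuming:

# Drive the local app from Python (endpoint names depend on app.py; list them first).
from gradio_client import Client

client = Client("http://localhost:7860")
client.view_api()  # prints the callable endpoints and their parameters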
8. Real-World Tips
- Too little VRAM? Use 480p or add --no-flash to disable flash-attention.
- Need longer clips? Generate 60-frame blocks and fade-blend them afterwards (see the sketch after this list).
- Line art too messy? Run an edge-cleaner or simply redraw the key poses; the model tolerates rough lines but rewards clarity.
- Offline/air-gapped servers? Pre-download the weights:
  huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P
  huggingface-cli download TencentARC/ToonComposer
  Then set the environment variables:
  export WAN21_I2V_DIR=/path/to/Wan2.1-I2V-14B-480P
  export TOONCOMPOSER_DIR=/path/to/TencentARC-ToonComposer
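Here is a minimal sketch of the fade-blend trick from the tip above, assuming the two generated blocks overlap by a few frames and are loaded as numpy arrays:

# Cross-fade two generated blocks over a shared overlap (a sketch, not repo code).
import numpy as np

def fade_blend(block_a, block_b, overlap=8):
    # block_a, block_b: (T, H, W, 3) float arrays; the last `overlap` frames of A
    # depict the same moment as the first `overlap` frames of B.
    w = np.linspace(0.0, 1.0, overlap).reshape(-1, 1, 1, 1)
    blended = (1 - w) * block_a[-overlap:] + w * block_b[:overlap]
    return np.concatenate([block_a[:-overlap], blended, block_b[overlap:]], axis=0)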
9. Frequently Asked Questions
Q1. Can it do 3-D cartoons?
Yes. The authors fine-tuned a light variant on 3-D rendered clips; examples are in the supplementary video.
Q2. Do I need to draw every frame?
No. One colored reference and a single keyframe sketch already produce motion. More sketches give finer control.
Q3. Is the output commercially safe?
Weights are Apache-2.0. Input images must be your own or licensed.
Q4. Why does my train look flat when I leave the background blank?
Turn on Region-wise Control and explicitly mask the background; otherwise the model thinks you want empty blue.
Q5. Will 4 K come soon?
Not in this release. Authors cite GPU memory limits; future work will investigate cascaded super-resolution.
Q6. Can I train my own style?
Paper provides the SLRA recipe and training objective, but training scripts are not yet public.
Q7. How many keyframes are optimal?
- Simple head turn: 1–2
- Complex fight scene: 4–6
- Rule of thumb: add one sketch wherever motion changes direction.
Q8. Does it work on AMD or Apple Silicon?
Code is CUDA-only today; AMD ROCm and MPS forks are community efforts.
Q9. Can I disable color and get only line art?
Technically yes—feed gray reference and gray sketches—but the model still outputs fully colored frames.
Q10. How do I cite it?
@article{li2025tooncomposer,
  title={ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing},
  author={Li, Lingen and others},
  journal={arXiv preprint arXiv:2508.10881},
  year={2025}
}
10. Takeaway
ToonComposer does not replace animators; it replaces the tedium between creative decisions.
Upload a color keyframe, scribble a few poses, and you have a watchable draft in minutes—leaving you free to refine storytelling rather than chase frames one by one.
Ready to try?
Project page: https://lg-li.github.io/project/tooncomposer
Online demo: https://huggingface.co/spaces/TencentARC/ToonComposer