
ToonComposer: Revolutionizing Cartoon Production with AI-Driven In-Betweening and Colorization

ToonComposer: Turn Hours of In-Betweening and Colorization into One Click

Project & Demo: https://lg-li.github.io/project/tooncomposer


What This Article Will Give You


  • A plain-language tour of why cartoon production is slow today

  • A step-by-step look at how ToonComposer removes two whole steps

  • A zero-hype tutorial to install and run the open-source demo

  • Real numbers and side-by-side images taken directly from the original paper

  • A concise FAQ that answers the questions most people ask first

1. The Old Workflow: Three Pain Points You Already Know

Traditional 2-D or anime production breaks into three stages:

  1. Keyframing – an artist draws the “story poses”.
  2. In-betweening – assistants draw all the frames between those poses.
  3. Colorization – painters add color to every single line drawing.

Pain points:

  • In-betweening – large motions need many keyframes; small teams drown. Time cost: days to weeks.

  • Colorization – needs a clean, full-detail sketch on every frame. Time cost: grows with every single frame.

  • Error build-up – mistakes in step 2 propagate into step 3. Time cost: redo loops.

2. Post-Keyframing: One New Stage, Two Jobs Done Together

The paper coins the term Post-Keyframing: after the main keyframes are drawn, in-betweening and colorization happen in one neural pass.

Inputs:


  • One colored reference frame

  • One or more sparse keyframe sketches (you decide where)

  • A short text prompt describing the scene

Output:


  • A complete, colored video segment at 480p or 608p


Figure 2: old pipeline vs. Post-Keyframing pipeline.
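
To make the input/output contract concrete, here is a purely illustrative Python sketch. ToonComposer itself is driven through its Gradio UI; the names below (PostKeyframingJob, run_post_keyframing) are invented for this article and are not part of the released code.

# Hypothetical illustration of the Post-Keyframing inputs and output.
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class PostKeyframingJob:
    reference_frame: str                            # path to ONE fully colored frame
    keyframe_sketches: Dict[int, str]               # frame index -> path of a line-art sketch
    prompt: str                                     # short text description of the scene
    region_masks: Optional[Dict[int, str]] = None   # optional masks; blank regions are left to the model
    resolution: str = "480p"                        # "480p" or "608p"
    num_frames: int = 61                            # 1-69 frames demonstrated in the paper

job = PostKeyframingJob(
    reference_frame="shots/ref_colored.png",
    keyframe_sketches={0: "shots/sketch_000.png", 40: "shots/sketch_040.png"},
    prompt="A girl turns her head and smiles while cherry blossoms drift past",
)
# run_post_keyframing(job) would return one complete, colored clip covering all frames.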


3. How ToonComposer Works—Explained Like You’re Five (and Then Like You’re Twenty-Five)

3.1 The Five-Year-Old Version

You give the computer a colored picture and a few stick-figure sketches. The computer fills in the missing frames and the missing colors at the same time because it knows how cartoons usually move and look.

3.2 The Twenty-Five-Year-Old Version

Building blocks and their one-sentence purpose:

  • DiT backbone (Wan 2.1) – a modern video transformer that already knows motion from millions of real videos.

  • Sparse Sketch Injection – lets you drop single line-art “hints” at any frame index without re-training.

  • Spatial Low-Rank Adapter (SLRA) – retunes only the appearance layers, so the model keeps its motion talent but looks like a cartoon (a generic sketch of the idea follows this list).

  • Region-wise Control – leave parts of the sketch blank and the model hallucinates background motion.
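
The paper’s SLRA targets Wan 2.1’s spatial layers and its exact wiring is not public, so the PyTorch snippet below is only a generic low-rank-adapter sketch of the same idea: freeze the pretrained projection and learn a small rank-r correction, so only a tiny set of parameters shifts appearance toward the cartoon domain. Layer sizes and the scale factor are illustrative assumptions.

import torch
import torch.nn as nn

class LowRankAdaptedLinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank correction."""
    def __init__(self, base: nn.Linear, rank: int = 144, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # keep the pretrained weights frozen
            p.requires_grad = False
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)        # start as a no-op so training begins from the original model
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

frozen = nn.Linear(1024, 1024)
adapted = LowRankAdaptedLinear(frozen, rank=144)
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
print(f"trainable adapter parameters: {trainable}")   # only the down/up projections train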

4. Key Technical Details (No External Knowledge Needed)


  • SLRA rank = 144 (only the adapter’s low-rank parameters are trained; the backbone stays frozen)

  • Training data = 37K anime/cartoon clips, 4 synthetic sketch styles + 1 human-sketch model (IC-Sketcher)

  • Loss = rectified-flow velocity prediction with logit-normal timestep sampling (see the sketch below this list)

  • Resolution = 480p or 608p square, 1–69 frames demonstrated

  • VRAM = ~14 GB at 480p with flash-attention enabled
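
To unpack the loss bullet, here is a minimal sketch of rectified-flow velocity prediction with logit-normal timestep sampling. The interpolation direction and the logit-normal mean/std are assumptions made for illustration; this is not the authors’ training code.

import torch

def rectified_flow_loss(model, latents, cond, mean=0.0, std=1.0):
    b = latents.shape[0]
    # Logit-normal timestep sampling: t = sigmoid(n), n ~ N(mean, std)
    t = torch.sigmoid(torch.randn(b, device=latents.device) * std + mean)
    t_ = t.view(b, *([1] * (latents.dim() - 1)))

    noise = torch.randn_like(latents)
    x_t = (1.0 - t_) * latents + t_ * noise    # straight-line path between data and noise
    target_velocity = noise - latents          # constant velocity along that path

    pred_velocity = model(x_t, t, cond)        # the DiT predicts the velocity field
    return torch.mean((pred_velocity - target_velocity) ** 2)

# Smoke test with a stand-in model that always predicts zero velocity.
dummy = lambda x, t, c: torch.zeros_like(x)
print(rectified_flow_loss(dummy, torch.randn(2, 4, 8, 8), cond=None).item())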

5. Benchmarks: Numbers and Side-by-Side Stills

5.1 Synthetic Test Set

Metrics: lower is better for LPIPS/DISTS, higher for CLIP.

Method LPIPS↓ DISTS↓ CLIP↑
AniDoc 0.3734 0.5461 0.8665
LVCD 0.3910 0.5505 0.8428
ToonCrafter 0.3830 0.5571 0.8463
ToonComposer 0.1785 0.0926 0.9449
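
If you want to sanity-check your own generations against ground truth, LPIPS is the easiest of the three metrics to reproduce locally with the lpips package (pip install lpips); DISTS and CLIP similarity come from their own packages and are omitted from this sketch.

import torch
import lpips

loss_fn = lpips.LPIPS(net="alex")             # AlexNet backbone, the common default
# LPIPS expects RGB tensors in [-1, 1], shaped (N, 3, H, W); random data stands in here.
generated = torch.rand(1, 3, 480, 480) * 2 - 1
reference = torch.rand(1, 3, 480, 480) * 2 - 1
print("LPIPS (lower is better):", loss_fn(generated, reference).item())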

5.2 Human-Drawn Sketches (PKBench)

Method Subject Consistency↑ Motion Smoothness↑
AniDoc 0.9456 0.9842
LVCD 0.8653 0.9724
ToonCrafter 0.8567 0.9674
ToonComposer 0.9509 0.9910

5.3 47-Person User Study


  • 70.99 % preferred ToonComposer for aesthetic quality

  • 68.58 % preferred ToonComposer for motion quality

6. Install & Run in 10 Minutes

6.1 Prerequisites


  • NVIDIA GPU with ≥ 16 GB VRAM

  • CUDA 11.8 or newer

  • Python 3.10

6.2 Commands

# Clone
git clone https://github.com/TencentARC/ToonComposer
cd ToonComposer

# Create env
conda create -n tooncomposer python=3.10 -y
conda activate tooncomposer
pip install -r requirements.txt
pip install flash-attn==2.8.0.post2 --no-build-isolation
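
Before launching the UI, it is worth confirming that the GPU and flash-attention are actually usable; the short check below relies only on standard torch and flash_attn imports.

import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")

try:
    import flash_attn
    print("flash-attn version:", flash_attn.__version__)
except ImportError:
    print("flash-attn not installed; expect slower attention and higher VRAM use")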

6.3 Start Gradio UI

python app.py --device cuda:0 --resolution 480p

Browser opens at http://localhost:7860.

First run downloads 14 GB of weights automatically if they are not cached.


7. Using the Gradio Interface

Panel by panel:

  • Prompt box – type a short scene description.

  • Color reference – upload one full-color image.

  • Keyframe sketches – click on the timeline, then upload line art for that frame.

  • Region mask – optional: black out areas you want the model to invent (a mask-preparation sketch follows this list).

  • CFG & residual sliders – start at the defaults (7.5, 1.0); tweak later.

  • Generate – wait 30 s–3 min, depending on frame count.
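
If you prefer to prepare region masks outside the browser, the PIL sketch below follows the convention described above (blacked-out areas are left for the model to invent). The mask size and the black-means-free convention are taken from this article, not from a documented file format, so treat it as a starting point.

from PIL import Image, ImageDraw

w, h = 480, 480                      # match your working resolution
mask = Image.new("L", (w, h), 255)   # start fully white = fully constrained by your sketch
draw = ImageDraw.Draw(mask)
draw.rectangle([0, 0, w, h // 3], fill=0)   # black out the top third, e.g. sky the model should invent
mask.save("region_mask_frame_040.png")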

8. Real-World Tips


  • Too little VRAM? Use 480p or add --no-flash to disable flash-attention.

  • Need longer clips? Generate 60-frame blocks and fade-blend the overlaps afterwards (a crossfade sketch appears at the end of this section).

  • Line art too messy? Run an edge-cleaner or simply redraw the key poses; the model tolerates rough lines but rewards clarity.

  • Offline/air-gapped servers? Pre-download:
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P
huggingface-cli download TencentARC/ToonComposer

Then set env vars:

export WAN21_I2V_DIR=/path/to/Wan2.1-I2V-14B-480P
export TOONCOMPOSER_DIR=/path/to/TencentARC-ToonComposer
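
The fade-blend tip above is ordinary video plumbing rather than anything ToonComposer-specific: generate blocks that overlap by a few frames, then linearly crossfade the overlap, as in the NumPy sketch below.

import numpy as np

def crossfade_join(block_a: np.ndarray, block_b: np.ndarray, overlap: int = 8) -> np.ndarray:
    """block_a and block_b are float arrays shaped (frames, H, W, 3); the last
    `overlap` frames of block_a depict the same moment as the first of block_b."""
    alphas = np.linspace(0.0, 1.0, overlap).reshape(-1, 1, 1, 1)
    blended = (1.0 - alphas) * block_a[-overlap:] + alphas * block_b[:overlap]
    return np.concatenate([block_a[:-overlap], blended, block_b[overlap:]], axis=0)

# Toy-sized frames keep the demo light; real clips would use full resolution.
a = np.random.rand(60, 64, 64, 3)
b = np.random.rand(60, 64, 64, 3)
print(crossfade_join(a, b).shape)   # (112, 64, 64, 3) with an 8-frame blend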

9. Frequently Asked Questions

Q1. Can it do 3-D cartoons?
Yes. The authors fine-tuned a light variant on 3-D rendered clips; examples are in the supplementary video.

Q2. Do I need to draw every frame?
No. One colored reference and a single keyframe sketch already produce motion. More sketches give finer control.

Q3. Is the output commercially safe?
Weights are Apache-2.0. Input images must be your own or licensed.

Q4. Why does my train look flat when I leave the background blank?
Turn on Region-wise Control and explicitly mask the background; otherwise the model thinks you want empty blue.

Q5. Will 4 K come soon?
Not in this release. Authors cite GPU memory limits; future work will investigate cascaded super-resolution.

Q6. Can I train my own style?
Paper provides the SLRA recipe and training objective, but training scripts are not yet public.

Q7. How many keyframes are optimal?


  • Simple head turn: 1–2

  • Complex fight scene: 4–6

  • Rule of thumb: add one sketch wherever motion changes direction.

Q8. Does it work on AMD or Apple Silicon?
Code is CUDA-only today; AMD ROCm and MPS forks are community efforts.

Q9. Can I disable color and get only line art?
Technically yes—feed gray reference and gray sketches—but the model still outputs fully colored frames.

Q10. How do I cite it?

@article{li2025tooncomposer,
  title={ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing},
  author={Li, Lingen and others},
  journal={arXiv preprint arXiv:2508.10881},
  year={2025}
}

10. Takeaway

ToonComposer does not replace animators; it replaces the tedium between creative decisions.
Upload a color keyframe, scribble a few poses, and you have a watchable draft in minutes—leaving you free to refine storytelling rather than chase frames one by one.

Ready to try?
Project page: https://lg-li.github.io/project/tooncomposer
Online demo: https://huggingface.co/spaces/TencentARC/ToonComposer
