USO: A Practical Guide to Unified Style and Subject-Driven Image Generation
“Upload one photo of your pet, pick any art style, type a sentence—USO does the rest.”
Table of Contents
- What Exactly Is USO?
- Why Couldn’t We Do This Before?
- Getting Started: Hardware, Software, and Low-Memory Tricks
- Four Everyday Workflows (with Ready-to-Copy Commands)
- Side-by-Side Results: USO vs. Popular Alternatives
- Troubleshooting & FAQs
- How It Works—Explained Like You’re Five
- Quick Reference & Next Steps
1. What Exactly Is USO?
USO stands for Unified Style and Subject-driven Generation.
In plain words, it is an open-source image model that merges two previously separate tasks:
| Task | Old Way | USO Way |
| --- | --- | --- |
| Style Transfer | Change the look but keep the original layout. | Change the look and place any subject anywhere. |
| Subject-Driven Generation | Keep the subject but ignore style. | Keep the subject and paint it in any style. |
| Both at Once | Two tools + heavy editing. | One command line. |
You need only three inputs:
- A subject image (your cat, your product, your avatar).
- A style image (oil painting, cyberpunk render, hand-drawn sketch).
- A text prompt (what the subject should be doing or where it should be).
2. Why Couldn’t We Do This Before?
Three bottlenecks held everyone back:
| Bottleneck | What Went Wrong | USO Fix |
| --- | --- | --- |
| Task Silos | Style models and subject models lived in separate code bases. | Single model trained for both tasks. |
| Feature Tangle | Models couldn’t tell “style” from “subject” inside one picture. | Separate encoders for style and content. |
| Lack of Triple Data | Training sets only had pairs (input, output). USO needs triplets: subject, style, final image. | Built 200k triplets with automated experts. |
3. Getting Started: Hardware, Software, and Low-Memory Tricks
3.1 Minimum & Recommended Specs
| Item | Minimum | Sweet Spot |
| --- | --- | --- |
| OS | Linux / Windows 10+ | Same |
| Python | 3.10–3.12 | 3.11 |
| CUDA | 11.8 | 12.1+ |
| VRAM | 8 GB (with offload) | 16 GB |
| RAM | 16 GB | 32 GB |
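Not sure what your card has? This small PyTorch snippet (a convenience check, not part of the USO repo) prints your GPU’s VRAM so you can compare it against the table:

```python
import torch

# Report the GPU's total VRAM and compare against the 8 GB minimum above.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"{props.name}: {vram_gb:.1f} GB VRAM")
    print("OK with --offload + fp8" if vram_gb >= 8 else "Below the 8 GB minimum")
else:
    print("No CUDA GPU detected (see FAQ Q1 for AMD / Apple Silicon).")
```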
3.2 Step-by-Step Install
```bash
# 1. Create and activate a virtual environment
python -m venv uso_env
source uso_env/bin/activate   # Windows: uso_env\Scripts\activate

# 2. Install torch
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124

# 3. Install USO's dependencies (run inside the cloned repository)
pip install -r requirements.txt

# 4. Download weights
cp example.env .env
# Edit .env and set HF_TOKEN=your_huggingface_token
python weights/downloader.py
```
3.3 Low-VRAM Cheat Sheet
If you only have 8–12 GB VRAM, add two flags:
```bash
python inference.py \
  --prompt "A corgi wearing a tuxedo" \
  --image_paths "my_corgi.jpg" "oil_painting.jpg" \
  --offload \
  --model_type flux-dev-fp8
```
- `--offload` moves weights to system RAM when idle.
- `--model_type flux-dev-fp8` uses 8-bit precision; peak usage drops to ≈16 GB instead of 24 GB.
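If you are curious what offloading looks like under the hood, here is a minimal PyTorch sketch of the general idea (an illustration of the technique only; USO’s actual implementation may differ):

```python
import torch

def run_offloaded(model: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Keep weights in system RAM and move them to the GPU only while needed."""
    model.to("cuda")              # load weights onto the GPU for this step
    with torch.no_grad():
        out = model(batch.to("cuda"))
    model.to("cpu")               # park weights back in system RAM when idle
    torch.cuda.empty_cache()      # hand the freed VRAM back to the driver
    return out.cpu()
```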
4. Four Everyday Workflows (with Ready-to-Copy Commands)
Tip: In every command, the first path is the subject image.
If you don’t need a subject, pass an empty string (`""`).
4.1 Subject-Only: Keep the Face, Change the Scene
```bash
python inference.py \
  --prompt "The girl is riding a bike in the street" \
  --image_paths "assets/gradio_examples/identity1.jpg" \
  --width 1024 --height 1024
```
What happens: USO keeps the girl’s face and pose but places her on a new street background.
4.2 Style-Only: New Look, Random Subject
```bash
python inference.py \
  --prompt "A cat sleeping on a chair" \
  --image_paths "" "assets/gradio_examples/style1.webp"
```
What happens: Any cat will appear, painted in the style you provided.
4.3 Subject + Style: The Full Combo
```bash
python inference.py \
  --prompt "The woman gave an impassioned speech on the podium" \
  --image_paths "assets/gradio_examples/identity2.webp" "assets/gradio_examples/style2.webp"
```
What happens: Same woman, same podium, but rendered in the selected art style.
4.4 Multi-Style Blend
```bash
python inference.py \
  --prompt "A handsome man" \
  --image_paths "" "assets/gradio_examples/style3.webp" "assets/gradio_examples/style4.webp"
```
What happens: The model merges both styles into one coherent image.
5. Side-by-Side Results: USO vs. Popular Alternatives
All numbers come from the USO-Bench dataset (50 subjects × 50 styles × multiple prompts).
| Metric | Meaning | USO | Next Best |
| --- | --- | --- | --- |
| CLIP-I ↑ | Subject similarity | 0.623 | 0.605 (UNO) |
| DINO ↑ | Identity consistency | 0.793 | 0.789 (UNO) |
| CSD ↑ | Style similarity | 0.557 | 0.540 (InstantStyle-XL) |
| CLIP-T ↑ | Text alignment | 0.282 | 0.282 (StyleStudio) |
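For intuition, CLIP-I is essentially a cosine similarity between CLIP image embeddings. A rough sketch of how such a score is computed (assuming the widely used openai/clip-vit-base-patch32 checkpoint; USO-Bench may use a different one):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_i(subject_ref: str, generated: str) -> float:
    """Cosine similarity between the CLIP embeddings of two images."""
    images = [Image.open(p).convert("RGB") for p in (subject_ref, generated)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize
    return float(feats[0] @ feats[1])                 # higher = more similar
```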
Visual example (figure omitted): left, reference style; center, InstantStyle; right, USO, with the closer brush-stroke match.
6. Troubleshooting & FAQs
Q1: Can I run this on AMD or Apple Silicon?
- AMD GPUs: not officially tested; use ROCm at your own risk.
- Apple Silicon: M-series chips lack CUDA; currently unsupported.
Q2: Do I need a portrait, or will any object work?
Any clear subject works—pets, products, logos, cartoons.
Q3: How do I keep the original pose?
Leave the text prompt empty (`""`). USO will lock the layout and swap only the style.
Q4: What licenses apply?
- Code: Apache 2.0
- Weights: same as the base FLUX.1 dev model (check the original repo)
- Generated images: the user’s responsibility; respect local laws.
Q5: Where are my images saved?
By default, outputs land in `./outputs/YYYY-MM-DD/`.
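If you want to grab today’s results from a script, the dated folder is easy to reconstruct (a small convenience snippet; the exact naming is an assumption based on the pattern above):

```python
from datetime import date
from pathlib import Path

out_dir = Path("outputs") / date.today().isoformat()  # e.g. outputs/2025-09-01
print(sorted(out_dir.glob("*")))                      # list today's generations
```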
7. How It Works—Explained Like You’re Five
Imagine two artists:
- Artist S only cares about style (colors, brushwork).
- Artist C only cares about content (who or what is in the picture).
USO trains them together so they learn when to listen to each other and when to ignore each other.
7.1 Data Factory: Building 200k Triplets
1. Start with public photos and AI-generated images.
2. Run Expert A → turns any photo into a stylized version.
3. Run Expert B → removes style, turning stylized images back into plain photos.
4. You now have neat triplets: (plain subject, style image, stylized subject).
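In code, the factory loop might look like this minimal sketch, where `stylize` (Expert A) and `destylize` (Expert B) are hypothetical placeholders for the paper’s expert models:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Triplet:
    subject: str   # plain photo of the subject
    style: str     # style reference image
    target: str    # the subject rendered in that style

def make_triplet(photo: str,
                 stylize: Callable[[str], str],    # Expert A (hypothetical)
                 destylize: Callable[[str], str],  # Expert B (hypothetical)
                 ) -> Triplet:
    styled = stylize(photo)     # Expert A: photo -> stylized version
    plain = destylize(styled)   # Expert B: strip the style back out
    # Here the stylized image doubles as its own style reference; a real
    # pipeline could pair the target with any other exemplar of that style.
    return Triplet(subject=plain, style=styled, target=styled)
```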
7.2 Two-Stage Training
| Stage | Goal | What’s Trained |
| --- | --- | --- |
| 1. Style Alignment | Make the model understand new styles fast | Only the lightweight projector |
| 2. Disentanglement | Teach it to mix any subject with any style | Full DiT backbone |
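In PyTorch terms, the two stages boil down to which parameters have gradients enabled. A toy sketch (module names and shapes are illustrative, not USO’s real attributes):

```python
import torch.nn as nn

class UsoLike(nn.Module):
    """Stand-in with the two parts named in the table (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.style_projector = nn.Linear(768, 3072)   # lightweight projector
        self.dit_backbone = nn.Sequential(            # stand-in for the DiT
            nn.Linear(3072, 3072), nn.GELU(), nn.Linear(3072, 3072)
        )

def set_trainable(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

model = UsoLike()

# Stage 1 (style alignment): train only the lightweight projector.
set_trainable(model, False)
set_trainable(model.style_projector, True)

# Stage 2 (disentanglement): unfreeze the full backbone as well.
set_trainable(model.dit_backbone, True)
```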
7.3 Style Reward Loop
After every few steps, a reward model scores the image:
- “Does the new picture look like the style reference?”
- High score → keep the current learning direction.
- Low score → adjust the weights.
This extra feedback pushes quality beyond normal training.
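As a toy illustration, the reward can be folded into the loss so that poor style matches are penalized harder (the reward model and the weighting here are placeholders, not USO’s actual objective):

```python
import torch

def reward_weighted_loss(base_loss: torch.Tensor,
                         generated: torch.Tensor,
                         style_ref: torch.Tensor,
                         reward_model) -> torch.Tensor:
    """Scale the loss by how badly the style reference was matched."""
    with torch.no_grad():
        # Placeholder scorer returning a value in [0, 1]:
        # "does the new picture look like the style reference?"
        reward = reward_model(generated, style_ref)
    # High reward -> small multiplier (keep the current direction);
    # low reward -> large multiplier (push the weights to adjust).
    return base_loss * (2.0 - reward)
```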
8. Quick Reference & Next Steps
8.1 One-Page Command Cheat Sheet
| Goal | Subject Image | Style Image | Prompt | Extra Flags |
| --- | --- | --- | --- | --- |
| Subject only | yes | no | describe scene | none |
| Style only | no | yes | describe content | none |
| Both | yes | yes | describe scene | none |
| Low VRAM | any | any | any | `--offload --model_type flux-dev-fp8` |
8.2 File Tree After Install
```
USO/
├── inference.py            # Main script
├── app.py                  # Gradio demo
├── weights/downloader.py   # Fetch checkpoints
├── outputs/                # Generated images
└── examples/               # Sample images & prompts
```
8.3 Gradio Web Demo
```bash
python app.py
# Open http://localhost:7860 in your browser
```
For low memory:
```bash
python app.py --offload --name flux-dev-fp8
```
8.4 Further Reading
- Technical paper: arXiv:2508.18966
- Online demo: Hugging Face Space
- Source code: GitHub
Happy creating!