GPT-IMAGE-EDIT-1.5M: A Practical Guide to Training Open-Source Image-Editing Models That Rival GPT-4o
From raw download to 7.24-point benchmark scores—no hype, just the facts.
Table of Contents
- Why another image-editing dataset?
- What exactly is GPT-IMAGE-EDIT-1.5M?
- How the dataset was built—step by step
- Hands-on experiment: reproducing the 7.24 GEdit-EN score
- Download, verify, and load the data
- Frequently asked questions
- Ready-to-use PyTorch dataset snippet
- Next steps and closing thoughts
1. Why another image-editing dataset?
If you have ever tried to train an instruction-guided image-editing model, you have probably run into three recurring headaches:
Pain point | What it looks like | Why it matters |
---|---|---|
Instructions are too simple | “Make the sky blue” | The model never learns complex, multi-step edits |
Text–image mismatch | Prompt says “add a red umbrella,” but the generated umbrella is green | Loss stalls, results look wrong |
Small data volume | Public sets top out at a few hundred thousand samples | Overfitting appears after the first few epochs |
Large proprietary systems such as GPT-4o have shown that data quality, not model size alone, drives photorealistic and semantically accurate edits. The problem: GPT-4o’s training data is private, leaving open-source developers behind.
Researchers from UC Santa Cruz, the University of Edinburgh, and Adobe decided to close the gap by re-processing three existing public datasets—OmniEdit, HQ-Edit, and UltraEdit—using GPT-4o itself. The result is GPT-IMAGE-EDIT-1.5M, a royalty-free collection of 1.54 million instruction–source–target triplets that anyone can download, inspect, and fine-tune on today.
2. What exactly is GPT-IMAGE-EDIT-1.5M?
2.1 Scale and composition
- Total samples: 1 540 203
- Origin:
  - OmniEdit ≈ 60 %
  - HQ-Edit ≈ 25 %
  - UltraEdit ≈ 15 %
- Resolutions: 1024×1024, 1536×1024, 1024×1536 (aspect-ratio locked)
- Language: English instructions; ~10 % of instructions were rewritten by GPT-4o for clarity
2.2 One sample unpacked
Field | Example |
---|---|
instruction | “Replace the wooden table with a glass one and add a vase of sunflowers on top.” |
source_image | Original photo showing a wooden table (JPEG; image omitted here) |
edited_image | Same scene with a glass table and a vase of sunflowers (JPEG; image omitted here) |
Each triplet is delivered as two JPEG images plus one line of JSON in a .jsonl file.
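For orientation, a single metadata record might look like the line below. The field names (instruction, source, target) match the loader in Section 7, while the exact file-naming scheme here is an illustrative assumption:

{"instruction": "Replace the wooden table with a glass one and add a vase of sunflowers on top.", "source": "images/00001234_src.jpg", "target": "images/00001234_tgt.jpg"}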
3. How the dataset was built—step by step
Think of the pipeline as three refinement passes over the original data.
3.1 Pass 1: Output regeneration
- Feed the original instruction + source image to GPT-4o’s image-edit endpoint
- Require 1024 px resolution, strict alignment to the source
- Auto-reject distorted or padded outputs
Impact: ImgEdit score on OmniEdit rose from 2.94 → 3.24.
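The regeneration script itself is not published. As a rough, non-authoritative sketch, the core call could look like the following with the OpenAI Python SDK and the gpt-image-1 edit endpoint standing in for the paper’s pipeline; the rejection heuristics are only marked as a comment because they are not public:

import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def regenerate_target(instruction: str, source_path: str, out_path: str) -> None:
    """Re-create the edited image for one (instruction, source) pair."""
    with open(source_path, "rb") as f:
        result = client.images.edit(
            model="gpt-image-1",   # assumption: public stand-in for the GPT-4o image-edit endpoint
            image=f,
            prompt=instruction,
            size="1024x1024",      # dataset resolutions: 1024x1024, 1536x1024, 1024x1536
        )
    image_bytes = base64.b64decode(result.data[0].b64_json)
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    # The pipeline additionally auto-rejects distorted or padded outputs;
    # that filtering logic is not public and is omitted here.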
3.2 Pass 2: Instruction rewrite
- Problem: GPT-4o occasionally “over-creates,” so the new image no longer matches the old instruction.
- Fix: Show GPT-4o the source and the regenerated target, then ask for a fresh, precise instruction.
- Impact: ImgEdit score climbed an additional 0.16 (3.24 → 3.40).
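The exact rewrite prompt is not published either. A hedged sketch of the idea, sending both images to GPT-4o through the multimodal chat API, might look like this (the prompt text is an assumption, not the authors’ wording):

import base64
from openai import OpenAI

client = OpenAI()

REWRITE_PROMPT = (
    "You are shown a source image and an edited image. "
    "Write one precise editing instruction that turns the source into the edited image."
)

def to_data_url(path: str) -> str:
    # Inline the JPEG as a base64 data URL so it can be sent in the chat message.
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

def rewrite_instruction(source_path: str, target_path: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": REWRITE_PROMPT},
                {"type": "image_url", "image_url": {"url": to_data_url(source_path)}},
                {"type": "image_url", "image_url": {"url": to_data_url(target_path)}},
            ],
        }],
    )
    return response.choices[0].message.content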
3.3 Pass 3: Full pair regeneration (HQ-Edit only)
- Problem: HQ-Edit’s source images came from DALL-E 3 and looked dated.
- Fix: Ask GPT-4o to create a new high-quality source first, then apply the same edit instruction to it.
- Impact: GEdit-EN score edged up from 5.67 → 5.73.
After all passes, every image was run through a padding-crop-resize script to guarantee square or 3:2 / 2:3 output without stretching, then SHA-256 checksummed.
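The padding-crop-resize script is not reproduced in the paper. A minimal sketch of the crop-and-resize half of that step with Pillow, assuming a simple center crop to the nearest supported aspect ratio, could look like this:

from PIL import Image

# Supported output shapes (width, height), per Section 2.1
TARGET_SIZES = [(1024, 1024), (1536, 1024), (1024, 1536)]

def crop_resize(img: Image.Image) -> Image.Image:
    """Center-crop to the closest supported aspect ratio, then resize without stretching."""
    ratio = img.width / img.height
    tw, th = min(TARGET_SIZES, key=lambda s: abs(s[0] / s[1] - ratio))
    target_ratio = tw / th
    if ratio > target_ratio:   # too wide: trim left and right
        new_w = int(img.height * target_ratio)
        left = (img.width - new_w) // 2
        img = img.crop((left, 0, left + new_w, img.height))
    else:                      # too tall: trim top and bottom
        new_h = int(img.width / target_ratio)
        top = (img.height - new_h) // 2
        img = img.crop((0, top, img.width, top + new_h))
    return img.resize((tw, th), Image.LANCZOS)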
4. Hands-on experiment: reproducing the 7.24 GEdit-EN score
4.1 Base model
- FluxKontext dev: a rectified-flow transformer that natively supports 1024 px images
- Text encoder swap: authors replaced the default T5 encoder with Qwen-VL-7B embeddings for crisper prompt understanding
4.2 Training recipe (single-node, 8×A100 80 GB)
Parameter | Value | Notes |
---|---|---|
Batch size | 256 real samples | Gradient accumulation ×4 if you only have 4 GPUs |
Learning rate | 5e-5 | Cosine schedule to 1e-6 |
Steps | 30 000 | ~1 epoch over 1.5 M samples |
Precision | bfloat16 | Flash-Attention 2 enabled |
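The full training script ships with the authors’ repo. Purely as an illustration of the schedule in the table, a plain-PyTorch setup with cosine decay from 5e-5 to 1e-6 over 30 000 steps could be wired up as follows; model, dataloader, and the loss call are placeholders for whatever backbone you fine-tune:

import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=30_000, eta_min=1e-6
)

for step, batch in enumerate(dataloader):
    # bfloat16 autocast mirrors the precision row of the table above
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch)   # placeholder: your model's training loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    if step + 1 >= 30_000:
        break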
4.3 Results summary
Benchmark | Baseline (original data) | GPT-IMAGE-EDIT-1.5M | Gain |
---|---|---|---|
GEdit-EN-full | 6.26 | 7.24 | +0.98 |
ImgEdit-Full | 3.52 | 3.80 | +0.28 |
Complex-Edit | 8.49 | 8.78 | +0.29 |
Scores are computed by automated multimodal LLM judges that measure instruction following, identity preservation, and perceptual quality.
5. Download, verify, and load the data
5.1 Where to get it
- Official page: https://ucsc-vlaa.github.io/GPT-Image-Edit
- Hugging Face mirror: search GPT-IMAGE-EDIT-1.5M
- Total size: ~1.8 TB (JPEG, quality 95)
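If the mirror is a standard Hugging Face dataset repo, huggingface_hub can pull it in one call. The repo id below is only a placeholder; substitute whatever the search on the Hub returns, and keep the ~1.8 TB footprint in mind before downloading everything:

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="<org>/GPT-IMAGE-EDIT-1.5M",   # placeholder repo id; replace with the real one
    repo_type="dataset",
)
print("Downloaded to", local_dir)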
5.2 Folder layout
GPT-IMAGE-EDIT-1.5M/
├─ metadata/
│ ├─ omniedit.jsonl
│ ├─ hqedit.jsonl
│ └─ ultraedit.jsonl
├─ images/
│ ├─ 00000000.jpg
│ └─ ...
└─ checksum.sha256
5.3 Integrity check
sha256sum -c checksum.sha256
If any line fails, re-download only that shard.
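If you would rather stay in Python, the same check for an individual file is a few lines with hashlib; the expected digest is the value listed for that file in checksum.sha256:

import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large shards never load fully into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected_hash = "..."  # paste the digest for this file from checksum.sha256
assert sha256_of("images/00000000.jpg") == expected_hash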
6. Frequently asked questions
Question | Short answer |
---|---|
Can I use this commercially? | Yes. The dataset is CC-BY-4.0. You must credit the authors and check any third-party assets in source images. |
I only have a 24 GB RTX 4090. | Use --gradient_checkpointing and --mixed_precision fp16. Effective batch of 4 still converges in ~2 days. |
My instructions are in Chinese. | Only English is provided. Community multilingual forks are tracked in GitHub issue #7. |
Can I add my own data later? | Append new JSONL lines with the same keys (source, target, instruction) and rerun the training script. |
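Expanding on the last row: adding your own pairs is just appending JSON objects, one per line, with the same three keys. A quick sketch, where the shard name and image paths are examples:

import json

new_sample = {
    "instruction": "Turn the daytime sky into a starry night.",
    "source": "images/custom_0001_src.jpg",
    "target": "images/custom_0001_tgt.jpg",
}

# Write to a separate shard (name is an example) and point the loader at it.
with open("metadata/custom.jsonl", "a") as f:
    f.write(json.dumps(new_sample) + "\n")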
7. Ready-to-use PyTorch dataset snippet
Save as gpt_image_edit.py:
import json, os
from PIL import Image
from torch.utils.data import Dataset


class GPTImageEditDataset(Dataset):
    """Yields (source image, edited image, instruction) triplets from one .jsonl shard."""

    def __init__(self, meta_file: str, img_dir: str, transform=None):
        # One JSON object per line: {"source": ..., "target": ..., "instruction": ...}
        with open(meta_file) as f:
            self.samples = [json.loads(line) for line in f]
        self.img_dir = img_dir
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        item = self.samples[idx]
        src_path = os.path.join(self.img_dir, item['source'])
        tgt_path = os.path.join(self.img_dir, item['target'])
        src = Image.open(src_path).convert('RGB')
        tgt = Image.open(tgt_path).convert('RGB')
        prompt = item['instruction']
        # Apply the same transform to both images so their shapes stay aligned
        if self.transform:
            src = self.transform(src)
            tgt = self.transform(tgt)
        return {'source': src, 'target': tgt, 'prompt': prompt}
Usage example:
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
])
dataset = GPTImageEditDataset('metadata/omniedit.jsonl', 'images/', transform=transform)
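Wrapping it in a standard DataLoader is then the usual one-liner; batch size and worker count below are just examples:

from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=4)
batch = next(iter(loader))
print(batch['source'].shape, batch['prompt'][:2])  # tensors are batched, prompts stay a list of strings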
8. Next steps and closing thoughts
With GPT-IMAGE-EDIT-1.5M and a modest open-source backbone, you can now reach benchmark scores within striking distance of GPT-4o—without paying per-image API fees or locking yourself into a closed platform.
Immediate experiments to try
- LoRA fine-tuning on 8 GB consumer cards
- Video frame editing by extending the same rectified-flow transformer
- Plug-in for Figma / Photoshop using the provided PyTorch loader and ONNX export
The dataset, code, and model weights are all live today. Clone the repo, run the checksum, and you can be training in less time than it takes to finish your coffee.
Happy editing.