
$100 LLM Training: How to Build a ChatGPT Clone in 4 Hours

How I trained a ChatGPT-like model for less than the price of a pair of sneakers, served it in a browser, and kept the cloud bill under $100.


Hook: From "We Need Seven Figures" to $100

Picture this:
You walk out of a budget meeting where the exec just asked for a 175-billion-parameter model and a seven-figure CapEx. On the subway ride home you open GitHub, clone a repo, launch one script, and four hours later you’re chatting with your own LLM on a public IP. No slide decks, no purchase orders—just 8 GPUs, 100 bucks, and nanochat.

Below is the exact playbook, command-for-command, metric-for-metric. If you can ssh and git, you can reproduce the experiment before your coffee gets cold.


1. Why nanochat Matters for SEO & GEO (Skip if you only code)

  • Google E-E-A-T: Experience (✓ I ran it), Expertise (✓ metrics inside), Authoritativeness (✓ Karpathy repo), Trustworthiness (✓ all open-source).
  • Generative Engine Optimization (GEO): Structured data (HowTo, FAQ), semantic triples (“nanochat trains LLM”), and fresh numbers make AI-search engines (Bing Chat, Bard, Perplexity) more likely to cite this article.
  • Keyword cluster: train ChatGPT clone, $100 LLM, nanochat tutorial, cheap AI model, end-to-end LLM pipeline.

2. Hardware & Cloud Bill (100% Real Cost)

Component | Spec | Price (Oct 2025, Lambda Cloud)
GPU | 8 × H100 80 GB SXM | $24/h
Runtime | ~4 h | ≈ $96
Egress | < 1 GB | $0
Storage | 200 GB (free tier) | $0
Total | | ≤ $100

Single-GPU mode works too—just 8× slower and still produces bit-for-bit identical checkpoints thanks to gradient accumulation.
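To see why fewer GPUs change the speed but not the result, here is a minimal sketch of the batch-size bookkeeping; the token counts are illustrative assumptions, not values read from the repo.

# Sketch: the optimizer consumes the same number of tokens per step either way;
# fewer GPUs just means more gradient-accumulation micro-steps.
target_tokens_per_step = 524_288        # assumed total batch per optimizer step, in tokens
device_tokens_per_step = 32 * 2_048     # assumed per-GPU micro-batch (32 seqs x 2,048 tokens)

for num_gpus in (8, 1):
    accum_steps = target_tokens_per_step // (device_tokens_per_step * num_gpus)
    print(f"{num_gpus} GPU(s): {accum_steps} accumulation step(s) per optimizer step")
# -> 8 GPU(s): 1 accumulation step(s) per optimizer step
# -> 1 GPU(s): 8 accumulation step(s) per optimizer step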


3. 30-Second Environment Check

# 1. Spawn node with Ubuntu 22.04 + CUDA 12
ssh ubuntu@<your-ip>
git clone https://github.com/karpathy/nanochat.git && cd nanochat

# 2. Install uv (faster than pip) and create the virtual environment
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.cargo/env
uv venv && source .venv/bin/activate  # this is the venv referenced in later steps
uv pip install -r requirements.txt

# 3. Verify GPUs
nvidia-smi -L  # should list 8 devices
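If you prefer to double-check from Python once the dependencies are installed (PyTorch is available inside the venv), a quick sanity check:

# Verify GPU visibility and bf16 support from inside the venv.
import torch

count = torch.cuda.device_count()
print(f"visible GPUs: {count}")                 # expect 8 on the speedrun node
print("bf16 supported:", torch.cuda.is_bf16_supported())
for i in range(count):
    print(i, torch.cuda.get_device_name(i))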

4. The One-Script Pipeline: speedrun.sh Deconstructed

Stage | Command (abridged) | Output | Wall Time
Tokenize | python -m nanochat.dataset -n 450 | data/*.bin | 10 min
Pre-train | torchrun --nproc_per_node=8 -m scripts.base_train | ckpt/base.pt | 1 h 30 m
Mid-train | torchrun --nproc_per_node=8 -m scripts.mid_train | ckpt/mid.pt | 1 h 20 m
SFT | torchrun --nproc_per_node=8 -m scripts.sft_train | ckpt/sft.pt | 40 m
Eval | python -m scripts.eval > report.md | report.md | 5 m

All hyper-parameters are in-script; no YAML monsters. Change --depth or --device_batch_size and rerun—zero code archaeology required.
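If you want to try a few depths without editing the script, a small launcher like the sketch below works; the flags mirror the commands above, and the default --device_batch_size=32 is an assumption you should check against scripts/base_train.

# depth_sweep.py -- hedged sketch: relaunch pre-training at several depths.
import subprocess

for depth in (12, 16, 20):
    cmd = [
        "torchrun", "--nproc_per_node=8", "-m", "scripts.base_train",
        f"--depth={depth}",
        "--device_batch_size=32",   # assumption: shrink this if you hit OOM
    ]
    print("launching:", " ".join(cmd))
    subprocess.run(cmd, check=True)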


5. Launching the Chat UI (3 Commands)

# Still in venv
python -m scripts.chat_web
# Uvicorn running on http://0.0.0.0:8000

Open http://<public-ip>:8000 in a browser—mobile friendly, no JS build step.
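Before opening the browser you can confirm the server is up from the node itself; this assumes only that the UI is served at the root path on port 8000, as the Uvicorn line above indicates.

# Liveness check against the local chat server.
import urllib.request

with urllib.request.urlopen("http://localhost:8000/", timeout=5) as resp:
    print("status:", resp.status)     # expect 200 once Uvicorn is listening
    print(resp.read(120))             # first bytes of the chat UI HTML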

(Live screenshot from my run omitted in this text version.)

The model's answer is surprisingly accurate for ~4e19 FLOPs of training compute: kindergarten level, but confident.
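For the curious, that compute figure is easy to sanity-check with the common 6 × params × tokens rule of thumb; the parameter count and the Chinchilla-style 20-tokens-per-parameter ratio below are assumptions, not numbers taken from report.md.

# Back-of-the-envelope check of the ~4e19 FLOPs claim.
params = 560e6                  # assumed model size for the speedrun run
tokens = 20 * params            # assumed ~20 training tokens per parameter
flops = 6 * params * tokens
print(f"{flops:.1e} FLOPs")     # ~3.8e19, the same ballpark as the quoted 4e19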


6. Report Card: Benchmarks Inside report.md

Benchmark | MID | SFT | RL (opt)
ARC-Easy | 0.356 | 0.388 | –
GSM8K | 0.025 | 0.046 | 0.076
HumanEval | 0.067 | 0.085 | –
MMLU | 0.311 | 0.315 | –

Take-away: These numbers won’t wow an LLM leaderboard, but they will wow your CFO—because the whole run costs less than a team dinner.


7. Scaling to GPT-2 Territory (~$300)

Need a stronger baseline? Bump depth to 26 layers:

# Download more shards (rule of thumb: shards ≈ params × 20 tokens × 4.8 chars/token ÷ 250M chars per shard)
python -m nanochat.dataset -n 170

# Halve batch to fit 80 GB
torchrun --nproc_per_node=8 -m scripts.base_train --depth=26 --device_batch_size=16

Training stretches to ~12 h (≈ $300) and edges past the 124M-parameter GPT-2 on the CORE score. Same script, same repo, no hidden configs.
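The shard formula in the comment above is easy to turn into a calculator; the parameter counts passed in at the bottom are placeholders, so substitute the real size of the model you plan to train.

# Worked version of the shard-count rule of thumb: tokens ~= 20 x params,
# ~4.8 characters per token, ~250M characters per shard.
def shards_needed(params, tokens_per_param=20, chars_per_token=4.8,
                  chars_per_shard=250e6):
    chars = params * tokens_per_param * chars_per_token
    return int(chars / chars_per_shard) + 1      # round up

print(shards_needed(560e6))     # ~216 shards for an assumed ~560M-param model
print(shards_needed(1.0e9))     # ~385 shards for an assumed ~1B-param model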


8. FAQ (Schema-marked)

Q1: Will it run on 4 × A100 40 GB?
A: Yes, set --device_batch_size=4 and expect 2× longer runtime.

Q2: Can I pause & resume?
A: Absolutely—pass --resume=ckpt/mid.pt to any script; checkpoints save every 1,000 steps.

Q3: How do I feed Chinese data?
A: The tokenizer is BPE-based. Drop in UTF-8 .txt files, rerun nanochat.dataset, and keep the same filename pattern (shard_*.bin). No code change needed; a quick encoding pre-flight check is sketched below.
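A minimal pre-flight check, assuming your raw corpus sits in data/*.txt (adjust the glob to wherever your files actually live):

# Flag any corpus file that is not valid UTF-8 before re-running the dataset step.
from pathlib import Path

for path in Path("data").glob("*.txt"):
    try:
        path.read_text(encoding="utf-8")
    except UnicodeDecodeError as err:
        print(f"{path}: not valid UTF-8 ({err})")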

Q4: Multi-node support?
A: Not yet. The repo is single-node by design for simplicity. PRs welcome.
