How I trained a ChatGPT-like model for less than the price of a pair of sneakers, served it in a browser, and didn’t break the cloud bill.
Hook: From "We Need Seven Figures" to $100
Picture this:
You walk out of a budget meeting where the exec just asked for a 175-billion-parameter model and a seven-figure CapEx. On the subway ride home you open GitHub, clone a repo, launch one script, and four hours later you’re chatting with your own LLM on a public IP. No slide decks, no purchase orders—just 8 GPUs, 100 bucks, and nanochat.
Below is the exact playbook, command-for-command, metric-for-metric. If you can `ssh` and `git`, you can kick off the same run before your coffee gets cold.
1. Why nanochat Matters for SEO & GEO (Skip if you only code)
- Google E-E-A-T: Experience (✓ I ran it), Expertise (✓ metrics inside), Authoritativeness (✓ Karpathy repo), Trustworthiness (✓ all open-source).
- Generative Engine Optimization (GEO): Structured data (`HowTo`, `FAQ`), semantic triples ("nanochat trains LLM"), and fresh numbers make AI-search engines (Bing Chat, Bard, Perplexity) more likely to cite this article.
- Keyword cluster: train ChatGPT clone, $100 LLM, nanochat tutorial, cheap AI model, end-to-end LLM pipeline.
2. Hardware & Cloud Bill (100% Real Cost)
| Component | Spec | Price (Oct 2025, Lambda Cloud) |
|---|---|---|
| GPU | 8 × H100 80 GB SXM | $24/h |
| Runtime | ~4 h | ≈ $96 |
| Egress | < 1 GB | $0 |
| Storage | 200 GB (free tier) | $0 |
| Total | | ≤ $100 |
Single-GPU mode works too: it's about 8× slower, and gradient accumulation keeps the effective batch size (and therefore the training recipe) the same.
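For the curious, here is a minimal sketch of how gradient accumulation lets one GPU stand in for eight: take several micro-batches per optimizer step so the effective batch size is unchanged. This is an illustrative PyTorch pattern, not nanochat's actual training loop.

```python
import torch

# Illustrative only -- not nanochat's real loop.
# With 1 GPU instead of 8, run 8 micro-batches per optimizer step so the
# effective batch size stays the same; only wall-clock time grows ~8x.
ACCUM_STEPS = 8

def train_step(model, optimizer, loader_iter, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    for _ in range(ACCUM_STEPS):
        x, y = next(loader_iter)
        loss = loss_fn(model(x), y) / ACCUM_STEPS  # average over micro-batches
        loss.backward()                            # gradients accumulate in .grad
    optimizer.step()
```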
3. 30-Second Environment Check
```bash
# 1. Spawn node with Ubuntu 22.04 + CUDA 12
ssh ubuntu@<your-ip>
git clone https://github.com/karpathy/nanochat.git && cd nanochat

# 2. Install uv (faster than pip)
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.cargo/env
uv pip install -r requirements.txt

# 3. Verify GPUs
nvidia-smi -L   # should list 8 devices
```
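If you'd rather sanity-check from Python than from `nvidia-smi`, a quick snippet (assuming the PyTorch pulled in by `requirements.txt`):

```python
import torch

# Expect 8 devices on the full speedrun node, 1 in single-GPU mode.
print("CUDA devices:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  {i}: {torch.cuda.get_device_name(i)}")
```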
4. The One-Script Pipeline: `speedrun.sh` Deconstructed
| Stage | Command (abridged) | Output | Wall Time |
|---|---|---|---|
| Tokenize | `python -m nanochat.dataset -n 450` | `data/*.bin` | 10 min |
| Pre-train | `torchrun --nproc_per_node=8 -m scripts.base_train` | `ckpt/base.pt` | 1 h 30 m |
| Mid-train | `torchrun --nproc_per_node=8 -m scripts.mid_train` | `ckpt/mid.pt` | 1 h 20 m |
| SFT | `torchrun --nproc_per_node=8 -m scripts.sft_train` | `ckpt/sft.pt` | 40 m |
| Eval | `python -m scripts.eval > report.md` | `report.md` | 5 m |
All hyper-parameters are in-script; no YAML monsters. Change `--depth` or `--device_batch_size` and rerun; zero code archaeology required.
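To make "in-script hyper-parameters" concrete, the pattern looks roughly like the sketch below. Treat it as an illustration of the idea (flags with sane defaults baked into the script), not a copy of the repo's actual argument handling; the default values are placeholders.

```python
import argparse

# Illustrative sketch of flag-style hyper-parameters -- not nanochat's real parser.
parser = argparse.ArgumentParser()
parser.add_argument("--depth", type=int, default=20,
                    help="number of transformer layers")
parser.add_argument("--device_batch_size", type=int, default=32,
                    help="micro-batch size per GPU")
args = parser.parse_args()

print(f"training a depth-{args.depth} model, "
      f"batch {args.device_batch_size} per device")
```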
5. Launching the Chat UI (3 Commands)
```bash
# Still in venv
python -m scripts.chat_web
# Uvicorn running on http://0.0.0.0:8000
```
Open `http://<public-ip>:8000` in a browser. It's mobile friendly, with no JS build step.
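If you want to confirm the server is reachable before switching to a browser, a quick stdlib check (substitute your node's IP; this only fetches the index page the UI is served from):

```python
import urllib.request

# Placeholder address -- replace <public-ip> with your node's public IP.
url = "http://<public-ip>:8000"
with urllib.request.urlopen(url, timeout=5) as resp:
    print(resp.status)  # 200 means the chat UI is being served
```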
Live screenshot from my run:
The model’s answer is surprisingly accurate for 4e19 FLOPs—kindergarten level but confident.
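Where does the 4e19 figure come from? A back-of-the-envelope estimate with the standard ≈ 6 · N · D training-FLOPs rule lands in the right ballpark. The model size and tokens-per-parameter ratio below are illustrative assumptions, not numbers taken from this run's logs.

```python
# Back-of-the-envelope training FLOPs: roughly 6 * params * tokens.
# Parameter count and token ratio are illustrative assumptions.
params = 560e6           # ~0.56B parameters
tokens = 20 * params     # ~20 training tokens per parameter
flops = 6 * params * tokens
print(f"{flops:.1e}")    # ~3.8e+19, i.e. on the order of 4e19
```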
6. Report Card: Benchmarks Inside `report.md`

| Benchmark | MID | SFT | RL (opt) |
|---|---|---|---|
| ARC-Easy | 0.356 | 0.388 | — |
| GSM8K | 0.025 | 0.046 | 0.076 |
| HumanEval | 0.067 | 0.085 | — |
| MMLU | 0.311 | 0.315 | — |
Take-away: These numbers won’t wow an LLM leaderboard, but they will wow your CFO—because the whole run costs less than a team dinner.
7. Scaling to GPT-2 Territory (~$300)
Need a stronger baseline? Bump depth to 26 layers:
```bash
# Download more shards (formula: params × 20 × 4.8 ÷ 250 M)
python -m nanochat.dataset -n 170
# Halve batch to fit 80 GB
torchrun --nproc_per_node=8 -m scripts.base_train --depth=26 --device_batch_size=16
```
Training stretches to ~12 h (≈ $300) and beats GPT-2 (124M) on CORE score. Same script, same repo, no hidden configs.
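The shard-count comment above boils down to a simple rule of thumb: shards ≈ params × 20 tokens per parameter × 4.8 chars per token ÷ 250 M chars per shard. Here it is as a tiny helper; the 1B-parameter input is purely illustrative, and the shard counts used in this article come from the repo's own dataset sizing.

```python
import math

def shards_needed(params: float) -> int:
    """Rule of thumb from the comment above: params x 20 x 4.8 / 250 M chars per shard."""
    chars = params * 20 * 4.8      # ~20 tokens per parameter, ~4.8 chars per token
    return math.ceil(chars / 250e6)

# Illustrative input, not an official figure for any nanochat config.
print(shards_needed(1.0e9))        # 384 shards for a ~1B-parameter model
```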
8. FAQ (Schema-marked)
Q1: Will it run on 4 × A100 40 GB?
A: Yes. Set `--device_batch_size=4` and expect roughly 2× longer runtime.
Q2: Can I pause & resume?
A: Absolutely. Pass `--resume=ckpt/mid.pt` to any script; checkpoints save every 1,000 steps.
Q3: How do I feed Chinese data?
A: The tokenizer is BPE-based. Drop in UTF-8 `.txt` files, rerun `nanochat.dataset`, and keep the same filename pattern (`shard_*.bin`). No code change needed.
Q4: Multi-node support?
A: Not yet. The repo is single-node by design for simplicity. PRs welcome.