300 Real-World Machine Learning Systems: How They Went From Zero to Production
A plain-language field guide based on case studies from Netflix, Airbnb, DoorDash, and 77 other companies
“
If you can read a college textbook, you can read this post.
Every example comes from the public engineering blogs and papers listed at the end—nothing is made up, nothing is exaggerated.
Table of Contents
-
Why should you care about these 300 stories? -
The “elevator cheat sheet”: what problem each system solves in five words or less -
A bird’s-eye view of 10 industries and 300 lessons learned -
The universal seven-step playbook that keeps showing up -
Six stories told from the ground up -
Recommendation: Spotify’s “next song” engine -
Forecasting: DoorDash’s holiday surge predictor -
Fraud: Stripe’s real-time transaction guard -
Code generation: GitHub Copilot’s autocomplete brain -
Computer vision: Zillow’s floor-plan-from-photo tool -
Multimodal search: Airbnb’s “romantic cabin” sorter
-
-
Frequently asked questions (from juniors, by juniors) -
Your own 30-day starter plan -
Closing thoughts
1. Why should you care about these 300 stories?
Think of this article as a recipe book for machine-learning systems.
Instead of “how to bake a cake,” you get
-
“how Spotify decides which song to play next,” -
“how Uber predicts arrival times within one minute,” -
“how Stripe spots a stolen credit card in 200 ms.”
Each recipe contains:
By the end you will not be an expert, but you will never again ask, “How does a real system actually look?”
2. The “elevator cheat sheet”
3. A bird’s-eye view of 10 industries and 300 lessons learned
Below is a map, not a table of contents.
Use it to jump to the corner of the world that matches your job or curiosity.
4. The universal seven-step playbook that keeps showing up
Almost every case study fits this loop.
-
Translate the business goal
“More rides on Friday night” → “Increase Friday-night ride-request conversion by 3 %.” -
Inventory the data
Make a list: user logs, GPS pings, payment history, weather, public holidays. -
Label or define the target
Regression: ETA in minutes.
Classification: fraud = 1, safe = 0. -
Build a baseline
Start with logistic regression or gradient boosting—whatever runs in <1 hour on a laptop. -
Run a controlled experiment
A/B test or shadow mode (DoorDash calls it “dark launch”). -
Production plumbing
-
Feature store (Redis, BigQuery) -
Model registry (MLflow) -
Canary deploy (5 % traffic)
-
-
Continuous monitoring
Watch data drift, latency, cost. When any metric jumps 10 %, page the owner.
5. Six stories told from the ground up
5.1 Recommendation: Spotify’s “next song” engine
Goal
Keep the user listening instead of hitting “skip.”
Data
-
30 billion play events/day -
Audio features (tempo, key, valence) -
Context: time of day, device, playlist origin
Version history
-
v0: matrix factorization (2009) -
v1: Wide & Deep (2016) -
v2: Transformer + multi-task (2023)
Tricks that worked
-
Cold-start: use audio features only until enough play data arrives. -
Data imbalance: down-weight top artists to avoid feedback loops. -
Latency: 100 ms p95—embeddings pre-computed, served from Redis.
5.2 Forecasting: DoorDash’s holiday surge predictor
Problem
Thanksgiving volume spikes 4×; naive scaling wastes food and driver time.
Model stack
-
LightGBM for tabular history -
Prophet for weekly/annual seasonality -
Seq2Seq for city-level temporal patterns
Ensemble blended with Bayesian weights.
Features
-
Historical orders (3 years) -
Weather, school holidays -
Real-time driver count (Kafka stream)
Outcome
2023 Thanksgiving: +6 min average delivery vs. +28 min in 2022.
5.3 Fraud: Stripe’s real-time transaction guard
Window
200 ms to approve or decline.
Signals
-
Location jump: IP vs. shipping address distance -
Device fingerprint change -
Velocity: 3+ attempts in 60 s
Model
-
Gradient-boosted trees + graph neural network (cards, emails, IPs as nodes) -
SHAP values for human-readable reasons (required by regulators)
Result
False-positive rate cut by 30 % YoY without hurting conversion.
5.4 Code generation: GitHub Copilot’s autocomplete brain
Pipeline
-
Pre-train Code Llama on public GitHub code -
Fine-tune on permissively licensed snippets -
Context window: current file + 20 lines above cursor + repo path
Serving
-
KV-cache to reuse prefix tokens -
8-bit quantization, single GPU -
5 candidate completions, first-token latency 50 ms
Guardrails
-
Deduplication against public code -
Sensitive-word filter
5.5 Computer vision: Zillow’s floor-plan-from-photo tool
Input
360° panorama from phone camera.
Steps
-
Semantic segmentation (Detectron2) → walls, doors, windows -
Convert pixel mask to vector geometry -
Rule checker: doors must touch walls, rooms must form polygons
User impact
Brokers save ~30 min per listing.
5.6 Multimodal search: Airbnb’s “romantic cabin” sorter
Challenge
User types “romantic cabin with hot tub” and expects perfect matches.
Model
-
Text tower: BERT on listing title/description -
Image tower: ResNet on photo embeddings -
Cross-attention layer to score text-image fit
Gain
Couples segment booking conversion +12 %.
6. Frequently asked questions (from juniors, by juniors)
Q1: I only have a laptop. Can I still replicate these systems?
Yes. 80 % of the teams start with a 4-core CPU and <8 GB RAM. Move to GPU only after the baseline works.
Q2: What if my dataset is tiny?
-
Transfer learning: use BERT for text, ResNet for images. -
Weak supervision: DoorDash generated 1 M pseudo-labels with simple rules.
Q3: The model degrades after launch. How do I catch it early?
Plot daily distribution drift (Kolmogorov–Smirnov distance). Netflix alerts at 0.1.
Q4: How do I convince my manager to fund this?
Run a 2-week shadow mode and record “dollars saved” or “hours freed.” Stripe’s shadow run showed $3 M annual fraud loss reduction—budget approved overnight.
7. Your own 30-day starter plan
8. Closing thoughts
This post is a map, not a miracle cure.
Whenever you feel lost, open the original case study, find the company that solved a problem like yours, and copy the parts that fit.
If you want the raw links in one place, head to HorizonX.live or the Evidently ML-system-design repo.
Pick one story, run the 30-day plan above, and next month you will have your own production story to tell.