Let AI Ship Features While You Sleep: Inside Ralph’s Autonomous Coding Loop
A step-by-step field guide to running Ralph—an 80-line Bash loop that turns a JSON backlog into shipped code without human interrupts.
What This Article Answers
Core question: How can a single Bash script let an AI agent finish an entire feature list overnight, safely and repeatably?
One-sentence answer: Ralph repeatedly feeds your agent the next small user story, runs type-check & tests, commits on green, and stops only when every story is marked true—using nothing but Git, a JSON queue, and a text log for memory.
1. What Exactly Is Ralph? (And What It Is Not)
Core question: “Is Ralph another VS Code plugin or a new model?”
Answer: No. Ralph is a deliberately minimal Bash loop that wraps any CLI-based coding agent (Amp, Claude Code, Cursor, etc.) into an autonomous delivery machine.
- Not a model, not a SaaS, not a cloud secret.
- Not an interactive copilot: once started, zero human input.
- Is pure glue code that keeps context tiny (fresh window each loop) while persisting memory through Git history, a JSON task list, and a running text log.
Author’s reflection: I first thought “80 lines can’t possibly survive real codebases,” but that constraint is the feature—small stories, fast feedback, no drifting prompts.
2. The 30-Second Mental Model
Core question: “How does memory survive if the context window is cleared every cycle?”
Answer: Ralph externalizes everything the next iteration needs into three files:
| File | Purpose | Updated By |
|---|---|---|
| prd.json | Single source of truth for “what & in which order” | Agent marks passes: true |
| progress.txt | Accumulates patterns, gotchas, file locations | Agent appends after each story |
| Git commits | Immutable code history | Agent commits per story |
Loop pseudocode:
for i in 1..MAX:
    read prd.json + progress.txt
    pick first story where passes == false
    implement + typecheck + test
    if green:
        commit, mark true, log learnings
    if all true:
        print <promise>COMPLETE</promise> and exit
Because each loop starts fresh, prompt size stays constant; because memory is on disk, learnings compound.
3. File-by-File Walk-Through
Core question: “Which files must live where, and what goes inside them?”
3.1 Directory layout
scripts/ralph/
├── ralph.sh # executable loop
├── prompt.md # system prompt for the agent
├── prd.json # backlog + branch name
└── progress.txt # rolling dev-journal
3.2 ralph.sh (the beating heart)
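#!/usr/bin/env bash
set -e

MAX_ITERATIONS=${1:-10}
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

for i in $(seq 1 "$MAX_ITERATIONS"); do
  echo "═══ Iteration $i ═══"

  # Feed the prompt to the agent, show its output live, and capture it for the stop check.
  # "|| true" keeps a non-zero pipeline status from aborting the loop under set -e.
  OUTPUT=$(cat "$SCRIPT_DIR/prompt.md" \
    | amp --dangerously-allow-all 2>&1 \
    | tee /dev/stderr) || true

  # Stop as soon as the agent prints the exact completion token.
  if echo "$OUTPUT" | grep -q "<promise>COMPLETE</promise>"; then
    echo "✅ Done!"
    exit 0
  fi

  sleep 2
done

echo "⚠️ Max iterations reached"
exit 1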
Key safeguards:
- set -e aborts on unexpected errors (the agent pipeline is guarded with || true, so one bad iteration does not end the loop)
- grep searches for the exact stop token, with no fuzzy logic
- sleep 2 prevents hammering the API
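The stop-token check is easiest to reason about in isolation. The following is a minimal sketch, not part of the original script: is_complete is a hypothetical helper name, and it uses grep -F so the token is matched as a literal string rather than as a regular expression.

```shell
# Hypothetical helper: succeeds only if the agent output contains the exact stop token.
is_complete() {
  # -q: no output, exit status only; -F: treat the pattern as a fixed string.
  printf '%s\n' "$1" | grep -qF '<promise>COMPLETE</promise>'
}

# The exact token ends the loop.
if is_complete "All stories done. <promise>COMPLETE</promise>"; then
  echo "stop"
fi

# A near miss does not.
if ! is_complete "Story US-001 COMPLETE"; then
  echo "keep looping"
fi
```

Because the match is literal and case-sensitive, an agent that merely mentions the word COMPLETE in passing will not end the run early.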
3.3 prd.json (the backlog)
{
  "branchName": "ralph/eval-system",
  "userStories": [
    {
      "id": "US-001",
      "title": "Add login form",
      "acceptanceCriteria": [
        "Email and password fields",
        "Validates email format on blur",
        "typecheck passes",
        "test coverage > 80"
      ],
      "priority": 1,
      "passes": false
    }
  ]
}
Rules of thumb:
- One story per indivisible user value
- 3–4 acceptance lines, each objectively measurable
- Lower priority number = earlier pick
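To see which story the agent should pick next, you can query the backlog yourself. This is a sketch under two assumptions: jq is installed (the original setup does not require it), and next_story is a hypothetical helper name. It applies the "lower priority number = earlier pick" rule to the stories whose passes flag is still false.

```shell
# Assumes jq is installed; next_story is a hypothetical helper, not part of Ralph.
# Prints the id of the pending story with the lowest priority number,
# or prints nothing when every story already has passes == true.
next_story() {
  jq -r '[.userStories[] | select(.passes == false)]
         | sort_by(.priority)
         | (.[0].id // empty)' "$1"
}

# Usage (path from the layout in section 3.1):
#   next_story scripts/ralph/prd.json
```

An empty result is a quick way to confirm from the shell that the backlog is finished, independently of what the agent reports.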
3.4 prompt.md (the agent’s todo card)
Markdown that tells the agent exactly what to do each spin.
Highlights:
- Read prd.json → pick the highest-priority story (lowest priority number) with passes: false
- Read progress.txt → reuse discovered patterns
- Implement one story, run typecheck & tests
- Commit with message feat: [ID] - [Title]
- Update prd.json → "passes": true
- Append learnings to progress.txt
- If all stories now true, reply <promise>COMPLETE</promise>
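If you have no prompt.md yet, the highlights above can be turned into a starter file. The wording below is illustrative only and is not the author's actual prompt; write_prompt is a hypothetical helper name.

```shell
# Hypothetical starter: writes a prompt.md that restates the steps listed above.
# The wording is an illustration, not the original prompt.
write_prompt() {
  cat > "$1" <<'EOF'
# Ralph iteration instructions

1. Read prd.json and pick the story with the lowest priority number where "passes" is false.
2. Read progress.txt and reuse the patterns recorded there.
3. Implement that one story only, then run typecheck and tests.
4. If both are green, commit with the message: feat: [ID] - [Title]
5. Set "passes": true for that story in prd.json.
6. Append what you learned to progress.txt.
7. If every story now has "passes": true, reply with <promise>COMPLETE</promise>
EOF
}

# Usage:
#   write_prompt scripts/ralph/prompt.md
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the prompt text, so brackets and quotes reach the agent unchanged.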
Author’s reflection: I used to maintain a 200-line mega-prompt inside the loop—Ralph’s split approach taught me that instructions shrink when state lives outside the prompt.
3.5 progress.txt (institutional memory)
# Ralph Progress Log
Started: 2024-01-15
## Codebase Patterns
- Migrations: Use `IF NOT EXISTS`
- React: useRef<Timeout | null>(null)
## Key Files
- db/schema.ts
- app/auth/actions.ts
---
New entries are appended below the divider, keeping patterns at the top for easy discovery.
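In Ralph the agent writes these entries itself, but the append-only convention is simple enough to sketch by hand. The helper below is hypothetical (log_progress is not part of the original files) and follows the dated-heading format used in the log excerpts in section 7.

```shell
# Hypothetical helper: append a dated entry for one story to the progress log.
# Usage: log_progress FILE STORY_ID "learning 1" ["learning 2" ...]
log_progress() (
  file=$1; story=$2; shift 2
  {
    printf '\n## %s - %s\n' "$(date +%Y-%m-%d)" "$story"
    printf -- '- Learnings:\n'
    for note in "$@"; do
      printf '  - %s\n' "$note"
    done
  } >> "$file"   # append only: the patterns at the top are never rewritten
)

# Usage:
#   log_progress scripts/ralph/progress.txt US-009 "utils/csv.ts is pure"
```

Using >> rather than > is the whole point: earlier patterns stay at the top of the file, and each iteration only adds to the bottom.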
4. Running Your First Night-Shift
Core question: “What concrete commands do I type before going to bed?”
- Prepare
    npm run typecheck   # confirm green baseline
    npm test            # confirm tests pass
    git checkout main
- Create the Ralph folder (see section 3) and copy the four files.
- Make the script executable
    chmod +x scripts/ralph/ralph.sh
- Start the loop (25 iterations ≈ 1 hour @ 2½ min each)
    nohup ./scripts/ralph/ralph.sh 25 > ralph.log 2>&1 &
- Go to sleep.
- Morning inspection
    grep "✅ Done!" ralph.log || echo "Check log: still red"
    git log --oneline --graph ralph/eval-system
If the log ends with “✅ Done!”, git log should show one commit per story on ralph/eval-system, each committed only after typecheck and tests were green, ready to be opened as a pull request for human review.
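Before leaving the loop unattended, it can help to confirm the folder is complete. The check below is a sketch: preflight is a hypothetical helper, and the default directory is simply the layout from section 3.1.

```shell
# Hypothetical pre-flight check; the directory defaults to the layout in section 3.1.
preflight() (
  dir=${1:-scripts/ralph}
  status=0
  # All four files from section 3.1 must be present.
  for f in ralph.sh prompt.md prd.json progress.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  # The loop script must also be executable.
  if [ -f "$dir/ralph.sh" ] && [ ! -x "$dir/ralph.sh" ]; then
    echo "not executable: $dir/ralph.sh (run chmod +x)" >&2
    status=1
  fi
  exit "$status"
)

# Usage:
#   preflight && nohup ./scripts/ralph/ralph.sh 25 > ralph.log 2>&1 &
```

Chaining it with && means the overnight run never starts if a file is missing, which is cheaper to discover at 23:00 than at 07:00.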
5. Slicing Stories That Fit the Loop
Core question: “How small is small enough?”
Use the Context Window Ruler: if the story needs more than ~1k tokens of explanation, it won’t leave room for code and tests in the same pass.
| Too Big (❌) | Just Right (✅) |
|---|---|
| Build entire auth system | Add login form UI |
| Implement RBAC | Create role enum and DB table |
| Add evaluation dashboard | Render empty chart component |
Author’s reflection: My first prd had a story titled “Build evaluation system.” Ralph spun 10 iterations, produced a 3 kLOC patch, and still marked it false because the acceptance criteria were vague. I split it into 9 smaller cards, and the same loop finished overnight.
6. Fast Feedback: The 30-Second Rule
Core question: “Why does Ralph insist on lightning-fast checks?”
Slow feedback = compounding mistakes. If typecheck + test takes 4 min, Ralph may generate 4× as much code before discovering a typo—wasting tokens and context.
Tactics we extracted into progress.txt:
- Run tests in watch-headless mode, caching Jest runner.
- Use in-memory SQLite for unit tests.
- Skip heavy e2e suites; run them in a separate nightly job after Ralph merges.
Result: average iteration dropped from 4 min → 1 min 50 s, total cost ↓ 45 %.
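One way to keep yourself honest about the 30-second budget is to time the feedback command directly. This is a sketch with a hypothetical helper name (within_budget); it measures whole seconds with date +%s and treats a failing check as a failure regardless of how fast it ran.

```shell
# Hypothetical helper: run a command and fail if it errors or exceeds a time budget.
# Usage: within_budget SECONDS COMMAND [ARGS...]
within_budget() (
  budget=$1; shift
  start=$(date +%s)
  "$@" || exit $?                       # a red check fails regardless of timing
  elapsed=$(( $(date +%s) - start ))
  echo "feedback took ${elapsed}s (budget ${budget}s)" >&2
  [ "$elapsed" -le "$budget" ]          # exit status reports whether we stayed in budget
)

# Usage:
#   within_budget 30 sh -c 'npm run typecheck && npm test'
```

Running this once before the first night-shift tells you whether test tuning is needed before the loop is worth starting.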
7. Patterns Compound: Real Log Excerpts
Below are unaltered snippets from an actual progress.txt after 13 stories:
## Codebase Patterns
- Server Actions: export types from `actions.ts` or tsc fails
- Icons: use `lucide-react` not `@heroicons` (tree-shaking)
- Migrations: always `ADD COLUMN IF NOT EXISTS`
## 2024-01-16 - US-009
- Added export-csv button to evaluations page
- Files changed: app/evaluations/page.tsx, utils/csv.ts
- Learnings:
  - `utils/csv.ts` is pure, keep it that way for easier test
  - Edge runtime needs `content-type` header or download fails
By story 10 Ralph started pre-emptively exporting types and wrapping SQL in IF NOT EXISTS—no extra prompting.
8. Browser Verification with Screenshots
Core question: “How can Ralph prove the UI actually renders?”
Amp ships a dev-browser skill. Load it once, then script a headless Chromium shot:
# Terminal 1: start browser server
~/.config/amp/skills/dev-browser/server.sh &
# Terminal 2: within Ralph iteration
cd ~/.config/amp/skills/dev-browser
npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect();
const page = await client.page("test");
await page.goto(`http://localhost:${process.env.PORT || 3000}/login`);
await waitForPageLoad(page);
await page.screenshot({ path: "tmp/login.png" });
await client.disconnect();
EOF
Add the screenshot path to progress.txt; humans reviewing the PR see visual proof without running code.
9. When Ralph Says “Nope” — Common Gotchas
| Symptom | Fix |
|---|---|
| Max iterations reached | Stories too big; slice & raise limit |
| Commits red in CI | Local test script ≠ CI; align env, add IF NOT EXISTS |
| Agent stalls on interactive prompt | Pipe yes or echo -e "\n\n\n" into the command, or use --yes flags |
| SQL migration fails second run | Forgot IF NOT EXISTS—add to patterns list |
| Screenshot 404 | Dev server not ready; sleep 3 s after npm run dev |
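For the "dev server not ready" row, a fixed sleep is a guess; polling until the page responds is usually more reliable. The helper below is a hypothetical sketch (wait_until is not part of Ralph) that retries any command once per second.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of attempts is used up.
# Usage: wait_until ATTEMPTS COMMAND [ARGS...]
wait_until() (
  attempts=$1; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      exit 0                 # command succeeded: stop waiting
    fi
    i=$((i + 1))
    sleep 1
  done
  exit 1                     # gave up after all attempts
)

# Usage (assumes curl is installed and the dev server listens on port 3000):
#   wait_until 30 curl -fsS -o /dev/null "http://localhost:3000/login"
```

The same helper works for any readiness probe, for example waiting for a file to appear with wait_until 10 test -f tmp/login.png.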
Author’s reflection: The biggest trap is scope creep inside a story. Ralph will happily add “just one more column” to satisfy typecheck—then tests fail because the column is NULL. Be pedantic in acceptance lines.
10. Results Recap — 13 Stories, ≈ 1 Hour
- User stories: 13
- Iterations: 15
- Avg duration: 2 min 32 s
- Commits: 13 clean, linear
- Human touches: 0 (while asleep)
- Morning review time: 12 min (read diff, approve)
By the 10th iteration, learnings from earlier stories were being reused automatically, with no extra prompt engineering.
11. Author’s Night-Shift Diary (Reflection)
I set Ralph loose at 23:17. The last Slack notification from the server was iteration 15—✅ Done! at 00:29. Watching the Git log the next morning felt almost illegal: each commit message perfectly formatted, tests green, migrations idempotent. The only “human” artifact was my own typo in US-003’s criteria, which Ralph surfaced by failing fast and logging “email regex missing dash.” I fixed the criteria, re-ran, and it self-healed.
Key insight: Autonomy isn’t about smarter AI; it’s about ruthless constraints—tiny stories, fast tests, immutable log, stop token. Remove any one pillar and the loop drifts.
Action Checklist / Implementation Steps
- Verify typecheck + test ≤ 30 s on your machine.
- Create scripts/ralph/; paste the four starter files.
- Chop your next feature into ≤ 4-criteria stories.
- chmod +x ralph.sh
- ./ralph.sh 25 > ralph.log 2>&1 &
- Next morning: git fetch && gh pr create --fill
- Review, tweak, merge. Done.
One-page Overview
Ralph is an 80-line Bash loop that repeatedly:
- reads the next undone user story from prd.json
- reads patterns & gotchas from progress.txt
- asks an AI agent to implement + typecheck + test
- commits on green, updates JSON, appends learnings
- stops when all stories pass
Memory lives in Git, JSON, and text—no drifting prompts.
Average real-world throughput: 13 stories, ~1 hour, zero human interrupts.
Works with any CLI agent (Amp, Claude Code, Cursor).
Best for small, testable, low-risk features; not for exploratory or security-critical code.
FAQ
Q1: Which agents besides Amp are confirmed to work?
A: Claude Code (claude --dangerously-skip-permissions) and Cursor’s CLI beta have both been used unchanged.
Q2: Can I change the commit message format?
A: Yes—edit the line in prompt.md; Ralph copies it verbatim.
Q3: What if my tests need a running server?
A: Start the dev server in the background inside the iteration, or use the start-server-and-test wrapper; just keep total feedback under 30 s.
Q4: How do I completely reset Ralph’s memory?
A: Delete progress.txt and reset prd.json passes to false; Git history remains untouched.
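A reset along those lines could look like the sketch below. It is an illustration, not a script shipped with Ralph: reset_ralph is a hypothetical name, and the sed pattern assumes each flag appears in prd.json as "passes": true (spacing around the colon may vary), as in the example in section 3.3.

```shell
# Hypothetical reset helper. Assumes flags are written as "passes": true,
# as in the prd.json example from section 3.3. Git history is not touched.
reset_ralph() (
  dir=${1:-scripts/ralph}
  rm -f "$dir/progress.txt"
  # Write to a temp file first so a failed sed never truncates prd.json.
  sed 's/"passes"[[:space:]]*:[[:space:]]*true/"passes": false/g' \
    "$dir/prd.json" > "$dir/prd.json.tmp" \
    && mv "$dir/prd.json.tmp" "$dir/prd.json"
)

# Usage:
#   reset_ralph scripts/ralph
```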
Q5: Is Ralph suitable for libraries published to npm?
A: Yes, provided you have pre-publish checks in CI; let Ralph build features, but let CI handle semver and tagging.
Q6: Does Ralph consume more tokens than manual prompting?
A: Per story, yes—because it re-reads patterns each loop. Total cost is still lower because stories are finished faster and with fewer regressive bugs.
Q7: Can I run multiple Ralph instances on different branches?
A: Absolutely—each branch carries its own prd.json, so loops are independent.

