Let AI Ship Features While You Sleep: Inside Ralph’s Autonomous Coding Loop

A step-by-step field guide to running Ralph—an 80-line Bash loop that turns a JSON backlog into shipped code without human interrupts.


What This Article Answers

Core question: How can a single Bash script let an AI agent finish an entire feature list overnight, safely and repeatably?
One-sentence answer: Ralph repeatedly feeds your agent the next small user story, runs type-check & tests, commits on green, and stops only when every story is marked true—using nothing but Git, a JSON queue, and a text log for memory.


1. What Exactly Is Ralph? (And What It Is Not)

Core question: “Is Ralph another VS Code plugin or a new model?”

Answer: No. Ralph is a deliberately minimal Bash loop that wraps any CLI-based coding agent (Amp, Claude Code, Cursor, etc.) into an autonomous delivery machine.

  • Not a model, not a SaaS, not a cloud secret.
  • Not an interactive copilot—once started, zero human input.
  • Is pure glue code that keeps context tiny (fresh window each loop) while persisting memory through Git history, a JSON task list, and a running text log.

Author’s reflection: I first thought “80 lines can’t possibly survive real codebases,” but that constraint is the feature—small stories, fast feedback, no drifting prompts.


2. The 30-Second Mental Model

Core question: “How does memory survive if the context window is cleared every cycle?”

Answer: Ralph externalizes everything the next iteration needs into three places on disk:

| Store | Purpose | Updated by |
|---|---|---|
| prd.json | Single source of truth for “what & in which order” | Agent marks passes: true |
| progress.txt | Accumulates patterns, gotchas, file locations | Agent appends after each story |
| Git commits | Immutable code history | Agent commits per story |

Loop pseudocode:

for i in 1..MAX:
    read prd.json + progress.txt
    pick highest-priority story (lowest priority number) where passes == false
    implement + typecheck + test
    if green:
        commit, mark true, log learnings
    if all true:
        print <promise>COMPLETE</promise> and exit

Because each loop starts fresh, prompt size stays small and roughly constant (only progress.txt grows); because memory is on disk, learnings compound.
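For readers who want to test the control flow without a real agent, the loop can be modelled in TypeScript. This is a minimal sketch, not Ralph's code: `runLoop` and the `runAgent` callback are hypothetical names, the agent is a stub, and the `sleep 2` pause is left out. Only the exact stop-token check and the iteration cap mirror ralph.sh.

```typescript
// Minimal model of Ralph's outer loop. Hypothetical: the real loop is ralph.sh.
// runAgent stands in for `cat prompt.md | <agent CLI>` and returns the agent's output.
type RunAgent = (iteration: number) => string;

const STOP_TOKEN = "<promise>COMPLETE</promise>";

interface LoopResult {
  completed: boolean; // true if the agent emitted the stop token
  iterations: number; // how many iterations actually ran
}

function runLoop(runAgent: RunAgent, maxIterations = 10): LoopResult {
  for (let i = 1; i <= maxIterations; i++) {
    let output = "";
    try {
      output = runAgent(i); // each call models a fresh context window
    } catch {
      // A crashed run counts as an iteration without the stop token,
      // much like `|| true` keeps ralph.sh going after an agent failure.
      output = "";
    }
    // Exact substring match on the stop token, no fuzzy parsing.
    if (output.includes(STOP_TOKEN)) {
      return { completed: true, iterations: i };
    }
  }
  return { completed: false, iterations: maxIterations };
}

// Usage: a stub agent that finishes the backlog on its third run.
const result = runLoop((i) => (i === 3 ? `all stories pass ${STOP_TOKEN}` : "still working"), 5);
console.log(result); // { completed: true, iterations: 3 }
```

Injecting the agent as a function keeps the loop logic separate from the CLI, which is what makes the stop condition easy to unit-test.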


3. File-by-File Walk-Through

Core question: “Which files must live where, and what goes inside them?”

3.1 Directory layout

scripts/ralph/
├── ralph.sh        # executable loop
├── prompt.md       # system prompt for the agent
├── prd.json        # backlog + branch name
└── progress.txt    # rolling dev-journal

3.2 ralph.sh (the beating heart)

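#!/usr/bin/env bash
set -e
MAX_ITERATIONS=${1:-10}                  # first argument, default 10
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

for i in $(seq 1 "$MAX_ITERATIONS"); do
  echo "═══ Iteration $i ═══"
  # Fresh agent run: prompt.md on stdin, output streamed to stderr
  # (so it shows in the terminal/log) and captured in OUTPUT.
  OUTPUT=$(cat "$SCRIPT_DIR/prompt.md" \
    | amp --dangerously-allow-all 2>&1 \
    | tee /dev/stderr) || true

  # Stop only when the exact completion token appears.
  if echo "$OUTPUT" | grep -q "<promise>COMPLETE</promise>"
  then
    echo "✅ Done!"
    exit 0
  fi

  sleep 2                                # short pause between iterations
done

echo "⚠️  Max iterations reached"
exit 1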

Key safeguards:

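  • set -e aborts on unexpected errors in the script itself; the agent call ends in || true, so a single failed agent run does not end the loop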
  • grep searches for the exact stop token—no fuzzy logic
  • sleep 2 prevents hammering the API

3.3 prd.json (the backlog)

{
  "branchName": "ralph/eval-system",
  "userStories": [
    {
      "id": "US-001",
      "title": "Add login form",
      "acceptanceCriteria": [
        "Email and password fields",
        "Validates email format on blur",
        "typecheck passes",
        "test coverage > 80%"
      ],
      "priority": 1,
      "passes": false
    }
  ]
}

Rules of thumb:

  • One story per indivisible user value
  • 3–4 acceptance lines, each objectively measurable
  • Lower priority number = earlier pick
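The backlog rules can be made concrete with a small TypeScript sketch, assuming prd.json has exactly the fields shown above. `pickNextStory` and `allStoriesPass` are hypothetical helper names; in Ralph itself the agent applies these rules by reading the file, guided by prompt.md.

```typescript
interface UserStory {
  id: string;
  title: string;
  acceptanceCriteria: string[];
  priority: number; // lower number = picked earlier
  passes: boolean;
}

interface Prd {
  branchName: string;
  userStories: UserStory[];
}

// Next story: lowest priority number among stories that do not pass yet.
// Ties keep file order because only a strictly lower number replaces the pick.
function pickNextStory(prd: Prd): UserStory | undefined {
  let next: UserStory | undefined;
  for (const story of prd.userStories) {
    if (story.passes) continue;
    if (next === undefined || story.priority < next.priority) next = story;
  }
  return next;
}

// The stop condition: every story passes.
function allStoriesPass(prd: Prd): boolean {
  return prd.userStories.every((s) => s.passes);
}

// Usage with a two-story backlog shaped like the prd.json above.
const sample: Prd = {
  branchName: "ralph/eval-system",
  userStories: [
    { id: "US-001", title: "Add login form", acceptanceCriteria: ["typecheck passes"], priority: 2, passes: false },
    { id: "US-002", title: "Create role enum and DB table", acceptanceCriteria: ["typecheck passes"], priority: 1, passes: false },
  ],
};
console.log(pickNextStory(sample)?.id); // US-002
console.log(allStoriesPass(sample)); // false
```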

3.4 prompt.md (the agent’s todo card)

Markdown that tells the agent exactly what to do each spin.
Highlights:

  1. Read prd.json → pick the highest-priority story (lowest priority number) with passes: false
  2. Read progress.txt → reuse discovered patterns
  3. Implement one story, run typecheck & tests
  4. Commit with message feat: [ID] - [Title]
  5. Update prd.json → set the story’s "passes" to true
  6. Append learnings to progress.txt
  7. If all stories now true, reply <promise>COMPLETE</promise>
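Steps 4, 5 and 7 are mechanical enough to express as code. A hedged TypeScript sketch, assuming the square brackets in feat: [ID] - [Title] are placeholders; the function names are hypothetical, since in Ralph the agent performs these steps itself by editing files and running git.

```typescript
interface UserStory {
  id: string;
  title: string;
  passes: boolean;
}

// Step 4: the commit message format from prompt.md, "feat: [ID] - [Title]".
function commitMessage(story: UserStory): string {
  return `feat: ${story.id} - ${story.title}`;
}

// Step 5 as a pure function: a copy of the backlog with one story set to passes: true.
// Throws on an unknown id so a typo cannot silently leave the story unmarked.
function markStoryPassed(stories: UserStory[], id: string): UserStory[] {
  if (!stories.some((s) => s.id === id)) {
    throw new Error(`Unknown story id: ${id}`);
  }
  return stories.map((s) => (s.id === id ? { ...s, passes: true } : s));
}

// Step 7: reply with the stop token only when nothing is left to do.
function completionReply(stories: UserStory[]): string | null {
  return stories.every((s) => s.passes) ? "<promise>COMPLETE</promise>" : null;
}

const backlog: UserStory[] = [{ id: "US-001", title: "Add login form", passes: false }];
console.log(commitMessage(backlog[0])); // feat: US-001 - Add login form
console.log(completionReply(markStoryPassed(backlog, "US-001"))); // <promise>COMPLETE</promise>
```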

Author’s reflection: I used to maintain a 200-line mega-prompt inside the loop—Ralph’s split approach taught me that instructions shrink when state lives outside the prompt.

3.5 progress.txt (institutional memory)

# Ralph Progress Log
Started: 2024-01-15

## Codebase Patterns
- Migrations: Use `IF NOT EXISTS`
- React: useRef<Timeout | null>(null)

## Key Files
- db/schema.ts
- app/auth/actions.ts

---

New entries are appended below the divider, keeping patterns at the top for easy discovery.
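Because the patterns sit at the top, a script (or a curious human) can pull them out cheaply. A minimal TypeScript sketch, assuming the heading and divider layout shown above; `extractCodebasePatterns` is a hypothetical helper, not part of Ralph, whose agent simply reads the whole file.

```typescript
// Collect the bullet lines under "## Codebase Patterns", stopping at the
// next "## " heading or the "---" divider.
function extractCodebasePatterns(progress: string): string[] {
  const patterns: string[] = [];
  let inSection = false;
  for (const rawLine of progress.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("## ")) {
      inSection = line === "## Codebase Patterns";
      continue;
    }
    if (line === "---") {
      inSection = false;
      continue;
    }
    if (inSection && line.startsWith("- ")) {
      patterns.push(line.slice(2));
    }
  }
  return patterns;
}

// Usage with the log skeleton above.
const progressSample = [
  "# Ralph Progress Log",
  "Started: 2024-01-15",
  "",
  "## Codebase Patterns",
  "- Migrations: Use `IF NOT EXISTS`",
  "- React: useRef<Timeout | null>(null)",
  "",
  "## Key Files",
  "- db/schema.ts",
  "- app/auth/actions.ts",
  "",
  "---",
].join("\n");
console.log(extractCodebasePatterns(progressSample).length); // 2
```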


4. Running Your First Night-Shift

Core question: “What concrete commands do I type before going to bed?”

  1. Prepare

    git checkout main
    npm run typecheck   # confirm green baseline
    npm test            # confirm tests pass
    
  2. Create the Ralph folder (see section 3) and copy the four files.

  3. Make the script executable

    chmod +x scripts/ralph/ralph.sh
    
  4. Start the loop (25 iterations ≈ 1 hour @ 2½ min each)

    nohup ./scripts/ralph/ralph.sh 25 > ralph.log 2>&1 &
    
  5. Go to sleep.

  6. Morning inspection

    grep "✅ Done!" ralph.log || echo "Check log—still red"
    git log --oneline --graph ralph/eval-system
    

If the log ends with “✅ Done!”, you will find one commit per story on the branch, each made only after typecheck and tests passed locally, ready to push and open as a pull request for human review.
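The morning check can also be scripted. A minimal TypeScript sketch that relies only on the lines ralph.sh itself prints (═══ Iteration N ═══, ✅ Done!, Max iterations reached); `summarizeRalphLog` is a hypothetical helper, and loading the file (for example with Node's `readFileSync("ralph.log", "utf8")`) is left to the caller.

```typescript
interface RunSummary {
  iterations: number; // "═══ Iteration N ═══" banners seen
  done: boolean; // ralph.sh printed "✅ Done!"
  hitMax: boolean; // ralph.sh printed "Max iterations reached"
}

// Parse the banner lines ralph.sh writes; agent output in the same log is ignored.
function summarizeRalphLog(log: string): RunSummary {
  const banners = log.match(/^═══ Iteration \d+ ═══$/gm) ?? [];
  return {
    iterations: banners.length,
    done: log.includes("✅ Done!"),
    hitMax: log.includes("Max iterations reached"),
  };
}

// Usage with a tiny two-iteration log.
const exampleLog = "═══ Iteration 1 ═══\n...agent output...\n═══ Iteration 2 ═══\n<promise>COMPLETE</promise>\n✅ Done!\n";
console.log(summarizeRalphLog(exampleLog)); // { iterations: 2, done: true, hitMax: false }
```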


5. Slicing Stories That Fit the Loop

Core question: “How small is small enough?”

Use the Context Window Ruler: if a story needs more than ~1k tokens of explanation, it won’t leave room for code + tests in the same pass.

| Too big (❌) | Just right (✅) |
|---|---|
| Build entire auth system | Add login form UI |
| Implement RBAC | Create role enum and DB table |
| Add evaluation dashboard | Render empty chart component |
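The ruler can be approximated before a night shift. A rough TypeScript sketch: it assumes about 4 characters per token for English text (a common rule of thumb, not a real tokenizer) and reuses the ~1k-token and 4-criteria limits from this guide; `storySizeWarnings` is a hypothetical helper.

```typescript
interface UserStory {
  id: string;
  title: string;
  acceptanceCriteria: string[];
}

// Assumption: ~4 characters per token for English prose.
const CHARS_PER_TOKEN = 4;
const MAX_STORY_TOKENS = 1000; // the ~1k-token Context Window Ruler
const MAX_CRITERIA = 4; // "3–4 acceptance lines"

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Reasons a story is probably too big; an empty list means it likely fits one iteration.
function storySizeWarnings(story: UserStory): string[] {
  const warnings: string[] = [];
  const tokens = estimateTokens([story.title, ...story.acceptanceCriteria].join("\n"));
  if (tokens > MAX_STORY_TOKENS) {
    warnings.push(`${story.id}: ~${tokens} tokens of description (limit ~${MAX_STORY_TOKENS})`);
  }
  if (story.acceptanceCriteria.length > MAX_CRITERIA) {
    warnings.push(`${story.id}: ${story.acceptanceCriteria.length} acceptance criteria (limit ${MAX_CRITERIA})`);
  }
  if (story.acceptanceCriteria.length === 0) {
    warnings.push(`${story.id}: no acceptance criteria, so "passes" cannot be judged`);
  }
  return warnings;
}

// Usage: the "just right" example passes, an oversized one is flagged.
console.log(storySizeWarnings({ id: "US-001", title: "Add login form UI", acceptanceCriteria: ["Email and password fields", "typecheck passes"] }).length); // 0
```

A check like this only catches size, not vagueness; criteria still need to be objectively measurable.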

Author’s reflection: My first PRD had a story titled “Build evaluation system.” Ralph spun 10 iterations, produced a 3 kLOC patch, and the story still stayed at passes: false because the acceptance criteria were vague. I split it into 9 smaller cards, and the same loop finished overnight.


6. Fast Feedback: The 30-Second Rule

Core question: “Why does Ralph insist on lightning-fast checks?”

Slow feedback = compounding mistakes. If typecheck + test takes 4 min, Ralph may generate 4× as much code before discovering a typo—wasting tokens and context.

Tactics we extracted into progress.txt:

  • Run tests in watch-headless mode, caching Jest runner.
  • Use in-memory SQLite for unit tests.
  • Skip heavy e2e suites; run them in a separate nightly job after Ralph merges.

Result: average iteration time dropped from 4 min to 1 min 50 s, and total cost fell by 45%.
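To check the 30-second budget before a night shift, a small timer around the same commands helps. A sketch assuming Node 18+; `timeFeedbackLoop` is a hypothetical helper, and the npm script names in the usage comment are the ones from section 4.

```typescript
import { spawnSync } from "node:child_process";

interface FeedbackTiming {
  ok: boolean; // every command exited with status 0
  elapsedMs: number; // wall-clock time for the whole sequence
  withinBudget: boolean; // elapsedMs <= budgetMs
  failed?: string; // first command that failed, if any
}

// Run commands in order, stop at the first failure, and time the whole feedback loop.
function timeFeedbackLoop(commands: string[][], budgetMs = 30_000): FeedbackTiming {
  const start = Date.now();
  for (const [cmd, ...args] of commands) {
    const result = spawnSync(cmd, args, { stdio: "inherit" });
    if (result.status !== 0) {
      const elapsedMs = Date.now() - start;
      return { ok: false, elapsedMs, withinBudget: elapsedMs <= budgetMs, failed: [cmd, ...args].join(" ") };
    }
  }
  const elapsedMs = Date.now() - start;
  return { ok: true, elapsedMs, withinBudget: elapsedMs <= budgetMs };
}

// Usage: the same checks Ralph runs each iteration.
// const t = timeFeedbackLoop([["npm", "run", "typecheck"], ["npm", "test"]]);
// console.log(t.ok && t.withinBudget ? "fast enough" : t);
```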


7. Patterns Compound: Real Log Excerpts

Below are unaltered snippets from an actual progress.txt after 13 stories:

## Codebase Patterns
- Server Actions: export types from `actions.ts` or tsc fails
- Icons: use `lucide-react` not `@heroicons` (tree-shaking)
- Migrations: always `ADD COLUMN IF NOT EXISTS`

## 2024-01-16 - US-009
- Added export-csv button to evaluations page
- Files changed: app/evaluations/page.tsx, utils/csv.ts
- Learnings:
  - `utils/csv.ts` is pure, keep it that way for easier test
  - Edge runtime needs `content-type` header or download fails

By story 10 Ralph started pre-emptively exporting types and wrapping SQL in IF NOT EXISTS—no extra prompting.
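The entry format is regular enough to generate. A sketch assuming the layout of the US-009 excerpt above; `formatProgressEntry` is a hypothetical helper (in Ralph the agent writes these lines itself), and appending its result to progress.txt, for example with Node's `appendFileSync`, places it below the divider.

```typescript
interface ProgressEntry {
  date: string; // e.g. "2024-01-16"
  storyId: string;
  summary: string;
  filesChanged: string[];
  learnings: string[];
}

// Render one entry in the same layout as the excerpt above.
function formatProgressEntry(e: ProgressEntry): string {
  const lines = [
    `## ${e.date} - ${e.storyId}`,
    `- ${e.summary}`,
    `- Files changed: ${e.filesChanged.join(", ")}`,
  ];
  if (e.learnings.length > 0) {
    lines.push("- Learnings:", ...e.learnings.map((l) => `  - ${l}`));
  }
  return lines.join("\n") + "\n";
}

const entry = formatProgressEntry({
  date: "2024-01-16",
  storyId: "US-009",
  summary: "Added export-csv button to evaluations page",
  filesChanged: ["app/evaluations/page.tsx", "utils/csv.ts"],
  learnings: ["Edge runtime needs `content-type` header or download fails"],
});
console.log(entry);
```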


8. Browser Verification with Screenshots

Core question: “How can Ralph prove the UI actually renders?”

Amp ships a dev-browser skill. Load it once, then script a headless Chromium shot:

# Terminal 1: start browser server
~/.config/amp/skills/dev-browser/server.sh &

# Terminal 2: within Ralph iteration
cd ~/.config/amp/skills/dev-browser
npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect();
const page = await client.page("test");
await page.goto(`http://localhost:${process.env.PORT || 3000}/login`);
await waitForPageLoad(page);
await page.screenshot({ path: "tmp/login.png" });
await client.disconnect();
EOF

Add the screenshot path to progress.txt; humans reviewing the PR see visual proof without running code.


9. When Ralph Says “Nope” — Common Gotchas

| Symptom | Fix |
|---|---|
| Max iterations reached | Stories too big; slice them and raise the limit |
| Commits red in CI | Local test script ≠ CI; align the environments, add IF NOT EXISTS |
| Agent stalls on an interactive prompt | Pipe answers in (yes, or echo -e "\n\n\n") or use the tool's --yes flag |
| SQL migration fails on second run | Forgot IF NOT EXISTS; add it to the patterns list |
| Screenshot 404 | Dev server not ready; sleep 3 s after npm run dev |
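A fixed sleep 3 s can still be too short on a cold start. A hedged alternative, assuming Node 18+ (global `fetch` and `AbortSignal.timeout`): poll the page until it answers with a 2xx status. `waitForServer` is a hypothetical helper, not part of Amp's dev-browser skill.

```typescript
// Poll until `url` answers with a 2xx status or the timeout expires.
// Returns true when the server is ready, false on timeout.
async function waitForServer(url: string, timeoutMs = 30_000, intervalMs = 250): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      // Cap each request at the remaining time so a hung socket cannot overrun the deadline.
      const res = await fetch(url, { signal: AbortSignal.timeout(Math.max(1, deadline - Date.now())) });
      await res.body?.cancel(); // only the status matters; release the connection
      if (res.ok) return true;
    } catch {
      // Connection refused, reset, or aborted: the dev server is not ready yet.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Usage (hypothetical port), before taking the screenshot:
// if (!(await waitForServer("http://localhost:3000/login"))) process.exit(1);
```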

Author’s reflection: The biggest trap is scope creep inside a story. Ralph will happily add “just one more column” to satisfy typecheck—then tests fail because the column is NULL. Be pedantic in acceptance lines.


10. Results Recap — 13 Stories, ≈ 1 Hour

  • User stories: 13
  • Iterations: 15
  • Avg duration: 2 min 32 s
  • Commits: 13 clean, linear
  • Human touches: 0 (while asleep)
  • Morning review time: 12 min (read diff, approve)

From around the 10th iteration on, learnings recorded earlier were reused automatically, with no extra prompt engineering.


11. Author’s Night-Shift Diary (Reflection)

I set Ralph loose at 23:17. The last Slack notification from the server was iteration 15—✅ Done! at 00:29. Watching the Git log the next morning felt almost illegal: each commit message perfectly formatted, tests green, migrations idempotent. The only “human” artifact was my own typo in US-003’s criteria, which Ralph surfaced by failing fast and logging “email regex missing dash.” I fixed the criteria, re-ran, and it self-healed.

Key insight: Autonomy isn’t about smarter AI; it’s about ruthless constraints—tiny stories, fast tests, immutable log, stop token. Remove any one pillar and the loop drifts.


Action Checklist / Implementation Steps

  1. Verify typecheck + test ≤ 30 s on your machine.
  2. Create scripts/ralph/; paste the four starter files.
  3. Chop your next feature into ≤ 4-criteria stories.
  4. chmod +x scripts/ralph/ralph.sh
  5. nohup ./scripts/ralph/ralph.sh 25 > ralph.log 2>&1 &
  6. Next morning, on the Ralph branch: git push -u origin HEAD && gh pr create --fill
  7. Review, tweak, merge—done.

One-page Overview

Ralph is an 80-line Bash loop that repeatedly:

  • reads the next undone user story from prd.json
  • reads patterns & gotchas from progress.txt
  • asks an AI agent to implement + typecheck + test
  • commits on green, updates JSON, appends learnings
  • stops when all stories pass

Memory lives in Git, JSON, and text—no drifting prompts.
Observed throughput in one real run: 13 stories, ~1 hour, zero human interrupts.
Works with any CLI agent (Amp, Claude Code, Cursor).
Best for small, testable, low-risk features; not for exploratory or security-critical code.


FAQ

Q1: Which agents besides Amp are confirmed to work?
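A: Claude Code (claude -p --dangerously-skip-permissions, where -p runs it in non-interactive print mode) and Cursor’s CLI beta have both been used by swapping out the amp line in ralph.sh, leaving the rest of the loop unchanged.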

Q2: Can I change the commit message format?
A: Yes—edit the line in prompt.md; Ralph copies it verbatim.

Q3: What if my tests need a running server?
A: Start the dev server in background inside the iteration, or use start-server-and-test wrapper—just keep total feedback < 30 s.

Q4: How do I completely reset Ralph’s memory?
A: Delete progress.txt and reset prd.json passes to false; Git history remains untouched.
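The reset can be scripted. A minimal TypeScript sketch, assuming the scripts/ralph/ layout from section 3; `resetPrd` and `resetRalphMemory` are hypothetical helpers. It rewrites prd.json with every passes set to false, deletes progress.txt, and never touches Git.

```typescript
import { readFileSync, rmSync, writeFileSync } from "node:fs";

interface Prd {
  branchName: string;
  userStories: { id: string; passes: boolean }[];
}

// Pure part: every story back to passes: false; other fields are copied unchanged.
function resetPrd(prd: Prd): Prd {
  return { ...prd, userStories: prd.userStories.map((s) => ({ ...s, passes: false })) };
}

// File part: rewrite prd.json and delete progress.txt (force: no error if it is already gone).
function resetRalphMemory(dir = "scripts/ralph"): void {
  const prdPath = `${dir}/prd.json`;
  const prd: Prd = JSON.parse(readFileSync(prdPath, "utf8"));
  writeFileSync(prdPath, JSON.stringify(resetPrd(prd), null, 2) + "\n");
  rmSync(`${dir}/progress.txt`, { force: true });
}

// Usage: resetRalphMemory(); // run from the repository root
```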

Q5: Is Ralph suitable for libraries published to npm?
A: Yes, provided you have pre-publish checks in CI; let Ralph build features, but let CI handle semver and tagging.

Q6: Does Ralph consume more tokens than manual prompting?
A: Per story, yes, because it re-reads the patterns every loop. Total cost is still lower because stories finish faster and with fewer regressions.

Q7: Can I run multiple Ralph instances on different branches?
A: Yes, as long as each instance runs in its own working copy (for example, one git worktree per branch). Each branch carries its own prd.json, so the loops stay independent.