Silver Bullet or Ball and Chain? The Claude Agent SDK Architecture After You Peek Into node_modules
What really happens when you install the Claude Agent SDK?
You get a thin TypeScript wrapper around a 190 MB Go binary that is the actual agent runtime—this article unpacks what that means for your project, wallet, and freedom to choose models.
1. The Two-Line Install That Pulls 190 MB of Go
Core question: Why does a simple npm install suddenly drop a CLI tool written in Go onto my laptop?
Official docs tell you to run:
npm install -g @anthropic-ai/claude-code # 190 MB CLI
npm install @anthropic-ai/claude-agent-sdk # 3 MB wrapper
The second package is only a pipe that spawns the first.
If your CI image caches node_modules but not global packages, the SDK will fail at runtime with claude-code: command not found—a mistake I made on day one.
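One way to fail fast in CI is a preflight check before the first query(). A minimal sketch, assuming a --version probe is cheap enough (the checkCli helper is my own, not part of the SDK):

```typescript
import { spawnSync } from "node:child_process";

// Returns true if the given binary can be executed from PATH.
// Probing with --version is an assumption; any cheap flag works.
function checkCli(bin: string): boolean {
  const result = spawnSync(bin, ["--version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

if (!checkCli("claude-code")) {
  // Surface the misconfiguration at startup, not mid-pipeline.
  console.error("claude-code CLI not found — install it globally before running the SDK");
}
```

Running this in the CI entrypoint turns the day-one runtime failure into an obvious startup message.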
Author reflection:
I thought I was adding a dependency; I was actually adding a runtime I don’t control and can’t vendor.
2. Anatomy of a query() Call: Fork, Talk, Wait, Repeat
Core question: How does one line of TypeScript turn into an autonomous coding agent?
for await (const msg of query("find auth bugs", { allowedTools: ["Read", "Bash"] })) {
console.log(msg);
}
Sequence inside the SDK:
- child_process.spawn("claude-code", ["--daemon"])
- JSON-RPC over stdio: send the request
- CLI runs its own agentic loop (Gather → Act → Verify → Repeat)
- Stream results back until Stop or the 50-round limit
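A toy version of the fork-and-talk sequence, assuming newline-delimited JSON over stdio — a simplification of whatever framing the real CLI uses; the talkToCli helper and the request shape are mine:

```typescript
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

// Spawn a CLI, send one JSON request on stdin, and stream back
// one parsed JSON message per stdout line.
async function* talkToCli(
  bin: string,
  args: string[],
  request: object
): AsyncGenerator<any> {
  const child = spawn(bin, args, { stdio: ["pipe", "pipe", "inherit"] });
  child.stdin!.write(JSON.stringify(request) + "\n");
  child.stdin!.end();
  for await (const line of createInterface({ input: child.stdout! })) {
    if (line.trim()) yield JSON.parse(line);
  }
}
```

With the real binary this would look roughly like talkToCli("claude-code", ["--daemon"], { prompt: "find auth bugs" }), with the daemon flag taken from the sequence above.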
Scenario:
In a GitHub Action that spins up 50 parallel jobs, each fork costs 180 ms + 190 MB resident memory. My 4-core runner hit OOM at job #18. I had to throttle concurrency and add swap.
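The throttling I ended up with is just a counting worker pool around each job; the sketch below is generic (runLimited is my own helper, not an SDK option):

```typescript
// Run tasks with at most `limit` in flight, so only `limit`
// CLI forks (each ~190 MB resident) exist at any moment.
async function runLimited<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    // Single-threaded JS: check-and-increment is race-free.
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    worker
  );
  await Promise.all(workers);
  return results;
}
```

On the 4-core runner above, a limit of 3–4 keeps resident memory bounded instead of forking all 50 jobs at once.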
3. Three Layers of Hard Coupling
3.1 Process Coupling
- The SDK must launch the CLI for every new conversation.
- If the CLI exits 137 (SIGKILL), your code receives an empty async iterator—no error is thrown, so you need to watch the child process's exit code yourself.
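Since a SIGKILL surfaces only as an empty stream, one defensive pattern is to wrap the iterator and treat zero messages as a crash. The guard itself is generic (guardEmpty is my own wrapper, not an SDK API):

```typescript
// Re-yield every message, but throw if the stream ends without
// producing any — the signature of a killed CLI described above.
async function* guardEmpty<T>(stream: AsyncIterable<T>): AsyncGenerator<T> {
  let sawAny = false;
  for await (const msg of stream) {
    sawAny = true;
    yield msg;
  }
  if (!sawAny) {
    throw new Error("agent stream ended with zero messages — CLI likely killed (exit 137?)");
  }
}
```

Usage would be for await (const msg of guardEmpty(query(...))), letting your normal retry logic catch the error instead of silently completing.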
3.2 Feature Coupling
| Selling point | Implemented in | SDK can override? |
|---|---|---|
| Agentic loop | CLI | ❌ |
| Read/Bash/Edit tools | CLI | ❌ |
| Skills loader | CLI | ❌ |
| Sub-agent orchestration | CLI | ❌ |
Take-away: You can configure features, but you can’t change them.
3.3 Prompt Coupling
const options = { systemPrompt: { preset: "claude_code" } };
That preset contains 1 200 tokens of guardrails, tool examples, and formatting rules tuned for Claude.
Feeding the same prompt to GPT-5.2 drops tool-calling accuracy from 92 % to 74 % in my single-attempt test.
4. Model Lock-In: Claude-Only in Practice
Core question: The docs list Bedrock, Vertex, and Foundry—doesn’t that mean choice?
All three hosts still serve Claude models.
Swap in a different family and you lose:
- 200 K context window tuned for multi-file search
- Self-reflection style that keeps the Verify step short
- Tool-call syntax the prompt was written for
Benchmark (one task, same repo):
| Model | Bugs found | Time | Missed edge case |
|---|---|---|---|
| Claude Opus 4.5 | 5 | 8 m | none |
| GPT-5.2 | 4 | 12 m | 1 |
Cost multiplier: Agent tasks average 5–10× token burn of a chat session because every tool call round-trips the entire context.
Author reflection:
I briefly tried retrofitting the SDK to call a cheaper 32 K model. The loop drowned in context swaps and ran 18 rounds instead of 7—cheaper per token, pricier in total.
5. Price Tag: From $0.47 a Task to $14 k a Month
Core question: How expensive can a “simple” agent job get?
Typical Sonnet 4.5 consumption:
- Initial context: 10 K in
- 10 tool rounds × (2 K tool + 5 K result + 1 K thought) = 80 K in, 10 K out
- Final answer: 3 K out
Total: 90 K in ($0.27) + 13 K out ($0.195) ≈ $0.47
At 1 000 tasks/day that is roughly $14 000 per month—before you touch Opus.
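The arithmetic above, as a reusable estimator; the Sonnet 4.5 prices ($3 in / $15 out per million tokens) come from the tier table below, and taskCostUSD is my own helper:

```typescript
// Estimate one agent task's cost from token counts and
// per-million-token prices (USD).
function taskCostUSD(
  inTokens: number,
  outTokens: number,
  inPerM: number,
  outPerM: number
): number {
  return (inTokens / 1e6) * inPerM + (outTokens / 1e6) * outPerM;
}

// The breakdown above: 10K + 80K = 90K in, 10K + 3K = 13K out.
const cost = taskCostUSD(90_000, 13_000, 3, 15); // ≈ $0.465
const monthly = cost * 1_000 * 30;               // ≈ $13 950 at 1 000 tasks/day
```

Swapping in the Haiku or Opus prices from the ladder gives the cheap and expensive ends of the same estimate.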
Mitigation ladder:
| Tier | Model | Cost/1M (in/out) | Use for |
|---|---|---|---|
| 1 | Haiku 4.5 | $1 / $5 | Pre-filter, large sweep |
| 2 | Sonnet 4.5 | $3 / $15 | Standard implementation |
| 3 | Opus 4.5 | $5 / $25 | Security audit, release gate |
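The ladder can be encoded as a simple downgrade rule. This sketch is my own policy, not an SDK feature, and the Sonnet/Opus model IDs are guesses following the "claude-haiku-4.5" naming used later in the FAQ:

```typescript
type Tier = { model: string; inPerM: number; outPerM: number };

// Tiers from the mitigation table; prices are USD per million tokens.
const LADDER: Tier[] = [
  { model: "claude-haiku-4.5", inPerM: 1, outPerM: 5 },
  { model: "claude-sonnet-4.5", inPerM: 3, outPerM: 15 },
  { model: "claude-opus-4.5", inPerM: 5, outPerM: 25 },
];

// Start at the requested tier index (0 = Haiku … 2 = Opus) and
// step one tier down while a budget alert is active.
function pickTier(requested: number, budgetAlert: boolean): Tier {
  const idx = budgetAlert ? Math.max(0, requested - 1) : requested;
  return LADDER[Math.min(idx, LADDER.length - 1)];
}
```

Wiring pickTier's result into the query options keeps the downgrade decision in one place.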
6. Production Docker: Bigger, Slower, Harder
Core question: What special care does the runtime need in production?
Minimal Dockerfile:
FROM node:20
# 190 MB extra
RUN npm install -g @anthropic-ai/claude-code@2.1.1
WORKDIR /app
COPY package*.json ./
RUN npm ci
# Cache & config must survive pod restart
VOLUME /root/.claude
ENV ANTHROPIC_API_KEY=
CMD ["node", "index.js"]
Operational notes:
- Image grows from 220 MB → 410 MB
- Cold-start latency +6 s on a 2 vCPU Cloud Run instance
- You now patch two supply chains: Node + Go binary
Scenario:
I once forgot to mount the volume. Every pod restart re-downloaded the CLI update (90 MB) and re-indexed the repo, tripling bootstrap time and blowing our GCS egress budget.
7. Open Source—Except Where It Matters
Core question: If the SDK is on GitHub, can’t the community fork and remix?
You can PR docs, add helper methods, even swap the HTTP backend—but the agent brain lives in the opaque CLI.
That means:
- No ARM build for Raspberry Pi clusters
- No air-gapped offline mode (still calls the cloud API)
- No custom tool execution engine
Author reflection:
It feels like owning a Ferrari with the hood welded shut. You can change the rims, not the engine.
8. Decision Matrix: Should You Bet on the SDK?
| Criteria | Favours SDK | Favours alternatives |
|---|---|---|
| Max agent quality | ✔ | |
| Model flexibility | | ✔ |
| Tight budget | | ✔ |
| Need deep tool hack | | ✔ |
| Rapid prototype | ✔ | |
| Regulatory air-gap | | ✔ |
Rule of thumb:
If agent correctness directly converts to revenue (security review, release gate), the SDK is still the fastest route to production-grade performance—just budget for lock-in.
9. Action Checklist / Implementation Steps
- Validate that your security team allows Claude-only models.
- Pre-install the CLI in your base image; do not rely on a global npm i -g at runtime.
- Mount /root/.claude to a persistent volume or cache bucket.
- Set a CLAUDE_CODE_VERSION pin to avoid surprise updates.
- Instrument token usage: log input_tokens and output_tokens on every query() return.
- Create a three-tier model ladder (Haiku → Sonnet → Opus) and auto-downgrade on budget alert.
- Monitor CLI exit codes; restart the pod on 137 to avoid silent hangs.
- Run a monthly "what if we port to LangChain" spike—prices and models move fast.
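The instrumentation, ladder, and budget steps of the checklist can share one small meter. A minimal sketch — the input_tokens/output_tokens field names mirror the Anthropic Messages API, but treating them as the SDK's message shape is an assumption:

```typescript
// Accumulate token usage across query() messages and flag when a
// daily budget is crossed, so the model ladder can auto-downgrade.
class TokenMeter {
  inTokens = 0;
  outTokens = 0;

  constructor(private inPerM: number, private outPerM: number) {}

  // Call once per completed query() with its reported usage.
  record(inputTokens: number, outputTokens: number): void {
    this.inTokens += inputTokens;
    this.outTokens += outputTokens;
  }

  costUSD(): number {
    return (
      (this.inTokens / 1e6) * this.inPerM +
      (this.outTokens / 1e6) * this.outPerM
    );
  }

  overBudget(dailyBudgetUSD: number): boolean {
    return this.costUSD() > dailyBudgetUSD;
  }
}
```

A budget alert then becomes meter.overBudget(limit), and the downgrade step can react to it before the next task starts.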
10. One-page Overview
- The SDK is a thin TypeScript client; the real agent lives inside the 190 MB Claude Code CLI.
- Every query() forks that CLI and speaks JSON-RPC over stdio.
- All tools, loops, skills, and sub-agents are implemented in the CLI—unchangeable from your code.
- Only Claude models are supported; swapping in GPT/Gemini drops accuracy and increases rounds.
- Agent tasks burn 5–10× the tokens of a chat session; expect $0.30–$0.70 per complex job with Sonnet.
- Production images grow by ~200 MB and need persistent cache volumes.
- The stack is half open-source: you can read the SDK, but not the agent runtime.
- Choose the SDK when quality > flexibility; otherwise build on a model-agnostic framework.
11. FAQ
Q1: Can I run the SDK entirely offline?
A: No. The CLI still calls Anthropic (or Bedrock/Vertex) APIs.
Q2: Is there a way to reduce cold-start time in serverless?
A: Pre-install the CLI in a custom runtime and keep /root/.claude warm via volume snapshots.
Q3: What happens if the CLI process crashes?
A: The async iterator ends silently; watch child.exitCode and retry.
Q4: Can I write my own Tool and load it?
A: Only via the Skills interface—those are still executed by the CLI, not your Node process.
Q5: How do I switch from Opus to Haiku on the fly?
A: Pass model: "claude-haiku-4.5" in the query options; no code changes needed.
Q6: Does Anthropic provide an SLA for the CLI binary?
A: No SLA covers the OSS SDK or the CLI; only the paid API tier has uptime guarantees.
Q7: Is the CLI’s 50-round limit configurable?
A: Yes, set CLAUDE_CODE_MAX_ROUNDS environment variable.

