Claude Code Source Code Deep Dive: The Architecture of an Agent Operating System

What makes Claude Code fundamentally different from other AI coding assistants?

It is not merely a “chatbot that can call tools.” After analyzing 4,756 source files extracted from the npm package, the evidence points to a comprehensive Agent Operating System—one that unifies prompt architecture, tool runtime governance, permission models, agent orchestration, skill packaging, plugin systems, hook governance, MCP integration, context hygiene, and product engineering into a cohesive platform. This article unpacks the engineering beneath the surface.


What Does the Source Structure Reveal About System Complexity?

Summary: Claude Code’s directory structure exposes a platform-level design with multiple entry points, a command system serving as the control panel, and a tools layer that transforms the model from a “responder” into an “executor.”

The extracted source reveals at least these critical modules:

  • src/entrypoints/: CLI, initialization, MCP mode, SDK consumers
  • src/constants/: Prompts, system constants, risk instructions, output specifications
  • src/tools/: Tool definitions and implementations
  • src/services/: Runtime services including tools, MCP, analytics
  • src/commands/: Slash commands and the command system
  • src/components/: TUI/UI components
  • src/coordinator/: Coordinator patterns
  • src/memdir/: Memory and prompt management
  • src/plugins/ and src/utils/plugins/: Plugin ecosystem
  • src/hooks/ and src/utils/hooks.js: Hook system
  • src/bootstrap/: State initialization
  • src/tasks/: Local tasks, remote tasks, async agent tasks

This is not a simple CLI wrapper. The entry layer alone demonstrates platform thinking: four distinct entry points (CLI, initialization flow, MCP mode, SDK) allow the same agent runtime to serve multiple interaction surfaces.

The Command System as Control Panel

The command system exposes system-level commands including /mcp, /memory, /permissions, /hooks, /plugin, /reload-plugins, /skills, /tasks, /plan, /review, /status, /model, /output-style, /agents, and /sandbox-toggle. Critically, it does not merely register built-in commands—it uniformly loads plugin commands, skill commands, bundled skills, and dynamically discovered skills (filtered for availability).

Application Scenario: When you type /skills in the terminal, you see not a static list but a dynamically computed inventory of capabilities available for your current project context, installed plugins, and connected MCP servers. This makes the command system itself the unified entry point for the ecosystem.

The Tools Layer: Where Models Become Executors

The tools layer includes FileRead, FileEdit, FileWrite, Bash, Glob, Grep, TodoWrite, TaskCreate, AskUserQuestion, Skill, Agent, MCPTool, and Sleep. The essence of this layer is transforming the model from a “responder” into an “executor.” Claude Code’s stability stems significantly from this layer being formal, clear, and governable.

Author’s Reflection: Most open-source coding agents have a directory structure resembling a single main file, a prompt file, a few tool files, and utilities. Claude Code’s structure operates at a completely different magnitude—not for aesthetics, but because it solves fundamentally more complex problems.


How Does the System Prompt Work as a Runtime Assembler?

Summary: Claude Code’s system prompt is not static text but a dynamically assembled construct with static prefixes (cache-friendly) and dynamic suffixes (session-specific), governed by explicit cache boundary markers.

The most critical source file is src/constants/prompts.ts. It serves not as a repository of “magical copy” but as the master assembler for the system prompt—handling environment injection, tool usage specifications, security and risk action guidelines, session-specific guidance, language/output style, MCP instructions, memory prompts, scratchpad explanations, function result clearing hints, and feature-gated sections.

The Assembly Architecture

The getSystemPrompt() function constructs prompts through a two-part structure:

Static Prefix (Cache-Optimized):

  • getSimpleIntroSection(): Identity positioning
  • getSimpleSystemSection(): Base system specifications
  • getSimpleDoingTasksSection(): Task execution philosophy
  • getActionsSection(): Risk action specifications
  • getUsingYourToolsSection(): Tool usage specifications
  • getSimpleToneAndStyleSection(): Tone and style guidelines
  • getOutputEfficiencySection(): Output efficiency rules

Dynamic Suffix (Session-Conditional):

  • Session guidance
  • Memory
  • Ant model override
  • Environment info
  • Language
  • Output style
  • MCP instructions
  • Scratchpad
  • Function result clearing
  • Summarize tool results
  • Numeric length anchors
  • Token budget
  • Brief mode

Cache Economics as Infrastructure

The source explicitly defines SYSTEM_PROMPT_DYNAMIC_BOUNDARY with comments indicating: content before the boundary should remain cache-friendly; content after is user/session-specific and should not be arbitrarily modified to avoid breaking cache logic.
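
The boundary pattern can be sketched in TypeScript. This is a minimal illustration, not the real implementation: the constant and section names follow the article, but the section contents and `SessionState` shape are placeholders.

```typescript
// Illustrative sketch of static-prefix / dynamic-suffix assembly.
// Section names mirror the source; bodies are placeholders.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "<!-- dynamic-boundary -->";

interface SessionState {
  language?: string;
  outputStyle?: string;
  mcpInstructions?: string[];
}

function getStaticPrefix(): string {
  // Cache-friendly: byte-identical for every session.
  return [
    "## Identity",          // getSimpleIntroSection()
    "## System",            // getSimpleSystemSection()
    "## Doing tasks",       // getSimpleDoingTasksSection()
    "## Risk actions",      // getActionsSection()
    "## Using your tools",  // getUsingYourToolsSection()
  ].join("\n");
}

function getDynamicSuffix(session: SessionState): string {
  // Session-specific: assembled fresh, never cached.
  const parts: string[] = [];
  if (session.language) parts.push(`Respond in ${session.language}.`);
  if (session.outputStyle) parts.push(`Output style: ${session.outputStyle}`);
  if (session.mcpInstructions) parts.push(...session.mcpInstructions);
  return parts.join("\n");
}

function getSystemPrompt(session: SessionState): string {
  return [
    getStaticPrefix(),
    SYSTEM_PROMPT_DYNAMIC_BOUNDARY,
    getDynamicSuffix(session),
  ].join("\n");
}
```

Because everything before the boundary is byte-identical across sessions, the provider-side prompt cache can reuse it even while the suffix changes per session.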

Application Scenario: Consider a development team working across multiple languages. When switching from English to Chinese, Claude Code does not reload the entire system prompt. Instead, it keeps the static prefix unchanged (a cache hit) and injects new language instructions after the dynamic boundary, preserving the structure of the tool definitions and MCP instructions. This minimizes the cost of multi-language support while ensuring the model always knows which language to use.

Author’s Insight: This demonstrates “Prompt assembly with cache economics”—engineering optimization around token costs and cache hit rates. Most prompt engineering stops at “writing good text”; Claude Code treats prompts as orchestrated runtime resources with cost management built in.


How Are Model Behaviors Institutionalized?

Summary: Claude Code prevents behavioral drift not by hoping the model is “smarter” but by encoding behavioral norms into the system prompt and runtime rules.

The getSimpleDoingTasksSection() module constrains model behavior with specific rules:

  • Feature Boundaries: Do not add features the user did not request; avoid over-abstraction; no blind refactoring
  • Code Standards: Do not arbitrarily add comments/docstrings/type annotations; avoid unnecessary error handling/fallback logic
  • Workflow: Read code before modifying; do not casually create new files; avoid giving time estimates
  • Problem Solving: Diagnose before switching strategies when methods fail; delete confirmed unused items without compatibility baggage
  • Result Reporting: Report results truthfully; do not pretend to have tested when you haven't

Risk Action Specifications

The getActionsSection() defines what constitutes “risk actions requiring confirmation”: destructive operations, hard-to-reverse operations, modifications to shared state, externally visible actions, and uploads to third-party tools. It emphasizes: do not use destructive actions as shortcuts; investigate unfamiliar states before acting; do not brutally delete merge conflicts or lock files.

Application Scenario: When you ask Claude Code to delete an old configuration file, it does not immediately execute rm. Instead, it checks whether the file is referenced by other processes, whether it has special meaning in git history, and whether deletion would affect team members. This “blast radius awareness” is not something the model figured out on its own—it is hardcoded in the prompt.
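
The risk categories above could be modeled as a pre-execution classification step. The sketch below is hypothetical: the flag names come from the article, but the pattern-matching logic is invented for illustration (the real system encodes this in the prompt and permission layers, not a regex table).

```typescript
// Hypothetical "blast radius" classifier for Bash commands.
// Flag names follow the article's categories; matching rules are illustrative.
type RiskFlag = "destructive" | "hardToReverse" | "sharedState" | "externallyVisible";

function classifyBashCommand(cmd: string): RiskFlag[] {
  const flags: RiskFlag[] = [];
  if (/\brm\b|\bdrop\s+table\b/i.test(cmd)) flags.push("destructive");
  if (/\bgit\s+push\s+--force\b/.test(cmd)) flags.push("hardToReverse");
  if (/\bgit\s+push\b/.test(cmd)) flags.push("externallyVisible");
  return flags;
}

function requiresConfirmation(cmd: string): boolean {
  // Any single risk flag is enough to route through human confirmation.
  return classifyBashCommand(cmd).length > 0;
}
```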

Tool Usage Grammar

The getUsingYourToolsSection() establishes clear tool strategies:

  • Read files using FileRead, not cat/head/tail/sed
  • Edit files using FileEdit, not sed/awk
  • Create files using FileWrite, not echo redirection
  • Search files using Glob; search content using Grep
  • Reserve Bash for truly shell-dependent scenarios
  • Use TodoWrite/TaskCreate when task management tools are available
  • Execute tool calls without dependencies in parallel
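
Expressed as code, the routing rules above look like a lookup table. This is a hypothetical rendering for clarity: in Claude Code the grammar lives in prompt text, not in a data structure.

```typescript
// Hypothetical routing table for the "tool usage grammar".
// Tool names are from the article; the Intent type is invented.
type Intent = "readFile" | "editFile" | "createFile" | "findFiles" | "searchContent";

const PREFERRED_TOOL: Record<Intent, string> = {
  readFile: "FileRead",      // not cat/head/tail/sed
  editFile: "FileEdit",      // not sed/awk
  createFile: "FileWrite",   // not echo redirection
  findFiles: "Glob",
  searchContent: "Grep",
};

function preferredTool(intent: Intent): string {
  return PREFERRED_TOOL[intent];
}
```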

Author’s Reflection: Many coding agents are unstable not because they cannot write code, but because they use tools incorrectly—such as using bash sed to modify code, where a single regex error causes collapse. Claude Code’s stability is deeply tied to this “tool usage grammar.”


What Is the Agent Specialization Strategy?

Summary: Claude Code employs multiple built-in agents with clear role separation rather than a single “universal worker,” addressing the fundamental problem that one agent cannot simultaneously research, plan, implement, and verify effectively.

The source confirms at least six built-in agents:

  • General Purpose Agent: general task handling; no special restrictions
  • Explore Agent: read-only code exploration; absolutely read-only: cannot create, modify, delete, or move files; Bash limited to ls, git status, git log, git diff, find, grep, cat, head, tail
  • Plan Agent: pure planning, no editing; read-only; outputs a step-by-step implementation plan; must list Critical Files for Implementation
  • Verification Agent: adversarial validation; goal is "try to break it"; must run build, test suite, linter/type-check; each check requires command and observed output; final verdict must be VERDICT: PASS/FAIL/PARTIAL
  • Claude Code Guide Agent: product usage guidance; helps users understand Claude Code features
  • Statusline Setup Agent: status bar configuration; handles statusline-related settings

The Explore Agent: Read-Only Specialist

The Explore Agent’s system prompt explicitly prohibits file creation, modification, deletion, or movement. It cannot write temporary files or use redirection/heredoc. Its Bash usage is strictly limited to read operations. The design intentionally trims it into a read-only specialist.
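
The read-only restriction can be pictured as an allowlist check on a proposed Bash command. A simplified sketch: the allowlisted commands are taken from the article, while the parsing (deliberately naive) and function name are illustrative.

```typescript
// Simplified allowlist check for the Explore Agent's Bash usage.
// Allowed commands come from the article; parsing is naive on purpose.
const EXPLORE_BASH_ALLOWLIST = new Set(["ls", "find", "grep", "cat", "head", "tail"]);
const EXPLORE_GIT_SUBCOMMANDS = new Set(["status", "log", "diff"]);

function exploreAgentMayRun(command: string): boolean {
  // Redirection or heredocs would create files, so reject them outright.
  if (command.includes(">") || command.includes("<<")) return false;
  const [head, sub] = command.trim().split(/\s+/);
  if (head === "git") return EXPLORE_GIT_SUBCOMMANDS.has(sub ?? "");
  return EXPLORE_BASH_ALLOWLIST.has(head ?? "");
}
```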

Application Scenario: When exploring a large codebase to understand architecture patterns, the Explore Agent can traverse files without risk of accidentally modifying something. This isolation prevents exploration-phase accidents from affecting subsequent implementation phases.

The Plan Agent: Architect, Not Executor

The Plan Agent is defined as architect/planner, not executor. It must understand requirements, explore codebase patterns and architecture, output step-by-step implementation plans, and list critical files for implementation. This separation of planning and implementation reduces role confusion.

The Verification Agent: The Most Valuable Design

The Verification Agent’s prompt explicitly identifies two failure modes:

  1. Verification avoidance: Looking only at code without running checks, writing PASS and leaving
  2. Being fooled by the first 80%: UI looks fine, tests pass, ignoring the remaining 20% of issues

Mandatory requirements include:

  • Run build, test suite, linter/type-check
  • Frontend changes: browser automation or page sub-resource verification
  • Backend changes: curl/fetch actual response testing
  • CLI: examine stdout/stderr/exit code
  • Migrations: test up/down and existing data
  • Refactors: test public API surface
  • Adversarial probes required
  • Each check must include command and output observed
  • Final output must be VERDICT: PASS/FAIL/PARTIAL

Author’s Insight: In traditional software engineering, separating implementers from validators is common sense. In AI agent systems, most products have not reached this step. Claude Code makes the Verification Agent independent, giving it no vested interest in the implementing agent’s success. The implementing agent tends to believe its code is correct; the verification agent’s job is to find problems.


How Does the Agent Dispatch Chain Work?

Summary: A sub-agent moves through a 14-step pipeline from trigger to completion, with distinct Fork and Normal paths optimized for cache reuse and role clarity.

The complete dispatch chain abstracts to:

  1. Main model decides to call Agent tool
  2. AgentTool.call() parses input
  3. Parse whether teammate/fork/built-in/background/worktree/remote
  4. Select agent definition
  5. Construct prompt messages
  6. Construct/inherit system prompt
  7. Assemble tool pool
  8. Create agent-specific ToolUseContext
  9. Register hooks/skills/MCP servers
  10. Call runAgent()
  11. runAgent() internally calls query()
  12. query produces message stream
  13. runAgent records transcript, handles lifecycle, cleans resources
  14. AgentTool aggregates results or async task notification
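
A heavily condensed sketch of this chain follows. All function bodies are illustrative stand-ins for the numbered steps, not the real internals; only `AgentTool`/`runAgent` are names from the source.

```typescript
// Condensed sketch of the dispatch chain. Comments map to the steps above.
interface AgentInput { subagentType?: string; prompt: string }
interface AgentResult { transcript: string[]; output: string }

function selectAgentDefinition(kind: string) {
  // Step 4: pick an agent definition (placeholder contents).
  return { systemPrompt: `You are the ${kind} agent.`, tools: ["FileRead", "Grep"] };
}

function runAgent(cfg: { systemPrompt: string; tools: string[]; prompt: string }): AgentResult {
  // Steps 10-13: run the query loop, record the transcript, clean up.
  const transcript = [cfg.systemPrompt, cfg.prompt];
  return { transcript, output: `done: ${cfg.prompt}` };
}

function agentToolCall(input: AgentInput): AgentResult {
  const kind = input.subagentType ?? "fork";            // Step 3: parse dispatch kind
  const definition = selectAgentDefinition(kind);        // Step 4
  return runAgent({                                      // Steps 5-14
    systemPrompt: definition.systemPrompt,
    tools: definition.tools,
    prompt: input.prompt,
  });
}
```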

Fork Path vs Normal Path

  • Trigger: Fork when subagent_type is omitted and the fork feature is enabled; Normal when a built-in/custom agent type is explicitly specified
  • System Prompt: Fork inherits the main thread's; Normal is generated from the agentDefinition
  • Context: Fork inherits the full parent thread; Normal receives only the context that agent requires
  • Tool Set: Fork stays consistent with the parent (cache-hit optimization); Normal follows the agent's tool restrictions
  • Design Goal: Fork reuses the main thread cache to avoid burning tokens; Normal enforces strict isolation and clear roles
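
The two paths' differing treatment of prompt, context, and tools can be sketched as a single branch. The structure below is hypothetical, derived only from the comparison above.

```typescript
// Hypothetical fork-vs-normal configuration, following the comparison above.
interface ParentThread { systemPrompt: string; context: string[]; tools: string[] }
interface AgentDef { systemPrompt: string; tools: string[] }

function buildSubagentConfig(parent: ParentThread, def: AgentDef | null) {
  if (def === null) {
    // Fork path: byte-identical prefix so the prompt cache can be reused.
    return { systemPrompt: parent.systemPrompt, context: parent.context, tools: parent.tools };
  }
  // Normal path: strict isolation with a role-specific prompt and tool restrictions.
  return { systemPrompt: def.systemPrompt, context: [] as string[], tools: def.tools };
}
```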

Application Scenario: When you fork a sub-task for time-consuming code analysis (like having Explore Agent traverse an entire codebase), you can close the terminal and work on something else. When complete, Claude Code notifies the main thread, allowing you to choose between summary or full output. This seamless foreground/background switching comes from complete agent lifecycle management.

Author’s Insight: The Fork path is not “just another agent”—it is an execution path specifically optimized for cache and context inheritance. Comments explicitly mention maintaining “byte-identical API request prefix” to improve prompt cache hit rates. Most people think “sub-tasks should run”; Claude Code thinks “sub-tasks should run AND reuse main thread cache without wasting tokens.”


How Do Skills, Plugins, and MCP Extend the System?

Summary: The ecosystem’s key is making the model “aware” of its capabilities through skills lists, agent lists, MCP instructions, session-specific guidance, and command integration.

Skill: Workflow Package

A Skill is not documentation but a markdown prompt bundle with frontmatter metadata:

  • Declares allowed-tools
  • Injects into current context on demand
  • Compresses repetitive workflows into reusable capability packages

The system requires: when a task matches a skill, you must call the Skill tool to execute it—you cannot merely mention the skill without execution.

Plugin: Prompt + Metadata + Runtime Constraints

Plugins provide:

  • Markdown commands
  • SKILL.md skill directories
  • commandsMetadata
  • userConfig (sensitive values stored in system keychain, not on disk)
  • Shell frontmatter
  • allowed-tools
  • Model/effort hints
  • Runtime variable replacement (supporting ${CLAUDE_PLUGIN_ROOT}, ${CLAUDE_SESSION_ID}, etc.)
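
Runtime variable replacement of the kind listed above can be sketched in a few lines. The variable names come from the article; the substitution function itself is an illustrative assumption, not the actual plugin runtime.

```typescript
// Sketch of ${VAR}-style runtime variable replacement in plugin commands.
// Unknown variables are left untouched rather than expanded to empty strings.
function expandPluginVariables(template: string, vars: Record<string, string>): string {
  return template.replace(/\$\{([A-Z_]+)\}/g, (match: string, name: string) =>
    name in vars ? vars[name] : match);
}
```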

MCP: More Than a Tool Bridge

When an MCP server connects, if the server provides instructions, those instructions are assembled into the system prompt. This means MCP simultaneously injects:

  1. New tools
  2. Instructions on how to use those tools

Application Scenario: Many platforms have plugin systems and tool markets, but the model itself does not know “what extensions exist, when to use them, or how to use them.” Claude Code makes the model “aware of its extended capabilities” through skills lists, agent lists, MCP instructions, session-specific guidance, and command integration. This is the prerequisite for an ecosystem to truly function—giving someone a professional toolbox is useless if they don’t know what’s inside or when to open it.


How Is Context Managed as a Scarce Resource?

Summary: Claude Code treats context as a budget to be managed, not free air, through static/dynamic boundaries, fork path cache optimization, on-demand skill injection, and compression/resume mechanisms.

  • System prompt static/dynamic boundary: cache static portions, inject dynamic portions on demand (reduces redundant computation costs)
  • Fork path cache-identical prefix: sub-tasks reuse the main thread cache (parallel complex tasks without wasting tokens)
  • Skill on-demand injection: skills are not all loaded upfront (keeps the initial context clean)
  • MCP instructions by connection status: unconnected servers' instructions do not occupy space (avoids invalid information interference)
  • Function result clearing: actively cleans completed function results (frees context space)
  • Summarize tool results: automatically summarizes long results (prevents a single output from filling the window)
  • Compact/transcript/resume: context compression and session recovery (long sessions without interruption)
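
The "summarize tool results" mechanism amounts to a budget check before a result enters context. Everything in the sketch below is assumed for illustration: the token threshold, the 4-characters-per-token heuristic, and the placeholder format are all invented, not taken from the source.

```typescript
// Hypothetical sketch of tool-result summarization under a token budget.
const MAX_RESULT_TOKENS = 2000; // assumed threshold, not from the source

function approxTokens(text: string): number {
  return Math.ceil(text.length / 4); // rough 4-chars-per-token heuristic
}

function compactToolResult(result: string, summarize: (s: string) => string): string {
  // Short results pass through untouched; long ones are replaced by a summary.
  if (approxTokens(result) <= MAX_RESULT_TOKENS) return result;
  return `[summarized: ${summarize(result)}]`;
}
```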

Author’s Insight: For those building demos, context management is irrelevant—demos run a few times and end. For product builders, context economics directly impacts costs and experience. For a service processing tens of thousands of requests daily, each carrying thousands of system-prompt tokens, a 10% improvement in cache hit rates can save enough in a month to hire another person. These designs are not “over-optimization” but necessities at scale.


What Product Engineering Details Create the “Stable” Experience?

Summary: RunAgent includes product-level details like transcript recording, performance tracking, resource cleanup, and foreground/background lifecycle management that distinguish a prototype from a production product.

The runAgent() function contains numerous product-level details:

  • recordSidechainTranscript(): Records sidechain conversations
  • writeAgentMetadata(): Writes agent metadata
  • registerPerfettoAgent(): Performance tracking registration
  • cleanupAgentTracking(): Cleans tracking state
  • killShellTasksForAgent(): Terminates the agent’s shell tasks
  • Cleanup of session hooks, cloned file state, todos entries

Background agents have independent abort controllers, can run continuously in the background, return to the main thread via notification when complete, and support automatic summarization. Foreground agents can be converted to background during execution, with progress tracking.

Permission and Security Architecture

Claude Code does not simply tell the model to “be careful” but enforces security through multi-layer mechanisms:

  • Sandbox isolation: Tools run in controlled environments
  • Layered permissions: Hook → Policy → User approval decision layers
  • Human-in-the-loop: Critical operations require user confirmation
  • PreToolUse Hook: More comprehensive than most AI tools’ total security infrastructure

Application Scenario: In a financial project using Claude Code, you can implement through Hooks:

  • All database write operations require secondary confirmation
  • All external API calls must be logged for audit
  • All delete operations must first backup to designated directories

These policies require no core code modifications—just Hook configuration.
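
A PreToolUse hook for the first two policies might look like the following. The hook shape, decision values, and matching rules are all illustrative assumptions, not Claude Code's actual hook API.

```typescript
// Hypothetical PreToolUse hook for the financial-project policies above.
// Interface and decision values are invented for illustration.
interface ToolCall { tool: string; input: { command?: string } }
type HookDecision = { action: "allow" | "ask" | "deny"; reason?: string };

function preToolUseHook(call: ToolCall): HookDecision {
  const cmd = call.input.command ?? "";
  if (call.tool === "Bash" && /\b(INSERT|UPDATE|DELETE|DROP)\b/i.test(cmd)) {
    // Policy 1: database writes require secondary confirmation.
    return { action: "ask", reason: "database write requires secondary confirmation" };
  }
  if (call.tool === "Bash" && /\bcurl\b/.test(cmd)) {
    // Policy 2: log external API calls for audit, then allow.
    console.log(`[audit] external call: ${cmd}`);
    return { action: "allow" };
  }
  return { action: "allow" };
}
```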


Practical Action Checklist

For technical leaders evaluating AI coding tools:

✅ Ideal Scenarios for Claude Code:

  • Complex refactoring requiring coordination across multiple files
  • Teams needing automated CI/CD integration
  • Environments with explicit security compliance requirements (needing permission layering)
  • Background analysis tasks requiring long runtimes
  • Organizations wanting to connect internal toolchains via MCP

⚠️ Scenarios Requiring Careful Evaluation:

  • Only needing IDE real-time completion (Copilot/Cursor more suitable)
  • Fully offline air-gapped environments (Claude Code requires cloud connectivity)
  • Teams sensitive to token consumption (Agent Teams use significant volume)

🔧 Quick Start Recommendations:

  1. Create CLAUDE.md in project root to document project conventions
  2. Connect team tools via /mcp (Jira, Slack, Notion, etc.)
  3. Use /plan to have Plan Agent output implementation plans before execution
  4. Enable Verification Agent for adversarial validation on critical tasks
  5. Configure team-specific security policies through Hooks (e.g., database write confirmation)

One-Page Overview

Claude Code Core Architecture
├── Entry Layer: CLI / MCP / SDK / Initialization
├── Prompt Architecture: Static prefix + Dynamic boundary (cache optimized)
├── Tool Governance: Input validation → Hook interception → Permission decision → Execution → PostHook
├── Multi-Agent Division: Explore (read-only) / Plan (planning) / Verification (validation)
├── Dispatch Chain: 14-step Pipeline, Fork reuses main thread cache
├── Extension Ecosystem: Skill (workflow package) / Plugin (behavior extension) / MCP (tools + instructions)
├── Context Economics: Static/dynamic separation, on-demand injection, compression/recovery
└── Lifecycle: Transcript recording, performance tracking, resource cleanup, foreground/background switching

Design Philosophy: Do not leave "good behavior" to model improvisation—write it into the system
Competitive Moat: Not any single prompt, but the complete Agent Operating System

Frequently Asked Questions

Q1: What is the difference between Claude Code and GitHub Copilot?
Copilot is an IDE real-time completion tool; Claude Code is a terminal-native agent handling task-level delegation (cross-file refactoring, testing, Git workflows). They can complement each other.

Q2: Why does Claude Code need so many “agents”? Can’t one agent do everything?
A single agent simultaneously handling research, planning, implementation, and validation leads to role confusion and bias. Claude Code improves task stability through specialized division of labor.

Q3: What is MCP and why does it matter?
MCP (Model Context Protocol) is an open protocol allowing Claude Code to connect external data sources and tools. Crucially, MCP provides not just tools but instructions on how to use them, making the model truly aware of extended capabilities.

Q4: What’s the difference between forking a sub-task and creating a new agent?
Fork inherits the main thread’s system prompt and full context, keeping the tool set consistent to reuse the prompt cache and reduce token costs; it suits research sub-tasks. Creating a new agent (the Normal path) generates its own system prompt from the agent definition, receives only the context it needs, and follows that agent’s tool restrictions, trading cache reuse for strict isolation.

Q5: How does Claude Code’s “memory” function work?
Through the src/memdir/ module, implementing cross-session memory including user preferences, project constraints, and team conventions. Memory is stored as Markdown files, injected on demand—not loaded in full every session.

Q6: What can the Hook system do?
Hooks can intercept calls before tool execution, rewrite inputs, adjust permission behavior, or append messages and supplement context after execution. Suitable for implementing team-specific security policies or audit requirements.

Q7: Why does Claude Code emphasize “read code before modifying code”?
This is an explicit rule in getSimpleDoingTasksSection(), designed to prevent models from blindly modifying without understanding context, reducing refactoring errors and unnecessary abstraction.

Q8: What does Verification Agent’s “try to break it” mean?
The Verification Agent’s prompt explicitly requires actively seeking code defects (adversarial probes) rather than simply confirming “looks fine.” It must run build, test, linter, and output explicit VERDICT: PASS/FAIL/PARTIAL.