Claude Code Source Code Deep Dive: The Architecture of an Agent Operating System

What makes Claude Code fundamentally different from other AI coding assistants?

It is not merely a “chatbot that can call tools.” After analyzing 4,756 source files extracted from the npm package, the evidence points to a comprehensive Agent Operating System—one that unifies prompt architecture, tool runtime governance, permission models, agent orchestration, skill packaging, plugin systems, hook governance, MCP integration, context hygiene, and product engineering into a cohesive platform. This article unpacks the engineering beneath the surface.


What Does the Source Structure Reveal About System Complexity?

Summary: Claude Code’s directory structure exposes a platform-level design with multiple entry points, a command system serving as the control panel, and a tools layer that transforms the model from a “responder” into an “executor.”

The extracted source reveals at least these critical modules:

  • src/entrypoints/: CLI, initialization, MCP mode, SDK consumers
  • src/constants/: Prompts, system constants, risk instructions, output specifications
  • src/tools/: Tool definitions and implementations
  • src/services/: Runtime services including tools, MCP, analytics
  • src/commands/: Slash commands and the command system
  • src/components/: TUI/UI components
  • src/coordinator/: Coordinator patterns
  • src/memdir/: Memory and prompt management
  • src/plugins/ and src/utils/plugins/: Plugin ecosystem
  • src/hooks/ and src/utils/hooks.js: Hook system
  • src/bootstrap/: State initialization
  • src/tasks/: Local tasks, remote tasks, async agent tasks

This is not a simple CLI wrapper. The entry layer alone demonstrates platform thinking: four distinct entry points (CLI, initialization flow, MCP mode, SDK) allow the same agent runtime to serve multiple interaction surfaces.

The Command System as Control Panel

The command system exposes system-level commands including /mcp, /memory, /permissions, /hooks, /plugin, /reload-plugins, /skills, /tasks, /plan, /review, /status, /model, /output-style, /agents, and /sandbox-toggle. Critically, it does not merely register built-in commands—it uniformly loads plugin commands, skill commands, bundled skills, and dynamically discovered skills (filtered for availability).

Application Scenario: When you type /skills in the terminal, you see not a static list but a dynamically computed inventory of capabilities available for your current project context, installed plugins, and connected MCP servers. This makes the command system itself the unified entry point for the ecosystem.

The Tools Layer: Where Models Become Executors

The tools layer includes FileRead, FileEdit, FileWrite, Bash, Glob, Grep, TodoWrite, TaskCreate, AskUserQuestion, Skill, Agent, MCPTool, and Sleep. The essence of this layer is transforming the model from a “responder” into an “executor.” Claude Code’s stability stems significantly from this layer being formal, clear, and governable.

Author’s Reflection: Most open-source coding agents have a directory structure resembling a single main file, a prompt file, a few tool files, and utilities. Claude Code’s structure operates at a completely different magnitude—not for aesthetics, but because it solves fundamentally more complex problems.


How Does the System Prompt Work as a Runtime Assembler?

Summary: Claude Code’s system prompt is not static text but a dynamically assembled construct with static prefixes (cache-friendly) and dynamic suffixes (session-specific), governed by explicit cache boundary markers.

The most critical source file is src/constants/prompts.ts. It serves not as a repository of “magical copy” but as the master assembler for the system prompt—handling environment injection, tool usage specifications, security and risk action guidelines, session-specific guidance, language/output style, MCP instructions, memory prompts, scratchpad explanations, function result clearing hints, and feature-gated sections.

The Assembly Architecture

The getSystemPrompt() function constructs prompts through a two-part structure:

Static Prefix (Cache-Optimized):

  • getSimpleIntroSection(): Identity positioning
  • getSimpleSystemSection(): Base system specifications
  • getSimpleDoingTasksSection(): Task execution philosophy
  • getActionsSection(): Risk action specifications
  • getUsingYourToolsSection(): Tool usage specifications
  • getSimpleToneAndStyleSection(): Tone and style guidelines
  • getOutputEfficiencySection(): Output efficiency rules

Dynamic Suffix (Session-Conditional):

  • Session guidance
  • Memory
  • Ant model override
  • Environment info
  • Language
  • Output style
  • MCP instructions
  • Scratchpad
  • Function result clearing
  • Summarize tool results
  • Numeric length anchors
  • Token budget
  • Brief mode

Cache Economics as Infrastructure

The source explicitly defines SYSTEM_PROMPT_DYNAMIC_BOUNDARY with comments indicating: content before the boundary should remain cache-friendly; content after is user/session-specific and should not be arbitrarily modified to avoid breaking cache logic.
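
The boundary pattern can be sketched in TypeScript. This is a minimal illustration, not the real implementation: the constant and section names follow the article, but the section contents and `SessionState` shape are placeholders.

```typescript
// Illustrative sketch of static-prefix / dynamic-suffix assembly.
// Section names mirror the source; bodies are placeholders.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "<!-- dynamic-boundary -->";

interface SessionState {
  language?: string;
  outputStyle?: string;
  mcpInstructions?: string[];
}

function getStaticPrefix(): string {
  // Cache-friendly: byte-identical for every session.
  return [
    "## Identity",          // getSimpleIntroSection()
    "## System",            // getSimpleSystemSection()
    "## Doing tasks",       // getSimpleDoingTasksSection()
    "## Risk actions",      // getActionsSection()
    "## Using your tools",  // getUsingYourToolsSection()
  ].join("\n");
}

function getDynamicSuffix(session: SessionState): string {
  // Session-specific: assembled fresh, never cached.
  const parts: string[] = [];
  if (session.language) parts.push(`Respond in ${session.language}.`);
  if (session.outputStyle) parts.push(`Output style: ${session.outputStyle}`);
  if (session.mcpInstructions) parts.push(...session.mcpInstructions);
  return parts.join("\n");
}

function getSystemPrompt(session: SessionState): string {
  return [
    getStaticPrefix(),
    SYSTEM_PROMPT_DYNAMIC_BOUNDARY,
    getDynamicSuffix(session),
  ].join("\n");
}
```

Because everything before the boundary is byte-identical across sessions, the provider-side prompt cache can reuse it even while the suffix changes per session.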

Application Scenario: Consider a development team working across multiple languages. When switching from English to Chinese, Claude Code does not reload the entire system prompt. Instead, it keeps the static prefix unchanged (a cache hit) and injects new language instructions after the dynamic boundary, preserving the structure of the tool definitions and MCP instructions. This minimizes the cost of multi-language support while ensuring the model always knows which language to use.

Author’s Insight: This demonstrates “Prompt assembly with cache economics”—engineering optimization around token costs and cache hit rates. Most prompt engineering stops at “writing good text”; Claude Code treats prompts as orchestrated runtime resources with cost management built in.


How Are Model Behaviors Institutionalized?

Summary: Claude Code prevents behavioral drift not by hoping the model is “smarter” but by encoding behavioral norms into the system prompt and runtime rules.

The getSimpleDoingTasksSection() module constrains model behavior with specific rules:

  • Feature Boundaries: Do not add features the user did not request; avoid over-abstraction; no blind refactoring
  • Code Standards: Do not arbitrarily add comments/docstrings/type annotations; avoid unnecessary error handling/fallback logic
  • Workflow: Read code before modifying; do not casually create new files; avoid giving time estimates
  • Problem Solving: Diagnose before switching strategies when methods fail; delete confirmed unused items without compatibility baggage
  • Result Reporting: Report results truthfully; do not pretend to have tested when you haven't

Risk Action Specifications

The getActionsSection() defines what constitutes “risk actions requiring confirmation”: destructive operations, hard-to-reverse operations, modifications to shared state, externally visible actions, and uploads to third-party tools. It emphasizes: do not use destructive actions as shortcuts; investigate unfamiliar states before acting; do not brutally delete merge conflicts or lock files.

Application Scenario: When you ask Claude Code to delete an old configuration file, it does not immediately execute rm. Instead, it checks whether the file is referenced by other processes, whether it has special meaning in git history, and whether deletion would affect team members. This “blast radius awareness” is not something the model figured out on its own—it is hardcoded in the prompt.
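
The risk categories above could be modeled as a pre-execution classification step. The sketch below is hypothetical: the flag names come from the article, but the pattern-matching logic is invented for illustration (the real system encodes this in the prompt and permission layers, not a regex table).

```typescript
// Hypothetical "blast radius" classifier for Bash commands.
// Flag names follow the article's categories; matching rules are illustrative.
type RiskFlag = "destructive" | "hardToReverse" | "sharedState" | "externallyVisible";

function classifyBashCommand(cmd: string): RiskFlag[] {
  const flags: RiskFlag[] = [];
  if (/\brm\b|\bdrop\s+table\b/i.test(cmd)) flags.push("destructive");
  if (/\bgit\s+push\s+--force\b/.test(cmd)) flags.push("hardToReverse");
  if (/\bgit\s+push\b/.test(cmd)) flags.push("externallyVisible");
  return flags;
}

function requiresConfirmation(cmd: string): boolean {
  // Any single risk flag is enough to route through human confirmation.
  return classifyBashCommand(cmd).length > 0;
}
```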

Tool Usage Grammar

The getUsingYourToolsSection() establishes clear tool strategies:

  • Read files using FileRead, not cat/head/tail/sed
  • Edit files using FileEdit, not sed/awk
  • Create files using FileWrite, not echo redirection
  • Search files using Glob; search content using Grep
  • Reserve Bash for truly shell-dependent scenarios
  • Use TodoWrite/TaskCreate when task management tools are available
  • Execute tool calls without dependencies in parallel
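
Expressed as code, the routing rules above look like a lookup table. This is a hypothetical rendering for clarity: in Claude Code the grammar lives in prompt text, not in a data structure.

```typescript
// Hypothetical routing table for the "tool usage grammar".
// Tool names are from the article; the Intent type is invented.
type Intent = "readFile" | "editFile" | "createFile" | "findFiles" | "searchContent";

const PREFERRED_TOOL: Record<Intent, string> = {
  readFile: "FileRead",      // not cat/head/tail/sed
  editFile: "FileEdit",      // not sed/awk
  createFile: "FileWrite",   // not echo redirection
  findFiles: "Glob",
  searchContent: "Grep",
};

function preferredTool(intent: Intent): string {
  return PREFERRED_TOOL[intent];
}
```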

Author’s Reflection: Many coding agents are unstable not because they cannot write code, but because they use tools incorrectly—such as using bash sed to modify code, where a single regex error causes collapse. Claude Code’s stability is deeply tied to this “tool usage grammar.”


What Is the Agent Specialization Strategy?

Summary: Claude Code employs multiple built-in agents with clear role separation rather than a single “universal worker,” addressing the fundamental problem that one agent cannot simultaneously research, plan, implement, and verify effectively.

The source confirms at least six built-in agents:

  • General Purpose Agent: general task handling; no special restrictions
  • Explore Agent: read-only code exploration; absolutely read-only: cannot create, modify, delete, or move files; Bash limited to ls, git status, git log, git diff, find, grep, cat, head, tail
  • Plan Agent: pure planning, no editing; read-only; outputs a step-by-step implementation plan; must list Critical Files for Implementation
  • Verification Agent: adversarial validation; goal is "try to break it"; must run build, test suite, linter/type-check; each check requires command and observed output; final verdict must be VERDICT: PASS/FAIL/PARTIAL
  • Claude Code Guide Agent: product usage guidance; helps users understand Claude Code features
  • Statusline Setup Agent: status bar configuration; handles statusline-related settings

The Explore Agent: Read-Only Specialist

The Explore Agent’s system prompt explicitly prohibits file creation, modification, deletion, or movement. It cannot write temporary files or use redirection/heredoc. Its Bash usage is strictly limited to read operations. The design intentionally trims it into a read-only specialist.
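
The read-only restriction can be pictured as an allowlist check on a proposed Bash command. A simplified sketch: the allowlisted commands are taken from the article, while the parsing (deliberately naive) and function name are illustrative.

```typescript
// Simplified allowlist check for the Explore Agent's Bash usage.
// Allowed commands come from the article; parsing is naive on purpose.
const EXPLORE_BASH_ALLOWLIST = new Set(["ls", "find", "grep", "cat", "head", "tail"]);
const EXPLORE_GIT_SUBCOMMANDS = new Set(["status", "log", "diff"]);

function exploreAgentMayRun(command: string): boolean {
  // Redirection or heredocs would create files, so reject them outright.
  if (command.includes(">") || command.includes("<<")) return false;
  const [head, sub] = command.trim().split(/\s+/);
  if (head === "git") return EXPLORE_GIT_SUBCOMMANDS.has(sub ?? "");
  return EXPLORE_BASH_ALLOWLIST.has(head ?? "");
}
```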

Application Scenario: When exploring a large codebase to understand architecture patterns, the Explore Agent can traverse files without risk of accidentally modifying something. This isolation prevents exploration-phase accidents from affecting subsequent implementation phases.

The Plan Agent: Architect, Not Executor

The Plan Agent is defined as architect/planner, not executor. It must understand requirements, explore codebase patterns and architecture, output step-by-step implementation plans, and list critical files for implementation. This separation of planning and implementation reduces role confusion.

The Verification Agent: The Most Valuable Design

The Verification Agent’s prompt explicitly identifies two failure modes:

  1. Verification avoidance: Looking only at code without running checks, writing PASS and leaving
  2. Being fooled by the first 80%: UI looks fine, tests pass, ignoring the remaining 20% of issues

Mandatory requirements include:

  • Run build, test suite, linter/type-check
  • Frontend changes: browser automation or page sub-resource verification
  • Backend changes: curl/fetch actual response testing
  • CLI: examine stdout/stderr/exit code
  • Migrations: test up/down and existing data
  • Refactors: test public API surface
  • Adversarial probes required
  • Each check must include command and output observed
  • Final output must be VERDICT: PASS/FAIL/PARTIAL

Author’s Insight: In traditional software engineering, separating implementers from validators is common sense. In AI agent systems, most products have not reached this step. Claude Code makes the Verification Agent independent, giving it no vested interest in the implementing agent’s success. The implementing agent tends to believe its code is correct; the verification agent’s job is to find problems.


How Does the Agent Dispatch Chain Work?

Summary: A sub-agent moves through a 14-step pipeline from trigger to completion, with distinct Fork and Normal paths optimized for cache reuse and role clarity.

The complete dispatch chain abstracts to:

  1. Main model decides to call Agent tool
  2. AgentTool.call() parses input
  3. Parse whether teammate/fork/built-in/background/worktree/remote
  4. Select agent definition
  5. Construct prompt messages
  6. Construct/inherit system prompt
  7. Assemble tool pool
  8. Create agent-specific ToolUseContext
  9. Register hooks/skills/MCP servers
  10. Call runAgent()
  11. runAgent() internally calls query()
  12. query produces message stream
  13. runAgent records transcript, handles lifecycle, cleans resources
  14. AgentTool aggregates results or async task notification
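
A heavily condensed sketch of this chain follows. All function bodies are illustrative stand-ins for the numbered steps, not the real internals; only `AgentTool`/`runAgent` are names from the source.

```typescript
// Condensed sketch of the dispatch chain. Comments map to the steps above.
interface AgentInput { subagentType?: string; prompt: string }
interface AgentResult { transcript: string[]; output: string }

function selectAgentDefinition(kind: string) {
  // Step 4: pick an agent definition (placeholder contents).
  return { systemPrompt: `You are the ${kind} agent.`, tools: ["FileRead", "Grep"] };
}

function runAgent(cfg: { systemPrompt: string; tools: string[]; prompt: string }): AgentResult {
  // Steps 10-13: run the query loop, record the transcript, clean up.
  const transcript = [cfg.systemPrompt, cfg.prompt];
  return { transcript, output: `done: ${cfg.prompt}` };
}

function agentToolCall(input: AgentInput): AgentResult {
  const kind = input.subagentType ?? "fork";            // Step 3: parse dispatch kind
  const definition = selectAgentDefinition(kind);        // Step 4
  return runAgent({                                      // Steps 5-14
    systemPrompt: definition.systemPrompt,
    tools: definition.tools,
    prompt: input.prompt,
  });
}
```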

Fork Path vs Normal Path

  • Trigger: Fork when subagent_type is omitted and the fork feature is enabled; Normal when a built-in/custom agent type is explicitly specified
  • System Prompt: Fork inherits the main thread's; Normal is generated from the agentDefinition
  • Context: Fork inherits the full parent thread; Normal receives only the context that agent requires
  • Tool Set: Fork stays consistent with the parent (cache-hit optimization); Normal follows the agent's tool restrictions
  • Design Goal: Fork reuses the main thread cache to avoid burning tokens; Normal enforces strict isolation and clear roles
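
The two paths' differing treatment of prompt, context, and tools can be sketched as a single branch. The structure below is hypothetical, derived only from the comparison above.

```typescript
// Hypothetical fork-vs-normal configuration, following the comparison above.
interface ParentThread { systemPrompt: string; context: string[]; tools: string[] }
interface AgentDef { systemPrompt: string; tools: string[] }

function buildSubagentConfig(parent: ParentThread, def: AgentDef | null) {
  if (def === null) {
    // Fork path: byte-identical prefix so the prompt cache can be reused.
    return { systemPrompt: parent.systemPrompt, context: parent.context, tools: parent.tools };
  }
  // Normal path: strict isolation with a role-specific prompt and tool restrictions.
  return { systemPrompt: def.systemPrompt, context: [] as string[], tools: def.tools };
}
```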

Application Scenario: When you fork a sub-task for time-consuming code analysis (like having Explore Agent traverse an entire codebase), you can close the terminal and work on something else. When complete, Claude Code notifies the main thread, allowing you to choose between summary or full output. This seamless foreground/background switching comes from complete agent lifecycle management.

Author’s Insight: The Fork path is not “just another agent”—it is an execution path specifically optimized for cache and context inheritance. Comments explicitly mention maintaining “byte-identical API request prefix” to improve prompt cache hit rates. Most people think “sub-tasks should run”; Claude Code thinks “sub-tasks should run AND reuse main thread cache without wasting tokens.”


How Do Skills, Plugins, and MCP Extend the System?

Summary: The ecosystem’s key is making the model “aware” of its capabilities through skills lists, agent lists, MCP instructions, session-specific guidance, and command integration.

Skill: Workflow Package

A Skill is not documentation but a markdown prompt bundle with frontmatter metadata:

  • Declares allowed-tools
  • Injects into current context on demand
  • Compresses repetitive workflows into reusable capability packages

The system requires: when a task matches a skill, you must call the Skill tool to execute it—you cannot merely mention the skill without execution.

Plugin: Prompt + Metadata + Runtime Constraints

Plugins provide:

  • Markdown commands
  • SKILL.md skill directories
  • commandsMetadata
  • userConfig (sensitive values stored in system keychain, not on disk)
  • Shell frontmatter
  • allowed-tools
  • Model/effort hints
  • Runtime variable replacement (supporting ${CLAUDE_PLUGIN_ROOT}, ${CLAUDE_SESSION_ID}, etc.)
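
Runtime variable replacement of the kind listed above can be sketched in a few lines. The variable names come from the article; the substitution function itself is an illustrative assumption, not the actual plugin runtime.

```typescript
// Sketch of ${VAR}-style runtime variable replacement in plugin commands.
// Unknown variables are left untouched rather than expanded to empty strings.
function expandPluginVariables(template: string, vars: Record<string, string>): string {
  return template.replace(/\$\{([A-Z_]+)\}/g, (match: string, name: string) =>
    name in vars ? vars[name] : match);
}
```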

MCP: More Than a Tool Bridge

When an MCP server connects, if the server provides instructions, those instructions are assembled into the system prompt. This means MCP simultaneously injects:

  1. New tools
  2. Instructions on how to use those tools

Application Scenario: Many platforms have plugin systems and tool markets, but the model itself does not know “what extensions exist, when to use them, or how to use them.” Claude Code makes the model “aware of its extended capabilities” through skills lists, agent lists, MCP instructions, session-specific guidance, and command integration. This is the prerequisite for an ecosystem to truly function—giving someone a professional toolbox is useless if they don’t know what’s inside or when to open it.


How Is Context Managed as a Scarce Resource?

Summary: Claude Code treats context as a budget to be managed, not free air, through static/dynamic boundaries, fork path cache optimization, on-demand skill injection, and compression/resume mechanisms.

  • System prompt static/dynamic boundary: cache static portions, inject dynamic portions on demand (reduces redundant computation costs)
  • Fork path cache-identical prefix: sub-tasks reuse the main thread cache (parallel complex tasks without wasting tokens)
  • Skill on-demand injection: skills are not all loaded upfront (keeps the initial context clean)
  • MCP instructions by connection status: unconnected servers' instructions do not occupy space (avoids invalid information interference)
  • Function result clearing: actively cleans completed function results (frees context space)
  • Summarize tool results: automatically summarizes long results (prevents a single output from filling the window)
  • Compact/transcript/resume: context compression and session recovery (long sessions without interruption)
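
The "summarize tool results" mechanism amounts to a budget check before a result enters context. Everything in the sketch below is assumed for illustration: the token threshold, the 4-characters-per-token heuristic, and the placeholder format are all invented, not taken from the source.

```typescript
// Hypothetical sketch of tool-result summarization under a token budget.
const MAX_RESULT_TOKENS = 2000; // assumed threshold, not from the source

function approxTokens(text: string): number {
  return Math.ceil(text.length / 4); // rough 4-chars-per-token heuristic
}

function compactToolResult(result: string, summarize: (s: string) => string): string {
  // Short results pass through untouched; long ones are replaced by a summary.
  if (approxTokens(result) <= MAX_RESULT_TOKENS) return result;
  return `[summarized: ${summarize(result)}]`;
}
```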

Author’s Insight: For those building demos, context management is irrelevant—demos run a few times and end. For product builders, context economics directly impacts costs and experience. For a service processing tens of thousands of requests daily, each carrying thousands of system-prompt tokens, a 10% improvement in cache hit rates can save enough in a month to hire another person. These designs are not “over-optimization” but necessities at scale.


What Product Engineering Details Create the “Stable” Experience?

Summary: RunAgent includes product-level details like transcript recording, performance tracking, resource cleanup, and foreground/background lifecycle management that distinguish a prototype from a production product.

The runAgent() function contains numerous product-level details:

  • recordSidechainTranscript(): Records sidechain conversations
  • writeAgentMetadata(): Writes agent metadata
  • registerPerfettoAgent(): Performance tracking registration
  • cleanupAgentTracking(): Cleans tracking state
  • killShellTasksForAgent(): Terminates the agent’s shell tasks
  • Cleanup of session hooks, cloned file state, todos entries

Background agents have independent abort controllers, can run continuously in the background, return to the main thread via notification when complete, and support automatic summarization. Foreground agents can be converted to background during execution, with progress tracking.

Permission and Security Architecture

Claude Code does not simply tell the model to “be careful” but enforces security through multi-layer mechanisms:

  • Sandbox isolation: Tools run in controlled environments
  • Layered permissions: Hook → Policy → User approval decision layers
  • Human-in-the-loop: Critical operations require user confirmation
  • PreToolUse Hook: More comprehensive than most AI tools’ total security infrastructure

Application Scenario: In a financial project using Claude Code, you can implement through Hooks:

  • All database write operations require secondary confirmation
  • All external API calls must be logged for audit
  • All delete operations must first backup to designated directories

These policies require no core code modifications—just Hook configuration.
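
A PreToolUse hook for the first two policies might look like the following. The hook shape, decision values, and matching rules are all illustrative assumptions, not Claude Code's actual hook API.

```typescript
// Hypothetical PreToolUse hook for the financial-project policies above.
// Interface and decision values are invented for illustration.
interface ToolCall { tool: string; input: { command?: string } }
type HookDecision = { action: "allow" | "ask" | "deny"; reason?: string };

function preToolUseHook(call: ToolCall): HookDecision {
  const cmd = call.input.command ?? "";
  if (call.tool === "Bash" && /\b(INSERT|UPDATE|DELETE|DROP)\b/i.test(cmd)) {
    // Policy 1: database writes require secondary confirmation.
    return { action: "ask", reason: "database write requires secondary confirmation" };
  }
  if (call.tool === "Bash" && /\bcurl\b/.test(cmd)) {
    // Policy 2: log external API calls for audit, then allow.
    console.log(`[audit] external call: ${cmd}`);
    return { action: "allow" };
  }
  return { action: "allow" };
}
```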


Practical Action Checklist

For technical leaders evaluating AI coding tools:

✅ Ideal Scenarios for Claude Code:

  • Complex refactoring requiring coordination across multiple files
  • Teams needing automated CI/CD integration
  • Environments with explicit security compliance requirements (needing permission layering)
  • Background analysis tasks requiring long runtimes
  • Organizations wanting to connect internal toolchains via MCP

⚠️ Scenarios Requiring Careful Evaluation:

  • Only needing IDE real-time completion (Copilot/Cursor more suitable)
  • Fully offline air-gapped environments (Claude Code requires cloud connectivity)
  • Teams sensitive to token consumption (Agent Teams use significant volume)

🔧 Quick Start Recommendations:

  1. Create CLAUDE.md in project root to document project conventions
  2. Connect team tools via /mcp (Jira, Slack, Notion, etc.)
  3. Use /plan to have Plan Agent output implementation plans before execution
  4. Enable Verification Agent for adversarial validation on critical tasks
  5. Configure team-specific security policies through Hooks (e.g., database write confirmation)

One-Page Overview

Claude Code Core Architecture
├── Entry Layer: CLI / MCP / SDK / Initialization
├── Prompt Architecture: Static prefix + Dynamic boundary (cache optimized)
├── Tool Governance: Input validation → Hook interception → Permission decision → Execution → PostHook
├── Multi-Agent Division: Explore (read-only) / Plan (planning) / Verification (validation)
├── Dispatch Chain: 14-step Pipeline, Fork reuses main thread cache
├── Extension Ecosystem: Skill (workflow package) / Plugin (behavior extension) / MCP (tools + instructions)
├── Context Economics: Static/dynamic separation, on-demand injection, compression/recovery
└── Lifecycle: Transcript recording, performance tracking, resource cleanup, foreground/background switching

Design Philosophy: Do not leave "good behavior" to model improvisation—write it into the system
Competitive Moat: Not any single prompt, but the complete Agent Operating System

Frequently Asked Questions

Q1: What is the difference between Claude Code and GitHub Copilot?
Copilot is an IDE real-time completion tool; Claude Code is a terminal-native agent handling task-level delegation (cross-file refactoring, testing, Git workflows). They can complement each other.

Q2: Why does Claude Code need so many “agents”? Can’t one agent do everything?
A single agent simultaneously handling research, planning, implementation, and validation leads to role confusion and bias. Claude Code improves task stability through specialized division of labor.

Q3: What is MCP and why does it matter?
MCP (Model Context Protocol) is an open protocol allowing Claude Code to connect external data sources and tools. Crucially, MCP provides not just tools but instructions on how to use them, making the model truly aware of extended capabilities.

Q4: What’s the difference between forking a sub-task and creating a new agent?
Fork inherits the main thread’s system prompt and full context, keeping the tool set consistent to reuse the prompt cache and reduce token costs; it suits research sub-tasks. Creating a new agent (the Normal path) generates its own system prompt from the agent definition, receives only the context it needs, and follows that agent’s tool restrictions, trading cache reuse for strict isolation.

Q5: How does Claude Code’s “memory” function work?
Through the src/memdir/ module, implementing cross-session memory including user preferences, project constraints, and team conventions. Memory is stored as Markdown files, injected on demand—not loaded in full every session.

Q6: What can the Hook system do?
Hooks can intercept calls before tool execution, rewrite inputs, adjust permission behavior, or append messages and supplement context after execution. Suitable for implementing team-specific security policies or audit requirements.

Q7: Why does Claude Code emphasize “read code before modifying code”?
This is an explicit rule in getSimpleDoingTasksSection(), designed to prevent models from blindly modifying without understanding context, reducing refactoring errors and unnecessary abstraction.

Q8: What does Verification Agent’s “try to break it” mean?
The Verification Agent’s prompt explicitly requires actively seeking code defects (adversarial probes) rather than simply confirming “looks fine.” It must run build, test, linter, and output explicit VERDICT: PASS/FAIL/PARTIAL.