Mastering Claude’s Intelligence: 3 Core Patterns for Building Resilient Applications

The most effective strategy for building applications with Claude is not to patch its perceived weaknesses with complex agent frameworks, but to leverage its natively evolved capabilities using the simplest possible tool combinations. As Anthropic’s co-founder Chris Olah once observed, generative AI systems like Claude are less “manufactured” and more “cultivated.” Researchers set the conditions for growth, but the exact structures and capabilities that ultimately emerge are largely unpredictable.
This fundamental nature presents a significant challenge for developers. For a long time, the industry standard has been to wrap models in AI shells or agent harnesses—external code structures designed to control and compensate for what the model supposedly cannot do. However, these frameworks are built on fragile assumptions. As Claude’s intelligence evolves, these assumptions quickly become obsolete, often turning into constraints that actively hold the model back.
This article explores three core patterns for constructing Claude-based applications. By adopting these paradigms, you can ensure your applications keep pace with Claude’s evolutionary trajectory while strictly maintaining low latency and system costs.
Code and Intelligence
Image Source: Unsplash

Pattern 1: How Can We Maximize the Skills Claude Already Knows?

Summary: Abandon the pursuit of flashy, bespoke tools. Return to the foundational tools Claude knows best—like Bash and text editors—and allow the model’s inherent combinatorial logic to solve complex problems.
Core Question: Why do the simplest tools often yield the most powerful results when building Claude applications?
We strongly recommend building your applications using the tools Claude is already intimately familiar with. Cast your mind back to late 2024: Claude 3.5 Sonnet achieved a score of 49% on SWE-bench Verified, an authoritative benchmark evaluating an AI’s ability to resolve real-world software engineering problems. This set a new record at the time.
What is truly astonishing is the underlying architecture that achieved this. It relied solely on a Bash tool (which allows the AI to execute instructions via a computer command line) and a text editor tool (for viewing, creating, and modifying files). Anthropic’s official coding assistant, Claude Code, is built on these exact same primitive tools.
These basic tools were never originally designed for building AI agents, yet they are the instruments Claude understands most profoundly. Over time, Claude’s mastery over them only grows more refined. We have found that Claude can combine these generic tools into incredibly rich patterns to solve highly complex problems. For instance, programmatic tool calling, skill management, and memory tools are essentially derivatives of combining basic Bash and text editor tools.
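As a minimal sketch of this setup, the snippet below builds the tool list you would pass to the Messages API to grant Claude the built-in bash and text editor tools. The versioned `type` strings are assumptions; they change across model generations, so verify them against the current tool-use documentation before relying on them.

```python
# Sketch: requesting Anthropic's built-in bash and text editor tools.
# The versioned "type" identifiers below are assumptions taken from the
# tool-use docs at one point in time -- check the current reference.
def build_primitive_toolset() -> list[dict]:
    return [
        {"type": "bash_20250124", "name": "bash"},
        {"type": "text_editor_20250124", "name": "str_replace_editor"},
    ]

tools = build_primitive_toolset()
# `tools` is then passed as the `tools` parameter of a Messages API request.
```

Everything else in this pattern, from skills to memory, is layered on top of these two primitives rather than on bespoke tool interfaces.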
Claude's SWE-bench Evolution
The progression of Claude model scores on the SWE-bench Verified benchmark visually demonstrates its evolutionary path. Developers do not need to custom-build complex “problem solvers” for it; simply providing the basic tools allows the model to derive its own solution paths.
Derivatives of Basic Tools
Programmatic tool calling, skills, and memory tools are, in reality, elegant combinations of Bash and text editing tools. This emergent capability—moving from simplicity to complexity—is the ultimate manifestation of the “cultivation” philosophy.

Reflection & Insight: In practical engineering, we frequently fall into “tool anxiety,” assuming that Claude’s underperformance stems from a lack of specialized tools. However, looking at the SWE-bench data made me realize that we often underestimate what basic tools can do. Instead of spending massive amounts of time developing and maintaining complex, customized tool interfaces, we are better off focusing our energy on optimizing the contextual environment and letting the model work with the tools it handles best.

Pattern 2: What Can We Stop Doing in Application Development?

Summary: Cease micromanaging at the outer framework layer. Return the control of process orchestration, context management, and memory persistence directly to Claude itself.
Core Question: As Claude becomes increasingly powerful, which over-protective logic should developers delete from their agent frameworks?
We have historically underestimated Claude’s capabilities, assuming it was incapable of handling various tasks. As the model’s capacity takes massive leaps forward, it is time to rigorously re-examine these outdated assumptions.

Let Claude Orchestrate Actions Autonomously: Reject Forced Feeding by Outer Frameworks

A common, mistaken assumption is that every result returned by a tool call must be immediately stuffed back into Claude’s context window so it can decide what to do next.
Imagine a real-world data processing scenario: to analyze a single column in a massive data table, you feed the entire table into the model. The result is a bloated context window, and you end up paying exorbitant token costs for rows of data Claude does not need at all. While you could add parameters to the tool to limit the number of returned rows, this merely treats the symptom. The core problem is that the outer agent framework is making orchestration decisions on behalf of the model, when in reality Claude itself is the best entity to make those decisions.
Tool Execution Environment
Claude calls a tool, and the tool subsequently executes in a specific environment. Converting all tool results into tokens for the model to process is not only slow and expensive but often completely unnecessary.
As long as you provide Claude with a code execution tool (like a REPL or Jupyter Notebook), the dilemma is instantly resolved. This allows Claude to write its own code to execute tool calls and personally handle the data flow logic between these tools. Instead of the framework forcibly feeding all results into the context, Claude decides which results to skip entirely, which to filter, or which to pipe directly into the next call. The precious context window remains entirely clean; only the streamlined final results of the code execution truly enter Claude’s field of vision.
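To make this concrete, here is a sketch of the kind of script Claude might write inside a REPL tool for the table scenario above. The raw table is handled entirely in code and never becomes tokens; only the one-line summary enters the context. `fetch_table` is a hypothetical stand-in for a real data tool.

```python
import csv
import io

# Stand-in for a data tool that would normally dump its whole result into
# the context window. Here it just generates a 1,000-row CSV.
def fetch_table() -> str:
    rows = ["id,region,revenue"] + [f"{i},EU,{i * 10}" for i in range(1, 1001)]
    return "\n".join(rows)

raw = fetch_table()                       # thousands of rows, processed in code
reader = csv.DictReader(io.StringIO(raw))
total = sum(int(r["revenue"]) for r in reader if r["region"] == "EU")

# The only string that goes back into Claude's context:
summary = f"EU revenue total: {total}"
```

The framework's job shrinks to running this code safely; the filtering decision belongs to the model.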
Code Controlling Tool Calls
Claude can personally write code to control tool invocation and chain the logic between them. In this process, the baton of process orchestration is handed back from the outer framework to the Large Language Model (LLM) itself.
In BrowseComp (a benchmark testing an agent’s ability to browse the web), when we granted the Opus model the ability to filter its own tool outputs, its accuracy surged directly from 45.3% to 61.6%. This proves that a powerful coding model is naturally a powerful general-purpose AI agent.
Data Analysis Scenario
Image Source: Unsplash

Let Claude Manage Its Own Context: Goodbye to Force-Fed Prompts

Contextual prompts tailored to specific tasks can guide Claude to better use foundational tools. In the past, we assumed we had to hand-craft extremely detailed prompts for every single task. However, this approach of cramming instructions in up front fails completely once the volume of tasks grows large. Every token you add consumes the model’s attention, and stuffing in too much irrelevant information causes it to miss the main point. Pre-loading instructions that are used once in a blue moon is a pure waste of resources.
Giving Claude access to a skill library perfectly cracks this deadlock. Now, you only need to pre-load the short YAML header information (similar to a table of contents summary) of each skill into the context window, letting Claude know what skills are available. If the current task requires it, Claude will actively invoke the file-reading tool to “progressively unfold” the full skill content (loading on demand).
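A rough sketch of this on-demand loading, assuming a simple layout where each skill is a Markdown file with a YAML front-matter header. Only the headers are pre-loaded into context; the body is read later, and only if Claude asks for it. The file layout and field names here are illustrative assumptions, not a fixed format.

```python
from pathlib import Path
import tempfile

# One skill file: short YAML header up top, long instructions below.
SKILL = """---
name: pdf-report
description: Render tabular data into a paginated PDF report.
---
(Long, detailed instructions live here and stay out of the context window.)
"""

def read_header(path: Path) -> dict:
    # Parse only the front matter; never touch the body.
    lines = path.read_text().split("\n")
    end = lines.index("---", 1)                      # closing front-matter fence
    return dict(line.split(": ", 1) for line in lines[1:end])

skills_dir = Path(tempfile.mkdtemp())
(skills_dir / "pdf-report.md").write_text(SKILL)

# The "table of contents" that gets pre-loaded into the context window:
catalog = [read_header(p) for p in skills_dir.glob("*.md")]
```

When a task needs the skill, Claude calls its file-reading tool on the full path and unfolds the body on demand.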
On-Demand Skill Unfolding
Claude leverages the skill library to progressively unfold relevant context based on task requirements. If the skill library gives Claude the ability to freely assemble context, context pruning is the opposite art. It provides a mechanism that lets the model selectively delete outdated or irrelevant material, such as stale tool execution results or its own early thought drafts.
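A minimal sketch of what pruning stale tool results can look like, using a simplified stand-in for real API message dicts: every tool result except the most recent ones is replaced with a short stub, so old output stops consuming attention.

```python
# Replace all but the most recent tool results with a short stub.
# The message shape here is a simplified illustration, not the real API schema.
def prune_tool_results(messages: list[dict], keep_last: int = 1) -> list[dict]:
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_idx[:-keep_last]) if keep_last else set(tool_idx)
    return [
        {"role": "tool", "content": "[pruned: stale tool result]"}
        if i in stale else m
        for i, m in enumerate(messages)
    ]

history = [
    {"role": "user", "content": "Analyze the logs"},
    {"role": "tool", "content": "900 KB of raw log lines ..."},
    {"role": "assistant", "content": "Found 3 error clusters."},
    {"role": "tool", "content": "fresh grep output"},
]
pruned = prune_tool_results(history)
```

In practice the decision of *which* results are stale is exactly what you hand back to the model instead of hardcoding it.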
Furthermore, with the assistance of sub-agents, Claude increasingly understands when it should “start from scratch”—opening a brand-new, clean context window to isolate and focus on a specific task. This trick of summoning sub-agents improved its score on the BrowseComp test by an additional 2.8% over the best single-agent approach.

Let Claude Persist Its Own Context: Replacing Complex External RAG

For agents required to run over long periods, it is easy to exhaust the capacity of a single context window. The industry generally assumes that at this point, you must build a complex retrieval architecture (like RAG) outside the model to act as a memory system. However, extensive research shows a smarter approach: give Claude simple tools and let it decide for itself what content is worth saving.
Context compression technology allows Claude to condense and summarize its past context, ensuring it does not “lose the plot” during marathon, long-cycle tasks. Through several model iterations, Claude’s eye for selecting what to remember has become increasingly accurate. In autonomous exploration search tasks, no matter how much compression budget we gave earlier models, the accuracy stubbornly stalled at 43%. Under the exact same configuration, a later powerful model jumped to 68%, and the newest version skyrocketed to 84%.
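The compaction loop itself is simple; the hard part, which the benchmark numbers above measure, is the quality of the summary. Here is a sketch under two stated assumptions: a rough 4-characters-per-token estimate, and a stub `summarize` standing in for the model call that would actually condense the older turns.

```python
# Sketch of context compaction: when the transcript exceeds a token budget,
# older turns collapse into a single summary turn.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)          # rough heuristic, ~4 chars per token

def summarize(turns: list[str]) -> str:
    # Stub: in a real system, Claude itself writes this summary.
    return f"[summary of {len(turns)} earlier turns]"

def compact(turns: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    if sum(estimate_tokens(t) for t in turns) <= budget:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(old)] + recent

turns = ["step " + "x" * 400 for _ in range(10)] + ["latest result", "next plan"]
compacted = compact(turns, budget=300)
```

The framework only supplies the budget and the trigger; what survives compression is the model's call.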
Introducing a memory folder is another clever trick. It allows Claude to save important context as files, like taking notes, and read them when needed. We witnessed its power in agent search tasks: in the BrowseComp-Plus test, simply giving the model a memory folder significantly boosted its accuracy.
Memory Persistence
Claude can persist important context into a memory folder for safekeeping.
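The tooling needed for such a memory folder is deliberately tiny: a write and a read over a directory of files. The sketch below uses illustrative function names and a temp directory; the point is that Claude, not the framework, decides what is worth saving.

```python
from pathlib import Path
import tempfile

# A file-based memory tool pair. Paths and names are illustrative assumptions.
MEMORY_DIR = Path(tempfile.mkdtemp())

def memory_write(name: str, content: str) -> str:
    path = MEMORY_DIR / f"{name}.md"
    path.write_text(content)
    return f"saved {path.name}"

def memory_read(name: str) -> str:
    return (MEMORY_DIR / f"{name}.md").read_text()

memory_write("learnings", "- B1F y=16 wall confirmed solid for x=9-28")
note = memory_read("learnings")
```

Compare this with an external RAG pipeline: there is no embedding step, no retriever, and no relevance threshold to tune, because the model curates its own notes.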
Real-World Case Study: The Revelations of Letting Claude Play Pokémon
Having an AI play a long-cycle game like Pokémon is an excellent case study demonstrating the leap in Claude’s ability to utilize memory folders.
The early Claude 3.5 Sonnet treated memory like meeting minutes, foolishly recording the nonsense spoken by NPCs in the game, completely missing the point. After 14,000 steps, it generated 31 messy files—including two nearly identical, boring notes about caterpillar Pokémon—and its game progress tragically stalled at the second town:

// Inefficient memory example from an early model
caterpie_weedle_info (Caterpie and Weedle Info): 
- Caterpie and Weedle are both caterpillar-shaped Pokémon.
- Caterpie is non-poisonous.
- Weedle is poisonous.
- This information is crucial for future encounters and battles.
- If our Pokémon gets poisoned, we need to go to the Pokémon Center for treatment immediately.

When we deployed a later, more powerful model, the vibe completely shifted; it started recording “hardcore tactical notes.” In the exact same 14,000 steps, the new model generated only 10 well-organized, categorized files. It not only swept through to earn 3 gym badges but even extracted a dedicated “lessons learned” document from the pitfalls it encountered:

// Efficient tactical memory example from a powerful model
/gameplay/learnings.md (Gameplay/Lessons Learned): 
- Bellsprout's Sleep Powder + Wrap combo: Must quickly knock it out with "Bite" before its Sleep Powder hits. Absolutely cannot let it set up!
- Gen 1 Bag Limit: Can only hold 20 items max. Make sure to throw away useless TMs before entering a maze.
- Spinning Tile Maze: Different entrance y-coordinates lead to completely different endpoints. Try all entrances and navigate through multiple pocket spaces.
- B1F y=16 wall confirmed as solid, range covers all x=9-28 (Recorded at step 14557)

Memory and Brain
Image Source: Unsplash

Reflection & Insight: This Pokémon case study deeply shocked me. The leap from “recording nonsense” to “extracting tactical rules” is not merely an improvement in memory capability, but an evolution in the model’s cognitive dimension. We used to think about externally “feeding” information using vector databases, but now it seems the most efficient memory system is actually the model’s inherent “information noise reduction and abstraction capability.” Give the model a simple folder, and it will return a tactical encyclopedia.

Pattern 3: How Do We Carefully Set Boundaries for Claude Frameworks?

Summary: The outer framework exists to guarantee safety, control costs, and optimize the user experience, but this must be achieved through meticulous context engineering and declarative tool design, avoiding crude interventions.
Core Question: While granting Claude immense autonomy, how do we use framework design to hold the safety line and optimize system performance?
The outer agent framework acts as a straitjacket for Claude, usually designed to guarantee user experience (UX), control spending, or hold the safety bottom line.

Master Context Engineering to Maximize Cache Hit Rates

Claude is stateless. This means Claude inherently has a “goldfish memory”; it cannot see previous conversation history on its own. Therefore, on every single turn of the conversation, the outer framework must repackage the new context together with all previous action records, tool descriptions, and instructions, and resend them all to Claude.
To avoid this repetitive work driving up costs, we can cache prompts by setting breakpoints. The Claude API writes all context content before the breakpoint into a cache, and when the next request arrives, it checks whether the current content matches the previous cache. Tokens read from the cache cost only one-tenth of the normal input price.
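A sketch of what such a breakpoint looks like in a Messages API request body: the `cache_control` marker of type `ephemeral` on a content block tells the API to cache everything up to and including that block. The structure follows Anthropic's prompt-caching documentation, but the model id is illustrative and field names should be verified against the current API reference.

```python
# Request payload with a prompt-caching breakpoint on the stable system prompt.
# Structure per Anthropic's prompt-caching docs; verify against the current
# API reference. The model id is an illustrative placeholder.
request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a data-analysis agent. <long stable instructions>",
            "cache_control": {"type": "ephemeral"},  # breakpoint: cache ends here
        }
    ],
    "messages": [{"role": "user", "content": "Summarize today's error logs."}],
}
```

Everything before the breakpoint must stay byte-identical between requests for the cache to hit, which is exactly what the rules below protect.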
To help your application save both money and time, here are several golden rules for maximizing cache hit rates in an agent framework:

  • Static Content First, Dynamic Content Last: When arranging the request order, place stable content (like system prompts and tool lists) at the very front.
  • Use Messages to Update: If you need to remind the model, append a <system-reminder> at the end of the message. Never modify the original prompt (modifying the front invalidates the entire cache).
  • Don’t Switch Models Mid-Session: Avoid switching between different models in the same conversation session. The cache is tied to a specific model; once you switch, it is entirely voided. If you need a cheaper model for simple tasks, use a sub-agent.
  • Manage Tools Cautiously: The tool list sits at the head of the cache, so adding or deleting any single tool invalidates it. For scenarios requiring dynamic tool discovery, use the tool search function, which only appends content without destroying the cache.
  • Update Breakpoints Promptly: For multi-turn conversational apps (like agents), move the breakpoint to the latest message to keep the cache fresh. Auto-caching is highly recommended for handling this.
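The "append, don't modify" rule above can be sketched as a small helper: the reminder is appended to the newest user message, leaving every cached byte before it untouched. The message shape is a simplified stand-in for real API message dicts.

```python
# Append a <system-reminder> to the latest message instead of editing the
# system prompt, so the cached prefix stays byte-identical.
def append_reminder(messages: list[dict], reminder: str) -> list[dict]:
    updated = messages[:-1] + [dict(messages[-1])]   # copy only the last turn
    updated[-1]["content"] += f"\n<system-reminder>{reminder}</system-reminder>"
    return updated

msgs = [{"role": "user", "content": "Refactor utils.py"}]
msgs = append_reminder(msgs, "Remember: never edit files under /vendor.")
```

Editing the system prompt instead would invalidate the entire cache from the first byte onward.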
Server and Cost Optimization
Image Source: Unsplash

Use Declarative Tools to Build UX and Safety Boundaries

Claude does not inherently know where your application’s “safety bottom line” is, nor does it know what your product interface looks like. It simply issues tool-call instructions and leaves the actual execution to the outer framework.
While the Bash tool gives Claude enormous coding power, from the outer framework’s perspective everything the Bash tool emits is just a command-line string: no matter how dangerous the action Claude executes, the format looks exactly the same. This makes it incredibly difficult for the framework to exercise precise control.
At this point, elevating certain key actions into dedicated declarative tools becomes particularly important. This gives the outer framework a specific “handle” with clear parameter types, allowing it to effortlessly perform interception reviews, set permission gates, render frontend interfaces, or conduct security audits.
Declarative Tool Boundaries
Actions that easily touch safety red lines are naturally suited to be made into dedicated tools.
How to Judge Which Actions Need to Become Declarative Tools?
“Is it reversible?” is an excellent judging standard.

  • Irreversible Operations: For operations that cannot be undone, such as calling external APIs, you can set a threshold forcing user confirmation before the call proceeds.
  • Overwriting Operations: For write tools like file editing, you can build in a “file staleness check” mechanism to prevent Claude from blindly overwriting a file someone else just modified.
  • Interactive Operations: When an action needs to be intuitively displayed to the end-user, a dedicated tool can be rendered as a frontend modal, clearly asking the user questions, throwing out multiple options, or simply pausing the agent’s run loop to wait for user feedback.
  • Troubleshooting Operations: Dedicated tools are highly beneficial for system observability. When an action is a rigorously formatted tool, the outer framework can obtain structured parameter data, making subsequent log recording, link tracing, and scenario replay incredibly easy.
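Putting the criteria above together, here is a sketch of promoting one irreversible action into a declarative tool. The JSON schema gives the framework a typed handle it can intercept; the tool name and the gate logic are illustrative assumptions, not a real Claude Code tool.

```python
# A declarative tool definition: typed parameters the framework can inspect,
# log, and gate. Tool name and schema are hypothetical examples.
send_invoice_tool = {
    "name": "send_invoice",
    "description": "Send a finalized invoice to a customer (irreversible).",
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "amount_usd": {"type": "number"},
        },
        "required": ["customer_id", "amount_usd"],
    },
}

def requires_confirmation(tool_name: str, args: dict) -> bool:
    # Framework-side permission gate: irreversible, high-value calls pause
    # the agent loop and wait for explicit user approval.
    return tool_name == "send_invoice" and args.get("amount_usd", 0) > 100

gate = requires_confirmation("send_invoice", {"customer_id": "c42", "amount_usd": 250})
```

Had the same action gone through Bash, the framework would see only an opaque command string and could neither type-check the amount nor render a confirmation dialog.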
Advanced Pattern: Fighting Magic with Magic
“Whether to make this action a dedicated tool” is by no means a one-time decision; it requires continuous re-evaluation. Take Claude Code’s Bash review mode as an example. It puts a remarkably strong safety boundary around the Bash tool: it summons a second Claude to read the command-line string, letting one AI audit whether the other AI’s operation is safe.
This pattern actually reduces reliance on traditional dedicated tools. However, it is only suitable for tasks where the user trusts the general direction. For high-risk operations where one wrong move loses everything, an honestly written dedicated tool still holds an unshakable status.
Security and Lock
Image Source: Unsplash

Reflection & Insight: The design of security architectures often swings between “flexibility” and “control.” In the past, I tended to do massive regex matching at the framework layer to intercept dangerous commands, resulting in astronomically high false-positive rates. The design philosophy of declarative tools is essentially a “whitelist” mindset, while the Bash mode of “AI auditing AI” shows me the embryonic form of dynamic security auditing. Future security boundaries might no longer be hardcoded rules, but another model capable of understanding.

Reflection and Outlook: Always Be Ready to Chop Off Historical Baggage

Core Question: Why does the code we wrote in the past to patch model flaws become the culprit dragging down performance today?
The boundaries of Claude’s intelligence continue to sprint and pioneer. Every time it completes a leap in capability, our past prejudices about what it “cannot do” must be overthrown and re-verified.
We have witnessed history repeat itself time and time again. In long-cycle agent tests, early Claude models would experience “panic” the moment they sensed the context was about to fill up, hastily wrapping up the task. To alleviate this “context anxiety,” we hardcoded “reset” logic in the code to forcefully clean the context window.
When it came to the newer, powerful model, this flaw miraculously cured itself! Thus, the “context reset” code we painstakingly wrote back in the day instantly became useless dead weight in the agent framework.
Chopping off this baggage ruthlessly is extremely important because it often inversely becomes the bottleneck restricting Claude from exerting its strength. This also verifies the famous “Bitter Lesson” in the AI field: over-reliance on hand-crafted rules based on human experience will ultimately drag down the capabilities evolved by the model itself through compute.
In the years to come, we should tirelessly examine the structures and boundaries in our applications, repeatedly asking ourselves that soul-searching question: “What can I stop doing this time?”

One-Page Summary

The core philosophy of building Claude applications has shifted from “framework control” to “model autonomy.” Developers should avoid using bloated external frameworks to restrict the model, instead providing basic tools (Bash, text editors) and letting the model orchestrate its own workflows by writing code. For context management, completely abandon instruction-cramming prompts, switching to on-demand loading via skill libraries, context pruning, and sub-agent isolation. For memory management, replace complex external RAG architectures with the model’s own context compression and file-based memory. Finally, when setting boundaries, utilize caching mechanisms to reduce costs by 90%, and use declarative tools to precisely control safety reviews for irreversible operations, keeping the framework lightweight and restrained.

Practical Summary / Action Checklist

  • [ ] Simplify the Toolbox: Remove complex customized tools; ensure the underlying application relies solely on Bash and a text editor.
  • [ ] Delegate Orchestration Rights: Delete logic in the framework that forcibly injects all tool results into the context. Switch to providing a code execution environment (like a REPL), letting Claude filter and pass data autonomously.
  • [ ] Restructure Prompt Architecture: Dismantle overly long system prompts. Build a skill library, loading only YAML summaries in the context header.
  • [ ] Enable Self-Management Mechanisms: Equip Claude with context pruning tools and sub-agent invocation capabilities for long-cycle tasks.
  • [ ] Build an Internal Memory System: Abandon automated external RAG pipelines. Provide a local “memory folder” and context compression tools, letting the model autonomously decide what to store.
  • [ ] Reorder API Requests: Check the request body to ensure absolutely static content (system prompts, tool definitions) is at the front, and dynamic conversation history is at the back.
  • [ ] Lock the API Cache: Prohibit switching models within the same session. Use <system-reminder> to append instructions. Turn on auto-caching.
  • [ ] Audit Safety Boundaries: Review all operations executed via Bash. Extract irreversible actions involving external API calls or file overwrites into declarative tools requiring user confirmation.

Frequently Asked Questions (FAQ)

Q1: Why can Claude achieve high scores on code benchmarks using only Bash and a text editor?
Because these two tools are the foundational infrastructure Claude is most familiar with from its training data. It can combine these two simple tools into incredibly complex processing logic by writing scripts. This general programming capability is far more extensible than customizing dedicated “problem-solving tools” for it.
Q2: If the framework doesn’t stuff tool results into the context, how does Claude know what to do next?
By giving Claude a code execution tool (like Jupyter), it can write its own code to receive the raw data returned by tools. Claude handles data filtering, extraction, and passing at the code level, outputting only the final streamlined calculation results into the context for the next decision.
Q3: How does a skill library save context space?
A skill library does not load all instructions at startup. It only loads the YAML header containing the skill name and a short description. When Claude determines a task requires a certain skill, it actively calls the file-reading tool to “unfold” that skill’s complete instructions into the context on demand.
Q4: Why use a memory folder instead of a traditional RAG system for long-cycle tasks?
Traditional RAG systems usually rely on external vectorized retrieval, which easily introduces noise and lacks an understanding of the task’s overall context. A memory folder lets Claude proactively save valuable contexts as structured files, much like a human taking notes. As the model evolves, its ability to extract “high-value tactical notes” far surpasses passive vector retrieval.
Q5: Why does switching models mid-conversation cause a cost explosion?
Claude API’s caching mechanism is strictly bound to a specific model version. If you switch from a powerful model to a weaker model mid-conversation to save money, the API considers the request structure to have fundamentally changed, causing all previous context caches to be completely invalidated. The next request must be recalculated at the full token price.
Q6: What types of operations must be extracted from Bash into declarative tools?
The judging criteria are “is it reversible” and “does it require user intervention.” For example, calling an external paid API (irreversible, requires a confirmation gate), overwriting an existing file (might destroy others’ modifications, requires a staleness check), or collecting info from users (requires rendering a frontend modal)—these should not be executed as simple Bash commands.
Q7: What is “context anxiety,” and why did it not need to be cured later?
Early Claude models would exhibit abnormal conservatism when they found the context window filling up, tending to immediately stop tasks or abandon complex operations. Developers were once forced to write code at the framework layer to forcefully reset the context. But with the evolution of model reasoning capabilities, new models can calmly handle a fully loaded context. The past “treatment code” actually became dead weight interfering with the model’s normal performance.