Thinking Like an Agent: Lessons in Tool Design from Building Claude Code
As artificial intelligence continues to evolve at a breakneck pace, one of the most significant challenges facing developers today is how to design the “action space” for AI agents—the collection of tools and capabilities they can use to accomplish tasks. The team at Anthropic, particularly Thariq, has accumulated a wealth of practical experience while building the widely-discussed Claude Code. This article will guide you through the process of learning to “think like an agent” through careful observation, experimentation, and iteration, ultimately helping you design more effective and natural AI interaction tools.
The Core Challenge: How Do You Build Tools for an Agent?
When constructing an agent within the Claude API, developers face a multitude of options for tool creation. Tools can be built using fundamental building blocks such as code execution, bash commands, skills, and more. Faced with these choices, a critical question emerges: How exactly should you design your agent’s toolset?
Should you create just one all-powerful tool, like code execution or bash commands? Or should you prepare fifty specialized tools to cover every possible scenario your agent might encounter?
To understand this problem, we need to practice perspective-taking. Imagine someone gives you a difficult math problem. What tools would you want to solve it? The answer depends entirely on your own skill level:
- Pen and paper represent the most basic option, but you’d be limited by your manual calculation abilities
- A calculator would be better, but you’d need to know how to operate its advanced functions
- A computer would be the most powerful and fastest option, but you’d need to know how to use it to write and execute code
This analogy provides a useful framework for thinking about agent tool design. The key insight: the tools you give an agent must match its inherent capabilities. But here’s the problem—how do we know what capabilities an agent possesses? The only way is through careful observation, reading its outputs, and continuous experimentation. We need to learn to “think like an agent.”
Lesson One: Optimizing Question-Asking Ability—The Evolution of the AskUserQuestion Tool
During the development of Claude Code, the team initially aimed to improve Claude’s ability to ask questions, a process often referred to as “elicitation.” While Claude could technically ask questions in plain text, the team noticed that users felt answering these questions took an unnecessary amount of time. How could they reduce this friction and improve the efficiency of communication between user and Claude?
Attempt 1: Modifying the ExitPlanTool
The first attempt involved adding a parameter to the ExitPlanTool that would allow it to include an array of questions. This was the easiest solution to implement, but it quickly ran into problems: Claude needed to simultaneously submit a plan and a set of questions about that plan. What if the user’s answers conflicted with what the plan said? Would Claude need to call the ExitPlanTool twice? This approach created confusion, and the team needed to explore other options.
Attempt 2: Changing the Output Format
Next, the team tried modifying Claude’s output instructions to use a specific markup format for asking questions. For example, they instructed it to output bullet-point lists with options in brackets, which could then be parsed and presented as questions to the user.
While this was the most flexible approach, and Claude seemed reasonably capable of following the format, the problem was inconsistent output. Claude would frequently add extra sentences, omit options, or use an entirely different format.
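To see why plain markup parsing is fragile, consider a minimal sketch of what such a parser might look like. The bullet-and-brackets format and the regex below are illustrative assumptions, not Claude Code's actual format: any line where the model adds extra prose or drops the brackets simply fails to parse and is silently lost.

```python
import re

# Hypothetical markup the model was instructed to emit, e.g.:
#   - Which database should we use? [Postgres] [SQLite]
# The bracketed options are parsed out and shown to the user.
QUESTION_RE = re.compile(r"^-\s*(?P<question>[^\[]+?)\s*(?P<opts>(\[[^\]]+\]\s*)+)$")

def parse_questions(text: str) -> list[dict]:
    """Extract (question, options) pairs from bullet-point markup."""
    parsed = []
    for line in text.splitlines():
        m = QUESTION_RE.match(line.strip())
        if not m:
            continue  # extra sentences or missing brackets silently slip through
        options = re.findall(r"\[([^\]]+)\]", m.group("opts"))
        parsed.append({"question": m.group("question"), "options": options})
    return parsed
```

The failure mode is visible in the `continue`: malformed output doesn't raise an error, it just disappears, which matches the inconsistency the team observed.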
Attempt 3: Creating a Dedicated AskUserQuestion Tool
Ultimately, the team created a specialized tool—AskUserQuestion. Claude could call this tool at any time, but was specifically prompted to do so during planning mode. When triggered, the system would display a modal window showing the questions and pause the agent’s loop until the user responded.
This solution brought multiple benefits:
- Structured output: The prompt could guide Claude to generate structured content
- Multiple options guaranteed: Ensured Claude always provided users with choices
- Functional composability: Users could call this function within the Agent SDK or reference it in skills
Most importantly, Claude seemed to enjoy calling this tool, and its outputs worked well. This reveals a crucial principle: even the best-designed tool is useless if Claude doesn’t understand how to call it.
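A dedicated tool like this would be declared to the Claude API as a tool definition with a JSON Schema. The sketch below follows the Messages API tool format, but the specific field names (`questions`, `options`, the `minItems` constraint) are assumptions for illustration, not the actual AskUserQuestion schema:

```python
# Hypothetical declaration of a dedicated question-asking tool.
# The schema itself is what guarantees structure: the model must
# return a questions array, and each question must carry options.
ask_user_question_tool = {
    "name": "AskUserQuestion",
    "description": (
        "Ask the user one or more clarifying questions and pause until "
        "they respond. Prefer this over asking questions in plain text."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "questions": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "question": {"type": "string"},
                        "options": {
                            "type": "array",
                            "items": {"type": "string"},
                            "minItems": 2,  # ensures the user always gets choices
                        },
                    },
                    "required": ["question", "options"],
                },
            }
        },
        "required": ["questions"],
    },
}
```

Compared with the markup approach, malformed output becomes a schema validation failure the API surfaces directly, rather than a line that silently fails to parse.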
Is this the final form of the elicitation function in Claude Code? The team isn’t sure. As subsequent cases will show, what works for one model may not be optimal for another.
Lesson Two: Adapting to Evolving Capabilities—From Todo Lists to Task Management
When Claude Code was first launched, the team realized the model needed a todo list to stay on track. Tasks could be written at the beginning and checked off as work progressed. To enable this, they equipped Claude with the TodoWrite tool for writing and updating todo items and displaying them to users.
However, even with this tool, Claude often forgot what it needed to do. To address this, the team inserted system reminders every five turns, reminding Claude of its objectives.
But as model capabilities improved, something interesting happened. Newer models not only no longer needed todo list reminders but actually found this mechanism constraining. Receiving periodic todo list reminders made Claude think it needed to strictly follow the list rather than adapt flexibly. Additionally, the team observed that Opus 4.5 showed significantly improved ability to use sub-agents—but how could sub-agents coordinate work on a shared todo list?
Based on these observations, the team replaced TodoWrite with the Task Tool. If todo lists were about keeping the model on track, tasks were more focused on helping agents communicate with each other. Tasks could include dependencies, share updates across sub-agents, and models could modify and delete them autonomously.
This evolution reveals an important insight: as model capabilities increase, previously necessary tools may become constraining. Developers need to continuously revisit assumptions about tool requirements. This is also why it’s advisable to focus on supporting a small set of models with relatively similar capability profiles.
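The shift from flat todo items to shareable tasks can be sketched in a few lines. The field names and the readiness check below are illustrative assumptions about what such a structure might look like, not the Task Tool's real implementation:

```python
from dataclasses import dataclass, field

# Sketch: a task carries dependencies and a status that any sub-agent
# can update, instead of a flat checklist owned by one model.
@dataclass
class Task:
    id: str
    description: str
    depends_on: list[str] = field(default_factory=list)
    status: str = "pending"  # pending | in_progress | done

def ready_tasks(tasks: dict[str, Task]) -> list[Task]:
    """Tasks whose dependencies are all done — safe to hand to a sub-agent."""
    return [
        t for t in tasks.values()
        if t.status == "pending"
        and all(tasks[d].status == "done" for d in t.depends_on)
    ]
```

The dependency graph is the key difference: a todo list only orders work for one agent, while `ready_tasks` lets multiple sub-agents pick up independent work in parallel.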
Lesson Three: Designing Search Interfaces—Letting Agents Build Their Own Context
A particularly important set of tools for Claude are search tools, which enable it to build its own context environment.
When Claude Code first launched, the team used a RAG vector database to find context for Claude. While RAG was powerful and fast, it required indexing and setup, and could be unstable across different environments. More importantly, Claude was passively receiving context rather than actively seeking it out.
But if Claude could search the web, why not let it search the codebase? By equipping Claude with a Grep tool, the team enabled it to search files and build context on its own.
This reveals a pattern: as Claude becomes smarter, if given the right tools, its ability to build its own context grows increasingly sophisticated.
When the team introduced Agent Skills, they formalized the concept of “progressive disclosure.” This concept allows agents to incrementally discover relevant context through exploration. Claude could read skill files, and those files could reference other files that the model could read recursively. In fact, a common use of skills is to add more search capabilities to Claude, such as instructions on using APIs or querying databases.
Over the course of a year, Claude evolved from being largely unable to build its own context to being capable of nested searches across multiple layers of files to find precisely the context it needed. Progressive disclosure has now become a common technique for adding new functionality without adding new tools.
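The mechanics of progressive disclosure can be sketched as a bounded traversal: the agent starts from one skill file and loads only the files that file points to, hop by hop. The `@path` reference syntax and the traversal logic below are invented for illustration; real skills reference files in ordinary prose that the model follows itself:

```python
import re

# Invented reference syntax: "@other-file.md" inside a skill file.
REF_RE = re.compile(r"@(\S+\.md)")

def load_context(files: dict[str, str], entry: str, max_depth: int = 3) -> dict[str, str]:
    """Breadth-first traversal of file references, bounded by depth.

    `files` stands in for the filesystem; only files reachable from
    `entry` are ever loaded into context.
    """
    loaded, frontier = {}, [entry]
    for _ in range(max_depth):
        next_frontier = []
        for path in frontier:
            if path in loaded or path not in files:
                continue
            loaded[path] = files[path]
            next_frontier += REF_RE.findall(files[path])
        frontier = next_frontier
    return loaded
```

The point of the sketch is what stays out: files nothing references are never loaded, so context cost scales with what the task actually needs rather than with the size of the corpus.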
Lesson Four: Progressive Disclosure in Action—The Claude Code Guide Agent
Currently, Claude Code has approximately 20 tools, and the team continuously questions whether all of them are truly necessary. The bar for adding new tools is high because each new tool represents one more option the model must consider.
For example, the team noticed that Claude didn’t know enough about how to use Claude Code itself. If you asked it how to add MCP (Model Context Protocol) or what a specific slash command did, it couldn’t answer.
One solution would be to put all this information in the system prompt. But since users rarely ask these questions, doing so would add context burden and interfere with Claude Code’s primary task: writing code.
Instead, the team tried another form of progressive disclosure. They provided Claude with a link to its documentation, which Claude could load to search for more information. This approach worked, but the team noticed Claude would load large volumes of results into context while trying to find the right answer, when users really just wanted a concise response.
Based on this observation, the team built the Claude Code Guide sub-agent. When users ask questions about Claude Code itself, the system prompts Claude to call this sub-agent. The sub-agent has detailed instructions on how to search documentation effectively and what content to return.
While this solution isn’t perfect—Claude can still get confused when asked about setup procedures—it’s much better than before! The team successfully expanded Claude’s action space without adding new tools.
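In Claude Code, the routing happens by prompting the main model to call the sub-agent; a deterministic sketch of that same routing decision looks roughly like the following. The keyword classifier and the guide's instructions are simplified assumptions for illustration:

```python
# Sketch: questions about the product itself go to a dedicated guide
# sub-agent with its own search instructions, keeping documentation
# lookup out of the main coding loop's context.
GUIDE_INSTRUCTIONS = (
    "You answer questions about Claude Code itself. Search the docs, "
    "read only the pages you need, and return a concise answer."
)

def route(user_message: str) -> str:
    """Decide which agent should handle the message (toy classifier)."""
    about_product = any(
        kw in user_message.lower()
        for kw in ("slash command", "mcp", "claude code")
    )
    return "claude-code-guide" if about_product else "main-agent"
```

The design choice worth noting: the guide's instructions live with the sub-agent, not in the main system prompt, so the main agent pays no context cost when these questions never come up.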
Art, Not Science: Core Principles of Tool Design
If you were hoping for a strict set of rules for tool construction, unfortunately, this article won’t provide that. Designing tools for models is both science and art. It depends heavily on the specific model you’re using, the agent’s objectives, and its operating environment.
Here are the key principles distilled from Claude Code practice:
1. Match Tools to Model Capabilities
Tool design must account for the model’s actual capabilities. Just as you’d give different tools to people with different skill levels, tools for Claude must match its “skill level.” As model capabilities increase, previously necessary tools may become redundant or even constraining.
2. Prioritize Specialized Tools Over All-Purpose Tools
While all-purpose tools (like code execution) seem appealing, specialized tools often help Claude better understand when and how to use them. The success of the AskUserQuestion tool demonstrates this principle.
3. Progressive Disclosure Beats One-Time Provision
Rather than cramming all information into system prompts, let Claude acquire information progressively through exploration. This saves context space while enabling more flexible, in-depth information gathering.
4. Let Agents Build Context Actively
Provide search tools that enable Claude to find and build its own context environment, rather than passively receiving pre-set context. This leverages the model’s growing exploratory capabilities.
5. Observe Outputs, Iterate Continuously
Claude’s outputs provide the best feedback. By carefully reading how the model calls tools, under what circumstances, and to what effect, you can continuously optimize tool design. The evolution of tools in Claude Code demonstrates the value of iteration.
6. Keep Tool Sets Lean
The threshold for adding new tools should be high. Each new tool represents an additional option for the model to consider, potentially increasing decision burden. Claude Code maintains approximately 20 tools and continuously questions whether all are truly necessary.
Frequently Asked Questions
What exactly is an agent’s “action space”?
An agent’s action space encompasses all the tools and functions it can use, including accessible APIs, executable commands, and available resources.
Why not just give Claude one all-powerful tool?
While all-powerful tools are flexible, they require Claude to figure out how and when to use them independently, increasing decision burden. Specialized tools help Claude understand usage scenarios and timing more clearly.
What does “progressive disclosure” mean in practice?
Progressive disclosure is a design philosophy that allows agents to acquire information gradually through exploration, rather than receiving all context at once. For example, Claude might read one file, which guides it to read other relevant files.
How can I determine whether Claude needs a new tool?
The key is observing whether Claude encounters unsolvable problems with its current toolset, or whether existing solutions are inefficient. The threshold for new tools should be high, as each tool adds to Claude’s choice burden.
How should tools be adjusted as model capabilities improve?
As model capabilities increase, previously necessary tools may become redundant or even limiting. The evolution from TodoWrite to the Task Tool in Claude Code is a textbook example. Regularly revisiting toolset necessity is valuable.
Conclusion: Learning to Think Like an Agent
The experience of building Claude Code teaches us that designing tools for AI agents is a continuous process of observation, experimentation, and adjustment. There are no one-size-fits-all solutions, only methodologies that continuously adapt to improving model capabilities, evolving user needs, and changing environments.
“Thinking like an agent” means stepping outside human perspectives and trying to understand how models perceive and choose tools. What tools does it prefer to call? Under what circumstances? What factors might lead to tool misuse? Answers to these questions can only come from careful output reading and experimental design.
For developers building AI agents, Claude Code’s experience offers valuable reference points: tool design isn’t static but evolves with model capabilities; progressive disclosure can expand agent capabilities without increasing tool count; observation and experimentation are essential for optimizing tool design.
Ultimately, designing agent tools is both science and art. It requires rigorous experimental design, systematic observation and analysis, intuitive understanding of model behavior, and creative thinking. By learning to think like agents, we can design tools that truly meet their needs, unleashing the infinite potential of AI.

