Is MCP Dead? A Field Guide to Building Production Agents That Actually Connect to Real Systems
The central question this article answers: Has the Model Context Protocol failed, or is it maturing into the standard layer for production agents?
Here is the short answer: MCP is not dead. The official Claude MCP SDK has grown from roughly 100 million monthly downloads at the start of this year to approximately 300 million recently. The repeated claims of its demise are usually reactions to real early pain points, not evidence of a collapsing ecosystem. Whether an agent is genuinely useful depends far less on the model’s raw intelligence and far more on how many external systems it can actually reach. If your brilliant assistant cannot open your email, read your calendar, or file a ticket in your internal system, it remains a chatbot with a nice personality.
Today, there are three practical ways to give an agent those connections: direct API calls, command-line interface tools, and the Model Context Protocol. This article walks through where each path hits its ceiling, why MCP drew heavy criticism, how the protocol is responding, and what it actually takes to build an MCP server that survives outside a developer’s laptop.
Three Paths to the Outside World: Where Does Each One Hit a Wall?
Core question: What are the practical ways to connect an AI agent to external services, and where does each approach break down?
Summary: Agents reach external systems through direct APIs, CLI tools, or MCP. The fundamental difference is whether a unified middleware layer sits between the agent and the service. That difference determines how well the integration scales.
Direct API Calls: Easy to Start, Hard to Maintain
Calling an API directly is the most intuitive starting point. One API covers one scenario; one agent calls one API; done. In practice, this works beautifully for a proof of concept. Imagine you have built an internal help-desk system with a handful of REST endpoints. Letting an agent query and create tickets through those endpoints takes an afternoon to wire up.
But scale changes the math. When your team has five different agents and ten different services, you are suddenly maintaining fifty separate integration points. Each new combination requires its own authentication logic, its own error handling, and its own tool descriptions. This is the classic M×N integration problem: M agents multiplied by N services equals M×N codebases to maintain. At a certain scale, the maintenance cost of point-to-point integrations eats the productivity gains of automation.
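The arithmetic behind the M×N problem is worth making explicit: a shared protocol layer turns multiplicative growth into additive growth. A minimal sketch (the counts are the article's own example; the function names are illustrative):

```python
# Integration count: point-to-point vs. a shared middleware layer.
# With direct APIs, every agent-service pair needs its own integration (M x N).
# With a protocol layer like MCP, each side integrates once against the
# protocol (M + N).

def point_to_point(agents: int, services: int) -> int:
    """Integrations to maintain when every agent wires to every service."""
    return agents * services

def via_middleware(agents: int, services: int) -> int:
    """Integrations when agents and services each target one shared protocol."""
    return agents + services

# The article's example: five agents, ten services.
assert point_to_point(5, 10) == 50   # fifty separate integration points
assert via_middleware(5, 10) == 15   # fifteen protocol adapters
```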
CLI Tools: Fast and Lightweight, but Locally Trapped
CLI tools are the hidden workhorse of agent integration. Agents already understand command-line language, and most modern toolchains ship with mature CLIs. In a local development environment, letting an agent run gh to inspect GitHub pull requests or aws to manage cloud resources is often faster than writing wrapper code.
The ceiling here is environmental. Mobile devices do not have a shell. Web browsers do not run your local binaries. Cloud containers may not include the same toolchain you have on your laptop. Authentication in CLI workflows typically depends on credential files sitting on disk, which makes CLI ideal for personal local work and poorly suited for multi-platform, multi-user production environments.
MCP: The Protocolized Middleware Layer
MCP turns that missing middleware layer into an open protocol. Authentication, tool discovery, and semantic expression are all standardized. You build one remote server, and every compatible client—Claude, ChatGPT, Cursor, VS Code—can use it, regardless of where the agent is deployed.
This solves a fundamental problem: build once, use everywhere.
Author’s reflection: I have watched teams agonize over whether to pick API, CLI, or MCP as if they were rival religions. In mature engineering organizations, the answer is rarely either-or. You will likely use direct APIs for legacy corners, CLI for local developer velocity, and MCP for anything that needs to run across platforms. The real skill is knowing where each one belongs, not forcing a universal winner.
| Dimension | Direct API | CLI Tools | MCP Protocol |
|---|---|---|---|
| Initial setup | Low | Low | Medium |
| Maintenance at scale | High (M×N) | Medium | Low (build once) |
| Cross-platform reach | Medium | Weak | Strong |
| Authentication standard | None | Local credential files | Built into protocol |
| Best fit | Simple point-to-point | Local development | Cloud production |
The Harshest Criticism: Why MCP Was Called Too Expensive to Use
Core question: What is the biggest practical complaint against MCP, and is the criticism grounded in reality?
Summary: The token cost problem is real, but it stems from schema inflation rather than the protocol itself. When an MCP server carries dozens of tool definitions, the model must read all of them before it can perform even a simple task.
The most common complaint against MCP can be summarized in one sentence: it consumes enormous amounts of context window, and tokens are expensive.
The root cause is schema inflation. Consider the GitHub official MCP server. It ships with many tool definitions, and every conversation begins by stuffing those definitions into the context window. If you simply want to check which programming language a repository primarily uses, the model must first read the instruction manuals for all available tools. A single tool definition can consume hundreds or thousands of tokens.
ScaleKit ran a controlled benchmark comparing the GitHub MCP server against the gh CLI. The task was identical: query the primary language of a repository. The results were stark.
The MCP path consumed approximately 44,026 tokens. The CLI path consumed 1,365 tokens. That is a thirty-two-fold difference. Before the agent had done any useful work, the context window was already crowded with instruction manuals. In a production environment, that translates directly into slower responses, higher costs, and wasted context space.
Author’s reflection: This criticism reminds me of the early debates around ORM frameworks. The abstraction layer introduced to simplify development sometimes became the performance bottleneck itself. The difference with MCP is that the protocol’s designers are not pretending the problem does not exist. They are engineering around it, which is a healthier sign than blind defensiveness.
Two Practical Solutions to the Token Problem
Core question: How can you bring MCP’s token overhead down to production-viable levels without abandoning the protocol?
Summary: Tool Search and programmatic tool invocation address the problem architecturally. Together, they reduce tool-definition tokens by over 85% and workflow tokens by roughly 37%.
Solution One: On-Demand Tool Loading with Tool Search
The traditional approach loads every tool definition into context at the start of a conversation. Forty-three tools, fifty-five thousand tokens, and the workbench is already buried in manuals before the first real request.
Tool Search delays this step. The agent first describes what it intends to do. The system then searches for relevant tools at runtime and pulls only the matching few into the context window. Testing shows that this reduces tool-definition token consumption by more than 85% without degrading tool-selection accuracy.
Using the earlier GitHub scenario: before Tool Search, MCP needed 44,026 tokens versus CLI’s 1,365, a 32× gap. After cutting 85% of tool-definition tokens, MCP’s total consumption drops to roughly 10,000 tokens. The gap shrinks from 32× to about 7×. MCP is still more expensive than CLI for that specific query, but it is no longer operating in a different order of magnitude.
Solution Two: Programmatic Tool Invocation in a Sandbox
The second solution changes how tool results flow back to the model. Instead of dumping raw tool output directly into the context window, the output is sent to a code execution sandbox. The agent writes a script to loop, filter, and aggregate inside the sandbox. Only the final processed result returns to the model; the intermediate raw data never touches the context window.
Imagine an agent that needs to extract error summaries from a massive log file. Feeding the entire raw log into the prompt would explode token usage. The correct approach is to let the agent generate a processing script—perhaps using regular expressions to match error patterns, count frequencies, and group by severity. The script runs in the sandbox, and only a three-line summary returns to the model. In complex multi-step workflows, this pattern reduces token usage by approximately 37%.
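A sketch of the kind of script the agent might generate for that log scenario. The log format and severity labels here are invented for illustration; the point is that the raw log never enters the prompt, only the return value of `summarize_errors` does.

```python
import re
from collections import Counter

# Hypothetical sandbox script: the agent generates something like this,
# runs it against the raw log inside the sandbox, and only the short
# summary string returns to the model's context.

ERROR_LINE = re.compile(r"\[(ERROR|WARN|FATAL)\]")

def summarize_errors(log_text: str, top: int = 3) -> str:
    """Count error lines by severity and return a compact summary."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    parts = [f"{sev}: {n}" for sev, n in counts.most_common(top)]
    return "; ".join(parts) if parts else "no errors found"

log = "\n".join([
    "2024-05-01 [INFO] startup complete",
    "2024-05-01 [ERROR] connection refused",
    "2024-05-01 [ERROR] connection refused",
    "2024-05-01 [FATAL] out of memory",
])
print(summarize_errors(log))  # only this summary reaches the model
```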
```python
# Illustrative flow using the code-orchestration pattern.
# The `mcp` client object and its methods are hypothetical; they stand in
# for whatever search/execute tools the server exposes.

# Step 1: Agent searches for the relevant API endpoint
api_info = mcp.search("list DNS records for example.com")

# Step 2: Agent generates and executes a script in the server sandbox
result = mcp.execute({
    "endpoint": api_info.endpoint,
    "method": "GET",
    "params": {"zone": "example.com"},
})

# Step 3: Only the aggregated final result enters the model context.
# The raw JSON, pagination, and intermediate filtering stay in the sandbox.
```
When you combine Tool Search with programmatic invocation, the context stays lean, round trips decrease, and response latency improves. The token disadvantage that once made MCP look prohibitive is being systematically eroded by these architectural patterns.
Author’s reflection: These are not hacks or workarounds. They represent a shift in how we think about model context. The model does not need to see every raw byte of data to be effective. Separating “what the model needs to reason about” from “what the machine can process mechanically” is a principle that will outlast any specific protocol version.
Five Design Principles for a Production-Ready MCP Server
Core question: What separates a toy MCP server from one that teams can actually run in production?
Summary: Remote deployment, intent-based tool design, code orchestration, rich interactive capabilities, and standardized authentication are the five principles that determine whether an MCP server becomes infrastructure or remains a demo.
Principle One: Remote Servers Reach Every Platform
If you want an agent running on a phone, in a web browser, or in a cloud container to access your system, a remote server is the only viable form factor. Local servers are invisible to mobile and web clients. Remote servers are what every major client optimizes for.
I have seen teams build impressive local demos that crumble the moment they need to support a colleague on a different laptop or a stakeholder on a mobile device. Remote deployment is not a premium feature; it is the admission ticket to production.
Principle Two: Organize Tools by User Intent, Not by API Endpoint
The most common design trap is mapping API endpoints one-to-one to MCP tools. If your backend has endpoints for “get message thread,” “parse message,” “create ticket,” and “attach file,” the naive approach exposes four separate tools and expects the agent to assemble them like LEGO bricks.
A better approach is to design around what the user wants to accomplish. Instead of four atomic tools, expose one composite tool: “create ticket from message thread.” This is faster, more reliable, and easier for the model to reason about. Fewer, well-described tools consistently outperform large collections of granular ones.
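The difference is easy to see in code. In this sketch, the four underscore-prefixed helpers are hypothetical stand-ins for the four backend endpoints; the agent never sees them, only the single composite tool:

```python
# Intent-based tool design: four internal calls, ONE exposed tool.
# All endpoint names and payloads below are invented for illustration.

def _get_thread(thread_id):          # stand-in for GET /threads/{id}
    return {"id": thread_id, "subject": "Printer down", "messages": ["..."]}

def _parse_messages(thread):         # stand-in for the parsing endpoint
    return {"summary": f"Issue reported: {thread['subject']}"}

def _create_ticket(summary):         # stand-in for POST /tickets
    return {"ticket_id": "T-1001", "summary": summary}

def _attach_thread(ticket, thread):  # stand-in for the attachment endpoint
    ticket["attached_thread"] = thread["id"]
    return ticket

def create_ticket_from_thread(thread_id: str) -> dict:
    """The single composite tool the MCP server actually exposes.
    The model reasons about one intent; the server handles the choreography."""
    thread = _get_thread(thread_id)
    parsed = _parse_messages(thread)
    ticket = _create_ticket(parsed["summary"])
    return _attach_thread(ticket, thread)
```

One tool description replaces four, and the failure mode of the model mis-ordering the atomic calls disappears entirely.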
Principle Three: Code Orchestration for Massive API Surfaces
When your service exposes hundreds or thousands of operations—think Cloudflare, AWS, or Kubernetes—even intent-based grouping cannot cover everything. In these cases, expose only one or two tools that accept code.
Cloudflare’s MCP server is the canonical example. It exposes exactly two tools: search and execute. The agent first uses search to find the right API endpoint, then writes a script and runs it through execute inside the server’s sandbox. The entire tool definition occupies roughly one thousand tokens, yet it covers approximately 2,500 endpoints.
This is code orchestration: bringing the philosophy of CLI into the MCP protocol, but running it in the cloud over a standardized protocol rather than on a local shell. For operation-dense platforms, this is currently the most elegant solution available.
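A minimal sketch of the search + execute pair over an endpoint catalog. The catalog entries are invented and the execution is stubbed; a real server would index thousands of endpoints and run the generated script in a sandbox, but the two-tool surface is the same:

```python
# Code orchestration in miniature: the agent gets exactly two tools,
# regardless of how many endpoints the catalog holds.
# Catalog entries and return shapes are hypothetical.

ENDPOINT_CATALOG = [
    {"path": "/zones/{zone}/dns_records", "method": "GET",
     "doc": "List DNS records for a zone."},
    {"path": "/zones/{zone}/purge_cache", "method": "POST",
     "doc": "Purge cached content for a zone."},
]

def search(query: str) -> list:
    """Tool 1: return endpoint docs whose description matches the query."""
    words = query.lower().split()
    return [e for e in ENDPOINT_CATALOG
            if any(w in e["doc"].lower() for w in words)]

def execute(endpoint: dict, params: dict) -> dict:
    """Tool 2: run the chosen call (stubbed here; sandboxed in reality)."""
    return {"called": endpoint["path"], "method": endpoint["method"],
            "params": params, "status": "ok"}

hits = search("dns records")
result = execute(hits[0], {"zone": "example.com"})
```

Growing the catalog from two entries to 2,500 does not change the tool definitions the model has to read, which is exactly why the pattern scales.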
Principle Four: Rich Interactions at Critical Moments
MCP Apps allow a tool to return interactive interfaces—charts, forms, dashboards—that render directly inside the chat. Servers that adopt this capability show measurably higher adoption and retention rates.
Elicitation lets a server pause mid-invocation to ask the user for input:
- Form mode: The server sends a schema, and the client renders it as a native form. This works well for filling missing parameters, confirming dangerous operations, or choosing between discrete options.
- URL mode: The user is redirected to a browser. This is appropriate for OAuth authorization, payments, or any credential collection that should not pass through the MCP client.
Both patterns keep the user inside the conversation flow. There is no need to navigate to a separate settings page, break context, and return.
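To make form mode concrete, here is a sketch of the kind of flat schema a server might send for a dangerous-operation confirmation. The exact request shape is defined by the MCP specification; this dict and the tiny validator are only illustrative, and the field names are invented:

```python
# Hypothetical form-mode elicitation schema: the server describes the
# inputs it needs, and the client renders them as a native form.

CONFIRM_DELETE_SCHEMA = {
    "type": "object",
    "properties": {
        "confirm": {"type": "boolean",
                    "description": "Really delete 14 DNS records?"},
        "reason": {"type": "string",
                   "description": "Optional audit note."},
    },
    "required": ["confirm"],
}

def validate_response(schema: dict, response: dict) -> bool:
    """Tiny stand-in for client-side validation of the user's answers."""
    for field in schema["required"]:
        if field not in response:
            return False
    types = {"boolean": bool, "string": str}
    return all(isinstance(v, types[schema["properties"][k]["type"]])
               for k, v in response.items() if k in schema["properties"])
```

The user answers the form inline, the tool invocation resumes, and the conversation never breaks context.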
Principle Five: Standardized Authentication with Vaults
Whether authentication is standardized directly determines whether cloud agents can use your server at all. The latest MCP specification supports CIMD (Client ID Metadata Documents) client registration. This makes first-time logins fast and reduces the likelihood that users will be unexpectedly asked to reauthorize later.
After the user authorizes access, Claude Managed Agents’ Vaults handle token management. The user registers once, and the platform automatically injects the correct credentials into every subsequent session, refreshing them as needed. You do not need to build your own key storage infrastructure, and you do not need to pass tokens manually on every call.
Author’s reflection: I have watched teams burn weeks building custom auth flows that break every time a token expires. Standardizing on CIMD and Vaults is not exciting work, but it is the kind of boring, reliable infrastructure that separates prototypes from products. If your auth is bespoke, your integration is fragile.
MCP and Skills: The Complete Package
Core question: How do MCP and Skills work together to create agents that behave like domain experts rather than tool collectors?
Summary: MCP provides access to tools and data; Skills provide the procedural knowledge to use them. Together, they transform a set of capabilities into an end-to-end workflow.
MCP answers the question what can the agent do? Skills answer the question how should it do the work? They are complementary, not overlapping.
Distribution Through Plugins
Claude Plugins can bundle skills, MCP servers, hooks, LSP servers, and sub-agents into a single installable package. One click gives the agent a complete domain toolkit.
The effect of combining MCP with Skills is that Claude begins to behave like a domain expert. MCP hands it the professional instruments; Skills teach it how to use those instruments to complete real jobs from start to finish.
Consider the Cowork data plugin. It contains ten skills and eight MCP servers, connecting to Snowflake, Databricks, BigQuery, and Hex. A data analyst installs the plugin once and inherits a complete analytical workflow. They do not need to configure connections individually or repeatedly prompt the agent on how to query a warehouse.
Distribution Directly from the Server
An alternative pattern is for the service provider to ship a Skill alongside the MCP server itself. The client receives not just raw capabilities, but a best-practice manual for using them. Canva, Notion, and Sentry are already doing this.
The MCP community is developing an extension that will allow servers to deliver skills directly. Clients will automatically inherit domain knowledge, and the skill version will remain bound to the API version. When the API changes, the usage instructions change with it.
Author’s reflection: The old paradigm was “here is the tool, read the manual.” The emerging paradigm is “here is the tool and the manual, version-locked together.” That version binding is subtle but critical. I have seen too many agents hallucinate outdated calling patterns because the API evolved but the documentation in the prompt did not. Tight coupling between capability and instruction is what makes this reliable.
Where Each Path Belongs
Core question: How should a team decide which integration strategy to use for a given scenario?
Summary: CLI belongs in local development, MCP belongs in cloud production, and direct API calls belong in simple, temporary point-to-point needs. The three paths are complementary, not competitive.
| Scenario | Recommended Approach | Rationale |
|---|---|---|
| Local development environment | CLI + Skills | Lightweight, fast, clean context, reuses existing toolchain |
| Cloud production environment | MCP + Skills | Standardized, cross-platform, robust authentication |
| Simple or temporary integration | Direct API | No middleware overhead, fastest to validate, good for short-lived use cases |
MCP is not a universal solution. It is, however, becoming the standardized access layer for cloud-based agents. A remote MCP server built today can reach every compatible client and every deployment environment. Authentication, interaction patterns, and semantics are all handled by the protocol. As the specification gains more clients and more extensions, that same server becomes more capable without requiring additional code.
If your goal is to let production-grade cloud agents use your system, write an MCP server. Then apply the patterns described above to make it robust.
Every integration built on MCP makes the entire ecosystem stronger.
A Word on the “MCP Is Dead” Narrative
The recurring declarations of MCP’s death are, at their core, a signal that the protocol matters enough to criticize. Developers do not write passionate takedowns of technologies they do not use. The token bloat problem was real. The thirty-two-fold cost gap against CLI was real. But what distinguishes a dying technology from an evolving one is whether its architects acknowledge the pain and engineer solutions. Tool Search, programmatic invocation, and code orchestration are not defensive blog posts; they are concrete protocol-level responses.
The growth in monthly downloads—from one hundred million to three hundred million—suggests that practitioners are voting with their dependencies. They are not abandoning the protocol; they are using it in more sophisticated ways.
Action Checklist / Implementation Steps
If you are preparing to bring MCP into a production environment, use this checklist:
- [ ] Deploy remotely. Ensure the server is network-accessible, not restricted to localhost.
- [ ] Design composite tools. Group API endpoints around user intent rather than exposing one tool per endpoint.
- [ ] Evaluate API surface area. If your service exposes more than a few dozen operations, adopt the code orchestration pattern (search + execute).
- [ ] Enable Tool Search. Reduce unnecessary tool-definition loading to keep context volume under control.
- [ ] Integrate a code sandbox. Route complex data processing through sandboxed execution, passing only final results to the model.
- [ ] Configure standard authentication. Use CIMD client registration and integrate with Vaults for token lifecycle management.
- [ ] Write companion Skills. Document how to use your tools in a Skill file, and distribute it alongside the MCP server.
- [ ] Support Elicitation. For dangerous or parameter-missing operations, pause and request explicit user confirmation.
- [ ] Consider Plugin packaging. If you are targeting the Claude ecosystem, bundle Skills and MCP servers into a single Plugin.
One-Page Overview
| Topic | Key Takeaway |
|---|---|
| Is MCP dead? | No. SDK downloads grew from 100M to 300M monthly. |
| Why the criticism? | Schema inflation caused massive token overhead before work began. |
| How is it being fixed? | Tool Search cuts tool-definition tokens by 85%; programmatic invocation cuts workflow tokens by ~37%. |
| How to design a server? | Remote, intent-based, code-orchestrated, interactive, standard-auth. |
| MCP vs. Skills? | MCP is the toolbox; Skills are the instruction manual. Together they create experts. |
| Which path when? | Local → CLI; Production cloud → MCP; Simple/temporary → Direct API. |
Frequently Asked Questions
Q1: Should I choose MCP or CLI for my project?
Use CLI for local development where speed and minimal context matter. Use MCP for cloud and multi-platform production environments where standardization and authentication are critical. The strongest agents often use both.
Q2: Does Tool Search hurt the agent’s ability to pick the right tool?
No. Empirical testing shows that reducing tool-definition tokens by over 85% does not degrade tool-selection accuracy. The system matches tools based on the agent’s stated intent rather than loading every definition into context.
Q3: Why is a remote MCP server considered essential for production?
Remote servers are the only form factor accessible from web clients, mobile apps, and cloud containers. A local server cannot serve users outside the machine it runs on, which makes it unsuitable for real production workloads.
Q4: My platform has thousands of API endpoints. How do I avoid creating thousands of MCP tools?
Adopt code orchestration. Expose only search and execute tools. Let the agent search for the right endpoint, then write a script that runs inside the server’s sandbox. Cloudflare covers roughly 2,500 endpoints with a tool definition of about one thousand tokens using this pattern.
Q5: What is the difference between MCP and Skills?
MCP grants the agent access to tools and data. Skills provide the procedural knowledge for how to use those tools to complete real tasks. One is capability; the other is know-how.
Q6: Is MCP authentication secure enough for enterprise use?
Yes. The latest specification supports CIMD for standardized client registration. Combined with Vaults for automatic token injection and refresh, you avoid manual key handling and custom credential storage.
Q7: Do I need MCP for a simple one-off integration?
Probably not. If you are building a temporary script that calls one internal API for a narrow task, a direct API integration is simpler. MCP’s value becomes clear when you need cross-platform reuse and standardized governance.
Q8: What are MCP Apps, and why do they matter?
MCP Apps allow a tool to return interactive interfaces—forms, charts, dashboards—that render directly in the chat client. This eliminates context-breaking jumps to external pages and has been shown to improve both adoption and user retention.

