Five Multi-Agent Collaboration Patterns: How to Choose and Use the Right One
The core question this article answers: When you need multiple AI agents to collaborate on a task, which pattern should you use? How do you know which one fits your scenario? And what happens if you pick the wrong one?
I’ve seen many teams pick a multi-agent pattern because it sounded “impressive” rather than because it actually fit their problem. That approach usually leads to trouble.
Here’s my straightforward advice: start with the simplest pattern that can possibly work, watch where it hits bottlenecks, then evolve from there. Don’t build the “big, complete” design from day one.
This article breaks down five mainstream patterns from Anthropic’s official blog—how each one works, when to use it, and where the hidden traps are.
Pattern 1: Generator-Verifier – The Most Practical Quality Gate
The core question this section answers: If I only care about “is the output good enough” and I can write clear evaluation criteria, which pattern should I use?
How It Works
This pattern has two roles:
- Generator: Takes a task and produces an initial result.
- Verifier: Checks that result against defined standards. If it passes, done. If not, the verifier sends back specific revision feedback.
The generator revises, and the cycle continues until the verifier is satisfied or a maximum iteration limit is reached.
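The loop above can be sketched in a few lines. This is a minimal Python sketch under assumptions, not a definitive implementation: `generate`, `verify`, and the returned status strings are hypothetical stand-ins for real agent calls.

```python
# Minimal sketch of a generator-verifier loop with a maximum iteration
# limit and a fallback. All function bodies are hypothetical stand-ins.

MAX_ITERATIONS = 3

def generate(task, feedback=None):
    # Stand-in for an LLM generator; revises when feedback is given.
    draft = f"draft for {task}"
    return draft + " (revised)" if feedback else draft

def verify(draft):
    # Stand-in for a verifier with concrete criteria.
    # Returns (passed, feedback).
    if "(revised)" in draft:
        return True, None
    return False, "Address the refund question explicitly."

def run(task):
    feedback, best = None, None
    for _ in range(MAX_ITERATIONS):
        best = generate(task, feedback)
        passed, feedback = verify(best)
        if passed:
            return best, "approved"
    # Fallback: never loop forever; return best effort with a warning.
    return best, "needs human review"
```

The structure matters more than the bodies: the verifier returns specific feedback (not just pass/fail), and the loop always terminates with either an approval or an explicit escalation.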
A Real Example: Automated Customer Ticket Response
Imagine building a system that auto-replies to support tickets. The generator drafts a reply using product docs and ticket details. The verifier acts as QA, doing three things:
- Fact-check against the knowledge base
- Check tone against brand guidelines
- Confirm every customer question was answered
If any check fails, the verifier points out specifics: “You misattributed feature X to the wrong plan” or “You didn’t address the refund question.”
When does this pattern shine?
When output quality matters and you can write down explicit criteria for “good.” Perfect for:
- Code generation (one agent writes code, another writes and runs tests)
- Fact-checking
- Scoring against a rubric
- Compliance reviews
- Any domain where one mistake costs far more than an extra revision loop
An Honest Note: This Pattern Has a Deadly Trap
The system’s floor is entirely determined by how specific your verifier’s criteria are.
If you just tell the verifier “check if this is good” without giving it concrete rules, it becomes a rubber stamp that approves everything. I’ve seen teams make this mistake repeatedly—they build the loop but never define what “verify” actually means. The result is a false sense of quality control.
Also, this pattern assumes “generation” and “verification” are separable skills. But if you’re evaluating a truly creative idea, verifying it can be just as hard as generating it. In those cases, the verifier won’t be reliable.
One more thing: the loop can deadlock. If the generator simply cannot fix what the verifier is asking for, the system bounces back and forth forever. Always set a maximum iteration limit and a fallback (escalate to a human, or return the best version with a warning).
Pattern 2: Orchestrator-Subagent – Hierarchical Task Distribution
The core question this section answers: If a task can be split into several independent sub-tasks, each with clear goals and outputs, which pattern should I use?
How It Works
This pattern is all about hierarchy. A central orchestrator (like a team lead) plans the work, distributes tasks, and aggregates results. Subagents handle specific pieces of work and report back.
A Classic Example: Automated Code Review
When someone submits new code, the system needs to check four things:
- Security vulnerabilities
- Test coverage
- Code style
- Architectural consistency
These checks don’t interfere with each other, need different context, and produce clean reports. The orchestrator spins off specialized subagents for each check, then merges their reports into a comprehensive review.
Here’s a design detail worth noting: the main agent handles code writing, file changes, and command execution. But when it needs to search a large codebase or investigate independent issues, it spawns subagents in the background. The main thread keeps running, and results stream back. Each subagent works in its own context window and returns only distilled findings.
Think of it as the boss focusing on the big picture while employees digest messy details—the boss’s context never gets cluttered.
The Limitation: The Orchestrator Becomes a Bottleneck
I need to call this out: the orchestrator easily becomes an information bottleneck.
If one subagent discovers something that affects another subagent’s work, that information must go up to the orchestrator and then back down. For example, a security subagent finds an authentication flaw that impacts the architecture subagent’s analysis. With too many handoffs, critical details get lost in summarization.
Also, without explicit parallelization, subagents run sequentially. You pay for multi-agent token costs without getting the speed benefit of parallel workers.
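The parallelization point can be made concrete. Below is a minimal sketch of an orchestrator fanning out independent checks with a thread pool and aggregating the reports; the four check functions are hypothetical stand-ins for specialized subagents.

```python
# Minimal sketch of explicit parallelization in the orchestrator-subagent
# pattern. Each check function stands in for a subagent with its own context.
from concurrent.futures import ThreadPoolExecutor

def security_check(code):     return "security: no issues"
def coverage_check(code):     return "coverage: 87%"
def style_check(code):        return "style: 2 nits"
def architecture_check(code): return "architecture: consistent"

CHECKS = [security_check, coverage_check, style_check, architecture_check]

def orchestrate(code):
    # Fan out: run independent subagents concurrently.
    with ThreadPoolExecutor(max_workers=len(CHECKS)) as pool:
        reports = list(pool.map(lambda check: check(code), CHECKS))
    # Aggregate distilled findings into one review.
    return "\n".join(reports)
```

Without the thread pool (a plain loop over `CHECKS`), you get the sequential behavior described above: full multi-agent token cost, none of the speed benefit.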
Pattern 3: Agent Teams – Long-Lived Parallel Workers
The core question this section answers: If sub-tasks take a long time to complete independently, and each member needs to accumulate experience across multiple tasks, which pattern should I use?
How It Works
When a large job splits into parallel sub-tasks that take significant time, the hierarchical “manager assigns work” model becomes too rigid.
A coordinator creates multiple team members that run as independent processes. These members pull tasks from a shared queue, autonomously complete multi-step work, and signal when done.
The key difference from Orchestrator-Subagent is persistence. Orchestrators typically spin up a subagent for a single task and then discard it. In the Agent Teams pattern, members are long-lived. They accumulate domain knowledge and context across tasks, getting better over time.
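The shared queue plus long-lived workers can be sketched as follows. This is a minimal illustration, assuming in-process threads; real team members would be separate processes, and the `learned` list stands in for accumulated domain context.

```python
# Minimal sketch of the agent-teams pattern: long-lived members pull tasks
# from a shared queue, keeping per-member state across tasks (unlike a
# throwaway subagent, which is discarded after one task).
import queue
import threading

def member(name, tasks, results):
    learned = []  # persists across tasks, accumulating context
    while True:
        try:
            module = tasks.get_nowait()
        except queue.Empty:
            return  # queue drained; member signals done by exiting
        learned.append(module)  # stand-in for "migrate module" work
        results.append((name, module, len(learned)))
        tasks.task_done()

tasks = queue.Queue()
for module in ["auth", "billing", "search", "email"]:
    tasks.put(module)

results = []
workers = [
    threading.Thread(target=member, args=(f"m{i}", tasks, results))
    for i in range(2)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Note the third element of each result tuple: it grows per member, which is exactly the cross-task memory that throwaway subagents lack.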
A Real Example: Codebase Migration
Suppose you’re migrating a large codebase from one framework to another. Each team member independently migrates one service module—handling dependencies, rewriting code, fixing test bugs, and validating changes. The coordinator assigns modules to members, members autonomously complete the entire migration workflow, and the coordinator eventually runs a system-level integration test.
The Limitation: “Independence” Is Both Strength and Weakness
Unlike the orchestrator pattern where a manager helps pass messages, team members work in isolation. They can’t easily share intermediate progress. If A’s work affects B but neither knows, their final results may conflict.
Progress management is another headache. Some tasks take two minutes, others twenty. The coordinator must handle this “uneven” partial completion gracefully.
Resource contention makes things worse. When multiple members modify the same codebase, database, or files, you get merge conflicts or conflicting changes. You need clear boundaries in task assignment and a conflict resolution mechanism ready.
Pattern 4: Message Bus – Event-Driven Elastic Collaboration
The core question this section answers: If your workflow is triggered by unpredictable events, and your agent ecosystem will keep growing, which pattern should I use?
How It Works
As agent count grows, direct point-to-point communication becomes a nightmare. The message bus provides a shared communication channel where agents coordinate through publish and subscribe.
Agents only need two actions: publish and subscribe. Each agent subscribes to topics it cares about. A router pushes relevant messages to subscribers.
Think of it as a large company chat channel: someone posts a request, the relevant department sees it and picks it up—you don’t need to know exactly who handles it. When a new agent with new capabilities joins, it just subscribes to the right topics. No changes needed to existing wiring.
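The publish/subscribe mechanics can be sketched in a few lines. This is a minimal in-process bus for illustration; the topic names and handlers are hypothetical, and a production bus would add persistence, delivery guarantees, and a smarter router.

```python
# Minimal sketch of a topic-based message bus. Agents only publish and
# subscribe; the bus routes each message to every subscriber of its topic.
from collections import defaultdict

class MessageBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # A new agent joins by subscribing; existing wiring is untouched.
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

bus = MessageBus()
handled = []
bus.subscribe("alert.network", lambda m: handled.append(("network-agent", m)))
bus.subscribe("alert.identity", lambda m: handled.append(("identity-agent", m)))
bus.publish("alert.network", "suspicious traffic on port 22")
```

Note that the publisher never names a recipient: the identity agent simply never sees the network alert, and adding a third agent later means one more `subscribe` call.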
A Real Example: Automated Security Operations
Alerts pour in from various channels. A triage agent assesses severity and type:
- High-severity network alerts → network investigation agent
- Account-related alerts → identity analysis agent
While investigating, an agent might publish a “need more context” request. A dedicated intelligence-gathering agent sees that and helps. Eventually, all findings flow to a response coordinator that decides what to do.
This pipeline is tailor-made for a message bus: events flow naturally from stage to stage, you can add new security agents as new threat types emerge, and agents can be developed and deployed independently.
The Limitation: Debugging Is Painful
Event-driven communication is so flexible that debugging becomes very hard. When one alert triggers a chain reaction across five agents, figuring out what happened requires meticulously correlated logs. That’s far more painful than stepping through an orchestrator’s sequential logic.
Router accuracy is critical. If the router misclassifies a message or drops it entirely, the system enters “silent failure”—no errors, no crashes, but nothing happens. LLM-based routers are more flexible in semantic understanding but bring LLM-specific failure modes (misunderstanding, hallucination).
Pattern 5: Shared State – Fully Decentralized Collaboration
The core question this section answers: If agents need to constantly share findings with each other, and you cannot tolerate a single point of failure, which pattern should I use?
How It Works
In previous patterns, the orchestrator, coordinator, or router acts as an intermediary for information flow. Shared state eliminates the intermediary entirely. All agents read from and write to a persistent store (database, file system, document)—a shared blackboard.
There’s no central commander. Agents look at what’s on the blackboard, pick what they can handle, work on it, and write new findings back. Usually, the process starts by writing an initial problem or data on the blackboard. Work stops when a termination condition is met—time limit, convergence threshold (no new findings for a while), or a designated “judge” agent declares the answer good enough.
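The blackboard mechanics, including a convergence-based termination condition, can be sketched as follows. This is a minimal single-process illustration with a dict as the blackboard; the two agent functions and the keys they write are hypothetical.

```python
# Minimal sketch of shared-state (blackboard) collaboration. Each agent
# reads the board, writes new findings if it can, and reports whether it
# wrote anything. The loop stops after max_idle_rounds with no new findings.

def paper_agent(board):
    # Hypothetical: finds a key researcher once a question is posted.
    if "question" in board and "researcher" not in board:
        board["researcher"] = "key researcher found via papers"
        return True  # wrote something new
    return False

def industry_agent(board):
    # Builds directly on the paper agent's finding; no relay needed.
    if "researcher" in board and "company" not in board:
        board["company"] = "employer of the key researcher"
        return True
    return False

def run(board, max_idle_rounds=2):
    agents = [paper_agent, industry_agent]
    idle = 0
    while idle < max_idle_rounds:
        # Run every agent each round (list() avoids short-circuiting).
        wrote = any([agent(board) for agent in agents])
        idle = 0 if wrote else idle + 1
    return board
```

The `idle` counter is the termination design discussed below: without it, two agents that react to each other's writes would loop forever.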
A Real Example: Cross-Domain Research System
To investigate a complex question, one agent scans academic papers, another reads industry reports, another mines patents, another tracks news. Each finding can become a clue for others.
For example, the paper-reading agent finds a key researcher. The industry report agent can immediately investigate that researcher’s company.
With shared state, all findings go directly to the blackboard. The industry agent sees the paper agent’s discovery instantly—no waiting for an orchestrator to relay messages. Agents build on each other’s work, and the blackboard evolves into a growing knowledge base.
Another big advantage: no single point of failure. If any agent crashes, others keep reading and writing to the blackboard. In orchestrator or message bus patterns, if the commander or router goes down, the whole system halts.
The Limitation: The Deadliest Failure Mode Is a “Reactive Loop”
Without central coordination, agents may duplicate work or go in conflicting directions. Two agents might investigate the same clue independently. The system’s behavior emerges from interactions rather than top-down design, making outcomes harder to predict.
The most dangerous failure is the reactive loop. Agent A writes a finding. Agent B sees it and writes a comment. A sees that and replies… The system becomes two bots in infinite nested conversation, burning expensive compute without converging.
For duplicate work and concurrent writes, engineers have solutions (locks, versioning, partitioning). But the infinite loop is a behavioral pattern problem. You must design termination conditions from the start:
- A fixed time budget
- Stop after a few rounds with no new findings
- A “judge” agent that decides when the answer is good enough
If you ignore termination design, the system will either loop until you go broke or crash because an agent’s context window overflows.
How to Choose Between Patterns
The core question this section answers: With so many patterns, which one do I actually pick? Is there a simple decision framework?
Your choice depends on a few structural questions. The five patterns mainly differ in how they partition context boundaries and manage information flow.
Orchestrator-Subagent vs. Agent Teams
Both have a coordinator that assigns work. How to choose? Ask: Does the worker need to retain memory (context) across multiple invocations?
| Scenario | Recommended Pattern | Why |
|---|---|---|
| Sub-tasks are short, focused, with clear outputs | Orchestrator-Subagent | Code review works because each check is one-off; subagents don’t need memory across tasks |
| Sub-tasks require multi-step, long-running work | Agent Teams | Codebase migration works because members work on the same service module over time, learning its dependencies, test patterns, and deployment quirks |
When subagents need to remember previous state across wake-ups, Agent Teams is the better choice.
Orchestrator-Subagent vs. Message Bus
Both can handle multi-step workflows. How to choose? Ask: Is your workflow predictable in advance?
| Scenario | Recommended Pattern | Why |
|---|---|---|
| Steps are fixed ahead of time | Orchestrator-Subagent | Code review always follows the same three steps: receive PR, run checks, aggregate results |
| Workflow is triggered by events and can change direction | Message Bus | A security ops system never knows what alert comes next, and needs to accommodate new alert types |
A practical signal: If your orchestrator’s internal “if-else” logic keeps growing to handle special cases, it’s time to switch to a message bus.
Agent Teams vs. Shared State
Both have autonomous agents. How to choose? Ask: Do agents need to see and build on each other’s work in real time?
| Scenario | Recommended Pattern | Why |
|---|---|---|
| Each agent works independently, no cross-dependencies | Agent Teams | In codebase migration, each agent owns one service; final merge is the only coordination |
| High collaboration, findings must flow immediately | Shared State | In research, when the paper agent finds something, the industry agent can use it right away |
Once team members need to share intermediate findings frequently rather than just aggregating at the end, switch to Shared State.
Message Bus vs. Shared State
Both handle complex multi-agent collaboration. How to choose? Ask: Is your task a pipeline processing events, or a gradual accumulation of knowledge?
| Scenario | Recommended Pattern | Why |
|---|---|---|
| Agents react to events in a pipeline | Message Bus | Security ops is step-by-step: completing one step triggers the next. Message bus excels at precise routing |
| Agents build on cumulative clues over time | Shared State | Research systems converge knowledge. Agents repeatedly return to the blackboard to see what others found |
Remember: the message bus still has a central router controlling who gets which message. Shared state is fully decentralized. If eliminating single points of failure is critical, Shared State gives you the most safety.
Also, if your message bus system has agents publishing mainly to “share intelligence” rather than “trigger actions,” you’ve picked the wrong pattern—that’s Shared State’s job.
A Beginner’s Guide: Start Simple
The core question this section answers: I’m just starting out and don’t know which to pick. Any advice?
In production environments, we often mix these patterns. A common combination: an orchestrator-subagent for the main workflow, with shared state nested inside a sub-task that requires heavy collaboration. Other systems use a message bus for event distribution, with agent teams handling each event type. These patterns are building blocks, not mutually exclusive religions.
One-Page Summary: When to Use Which Pattern
| Scenario | Recommended Pattern |
|---|---|
| Output quality matters and you have clear evaluation criteria | Generator-Verifier |
| Clear task decomposition, sub-tasks have clean boundaries and are short | Orchestrator-Subagent |
| Workload is parallelizable, sub-tasks are independent and long-running | Agent Teams |
| Event-driven pipeline, agent ecosystem will keep growing | Message Bus |
| Collaborative research, agents need to share findings frequently | Shared State |
| Absolutely cannot tolerate a single point of failure | Shared State |
For the vast majority of teams just starting out, I strongly recommend starting with Orchestrator-Subagent. It handles the widest range of problems with the lowest coordination overhead. Get it running, watch where it chokes, then evolve based on the specific pain point.
Practical Summary / Action Checklist
If you’re designing a multi-agent system today, run through this checklist:
- Define your output quality requirements. Do you have clear evaluation criteria? If yes, consider starting with Generator-Verifier.
- Analyze how the task decomposes. Are sub-tasks independent or sequential? Can they run in parallel?
- Determine if subagents need memory. Is each sub-task one-shot, or do they need to accumulate context over time?
- Assess workflow predictability. Is the process fixed, or driven by unpredictable events?
- Evaluate collaboration density. Do agents need to constantly share findings, or just aggregate at the end?
- Consider fault tolerance. Can you tolerate a single point of failure? If not, prioritize Shared State.
- Start with the simplest possible working pattern. Don’t build the “complete” design from day one.
Frequently Asked Questions (FAQ)
Q1: What’s the biggest trap in the Generator-Verifier pattern?
Vague verifier criteria. If you just tell it “check if this is good,” it becomes a rubber stamp. You must give concrete, actionable evaluation rules.
Q2: What’s the core difference between Orchestrator-Subagent and Agent Teams?
Agent persistence. Orchestrator subagents are spun up and discarded per task. Agent team members are long-lived and accumulate context across tasks.
Q3: When should I switch from Orchestrator-Subagent to Message Bus?
When your workflow becomes unpredictable and your orchestrator’s “if-else” logic keeps growing to handle special cases.
Q4: How does Shared State prevent infinite loops?
You must design termination conditions: a fixed time budget, stop after several rounds with no new findings, or a judge agent that decides when the answer is good enough.
Q5: Can I mix these patterns?
Yes, and production systems often do. For example, an orchestrator-subagent for the main flow, with shared state nested inside a complex sub-task.
Q6: I’m just starting. Which pattern should I begin with?
Orchestrator-Subagent. It covers the widest range of problems with the lowest coordination cost. Evolve from there based on bottlenecks.
Q7: What’s “silent failure” in Message Bus?
The router misclassifies or drops a message. The system doesn’t crash or error—it just does nothing. Much harder to debug than a clear failure.
Q8: What’s the biggest advantage of Shared State?
Eliminating the single point of failure. Any agent can crash, and others keep reading and writing to the shared store.

