Chain-of-Agents: How AI Learned to Work Like a Team

Figure 1: AFM outperforms traditional methods across benchmarks

The Evolution of AI Problem-Solving

Remember when Siri could only answer simple questions like “What’s the weather?” Today’s AI systems tackle complex tasks like medical diagnosis, code generation, and strategic planning. But there’s a catch: most AI still works like a solo worker rather than a coordinated team. Let’s explore how researchers on the OPPO AI Agent Team are changing this paradigm with Chain-of-Agents (CoA).


Why Traditional AI Systems Struggle

1. The “Lone Wolf” Problem

Most AI systems today use one of two approaches:

  • ReAct-style systems: Follow rigid “think → act → observe” patterns like an assembly line worker.
  • Multi-agent systems: Require complex manual coordination between specialized AI “agents” (like hiring different contractors for a project).

Both approaches face critical limitations:

| Challenge | ReAct Systems | Multi-Agent Systems |
|---|---|---|
| Efficiency | Low | High, but complex |
| Flexibility | Fixed workflows | Requires constant reconfiguration |
| Learning | Hard to improve with data | Can’t learn from experience |

Think of it like comparing a single chef cooking an entire meal versus managing a kitchen with specialized chefs who don’t communicate well.


Introducing Chain-of-Agents: AI Teamwork Made Simple

Figure 2: CoA vs Traditional Methods

The OPPO team created CoA to let a single AI model dynamically simulate team collaboration without complex setup; the resulting trained model is called the Agent Foundation Model (AFM), the name used in the results below. It’s like having a project manager who can instantly assemble the right team members for each task.

Key Components of CoA

| Role Type | Examples | Function |
|---|---|---|
| Thinking Agents | Planning Agent, Reflection Agent | Oversee strategy and quality |
| Tool Agents | Search Agent, Code Generator | Handle specialized tasks |

This architecture allows the AI to:

  1. Decompose complex problems
  2. Assign tasks to virtual “team members”
  3. Adapt workflows in real-time
  4. Learn from experience
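The four capabilities above can be sketched as a single controller that routes role-tagged steps to handlers. This is purely an illustration: the class and role names (`CoARunner`, `"planning"`, `"search"`) are ours, not the paper’s API, and a real system would have the model itself generate the plan.

```python
# Minimal sketch of a CoA-style role dispatcher (hypothetical names, not the
# paper's actual interface). One controller simulates a "team" by routing
# role-tagged steps to registered handlers.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    role: str      # e.g. "planning", "reflection", "search", "code"
    content: str


class CoARunner:
    """Single controller that simulates a team of agents."""

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[str], str]] = {}
        self.trace: list[Step] = []  # keeps the full reasoning chain

    def register(self, role: str, handler: Callable[[str], str]) -> None:
        self.handlers[role] = handler

    def run(self, plan: list[Step]) -> list[str]:
        outputs = []
        for step in plan:
            result = self.handlers[step.role](step.content)
            self.trace.append(step)   # record for later reflection/learning
            outputs.append(result)
        return outputs


runner = CoARunner()
runner.register("planning", lambda task: f"plan for: {task}")
runner.register("search", lambda q: f"results for: {q}")
outputs = runner.run([Step("planning", "diagnose issue"),
                      Step("search", "error logs")])
print(outputs)  # ['plan for: diagnose issue', 'results for: error logs']
```

In the actual framework the “plan” is not fixed in advance: the model decides at each step which virtual agent to activate next, which is what makes the workflow adaptive.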

How CoA Works: The Training Process

1. Learning from the Best

Researchers used multi-agent distillation to teach CoA:

  1. Record how state-of-the-art multi-agent systems solve problems
  2. Convert these recordings into training examples
  3. Fine-tune a base AI model using these examples
Figure 3: Training Framework
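As a rough illustration of step 2, a recorded multi-agent trajectory might be flattened into a single supervised prompt/completion pair like this. The tag format and function name are assumptions for the sketch, not the paper’s actual data schema.

```python
# Hypothetical sketch: converting a recorded multi-agent trajectory into one
# training example, so a single model can learn to emit the whole collaboration.

def trajectory_to_example(question: str,
                          steps: list[tuple[str, str]],
                          answer: str) -> dict:
    """Each step is (role, content); roles are serialized as inline tags."""
    body = "".join(f"<{role}>{content}</{role}>" for role, content in steps)
    return {"prompt": question,
            "completion": body + f"<answer>{answer}</answer>"}


ex = trajectory_to_example(
    "What year was X founded?",
    [("planning", "search for founding year"),
     ("search", "X founded in 1998")],
    "1998",
)
print(ex["completion"])
```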

2. Progressive Quality Filtering

The training data went through four layers of refinement:

| Filter Stage | Purpose | Example Criteria |
|---|---|---|
| Complexity | Remove trivial tasks | Require ≥5 agent interactions |
| Quality | Eliminate errors | Verify correct answers/functional code |
| Reflection | Prioritize self-correction | Keep tasks showing error detection |
| Error Correction | Reward recovery | Boost samples where the AI fixes its own mistakes |

This created a dataset of 16,433 high-quality training examples with reasoning chains 5-20 steps long.
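A toy version of this filter pipeline, using the criteria from the table above. The record fields (`num_agent_interactions`, `answer_verified`, and so on) are assumptions for illustration; the real pipeline operates on full trajectories.

```python
# Illustrative four-stage filter: complexity and quality act as hard filters,
# while reflection and error-correction boost a sample's priority.

def keep(sample: dict) -> bool:
    if sample["num_agent_interactions"] < 5:   # complexity: drop trivial tasks
        return False
    if not sample["answer_verified"]:          # quality: drop wrong answers
        return False
    return True


def priority(sample: dict) -> int:
    score = 0
    if sample.get("shows_reflection"):         # reflection bonus
        score += 1
    if sample.get("fixes_own_mistake"):        # error-correction bonus
        score += 1
    return score


data = [
    {"num_agent_interactions": 7, "answer_verified": True,
     "shows_reflection": True},
    {"num_agent_interactions": 3, "answer_verified": True},  # too trivial
]
filtered = sorted((d for d in data if keep(d)), key=priority, reverse=True)
print(len(filtered))  # 1
```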


Experimental Results: CoA in Action

1. Web Agent Performance

On knowledge-intensive tasks like GAIA and BrowseComp:

| Benchmark | CoA (32B model) | Best Existing Method |
|---|---|---|
| GAIA | 55.3% | 53.2% (WebSailor) |
| BrowseComp | 11.1% | 10.5% (WebDancer) |
| HLE | 18.0% | 15.8% (WebThinker-RL) |

2. Code Agent Breakthroughs

For programming tasks:

| Benchmark | CoA-32B | Previous Best |
|---|---|---|
| LiveCodeBench v5 | 47.9% | 42.4% (Reveal) |
| CodeContests | 32.7% | 10.3% (ReTool) |

3. Mathematical Reasoning

On the challenging AIME25 benchmark:

“AFM achieves 59.8% solve rate, a 10.5% improvement over previous best methods”


Why CoA Outperforms Traditional Methods

1. Computational Efficiency

Tests on GAIA benchmark showed:

| Metric | CoA | Traditional MAS | TIR Methods |
|---|---|---|---|
| Tool Calls | 4.2 avg | 6.8 avg | 5.4 avg |
| Token Consumption | 84.6% less | Baseline | 32% less |

Figure 5: Efficiency Comparison

2. Generalization to New Tools

CoA models showed notable cross-domain adaptability, though with some asymmetry:

  • Code agents could successfully use web search tools
  • Web agents, however, struggled with code formatting

This suggests CoA captures general collaboration patterns rather than task-specific routines.


Test-Time Scaling: The Power of Multiple Attempts

When allowed 3 attempts per question:

| Benchmark | CoA Improvement |
|---|---|
| GAIA | +14.6% |
| HLE | +15.2% |
| WebWalker | +15.7% |

This outperforms traditional methods, where multiple attempts yield smaller gains.
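Mechanically, this kind of test-time scaling amounts to best-of-n sampling: run the agent several times and keep the strongest attempt. The sketch below uses a self-assessed score for selection, which is an assumption; the paper may select answers differently.

```python
# Minimal best-of-n sketch (assumed mechanics). solve_once stands in for one
# full agent rollout; a real system would call the model instead.

import random


def solve_once(question: str, rng: random.Random) -> tuple[str, float]:
    """One rollout, returning (answer, self-assessed confidence)."""
    score = rng.random()  # placeholder for a real confidence estimate
    return f"answer@{score:.2f}", score


def best_of_n(question: str, n: int = 3, seed: int = 0) -> str:
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    attempts = [solve_once(question, rng) for _ in range(n)]
    return max(attempts, key=lambda a: a[1])[0]  # keep the best attempt


print(best_of_n("hard question", n=3))
```

More attempts only help if the rollouts are diverse and the selection signal is reliable, which is why verifiable domains (math, code) benefit most.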


Practical Implications for SEO and Content

While CoA focuses on AI capabilities, its principles align with modern content strategies:

1. Content Quality Matters

Just as CoA prioritizes high-quality training data, search engines reward comprehensive, accurate content. The 16,433 training examples with 5-20 step reasoning chains mirror how detailed content outperforms shallow articles.

2. Structured Problem Solving

CoA’s planning → execution → reflection cycle resembles effective content strategies:

  1. Research phase (identify user intent)
  2. Content creation (solve the query)
  3. Optimization phase (update based on performance)

3. Technical Writing Best Practices

The paper’s structure demonstrates technical writing principles:

  • Clear section hierarchy
  • Visual aids (figures)
  • Performance metrics
  • Use cases

Future Directions: What This Means for AI

The open-source release of CoA (model weights, code, and data) creates opportunities for:

  1. More efficient AI assistants
  2. Better reasoning in specialized domains
  3. Foundation for next-generation agent systems

As AI continues to evolve from “lone workers” to “collaborative teams,” frameworks like CoA will likely become standard for complex applications.


Conclusion

Chain-of-Agents represents a fundamental shift in how we build AI systems. By teaching models to work like coordinated teams rather than rigid pipelines, researchers have achieved state-of-the-art results across multiple domains while improving computational efficiency. As these systems mature, we can expect AI to handle increasingly complex real-world problems with the flexibility and adaptability of human teams.


All images sourced from the original research paper. For complete methodology details and benchmark results, refer to the full paper.