As artificial intelligence rapidly evolves, single-agent systems increasingly struggle to handle complex real-world tasks. Multi-agent systems have emerged as a solution, enabling sophisticated problem-solving through specialized collaboration. Today, we explore a distributed agent framework built on LangGraph that uses Redis as a message broker, allowing multiple AI agents to work together seamlessly and providing a robust foundation for scalable multi-agent AI systems.

What Are Distributed Agent Systems?

Imagine a company where experts from different departments work together through efficient communication to complete complex projects. Distributed agent systems adopt this very concept, organizing multiple specialized AI agents where each focuses on specific domains and collaborates through message passing to solve problems that would challenge individual agents.

The LangGraph distributed agent framework represents such a system, combining LangGraph’s graph-based agent orchestration with Redis’s efficient message passing to create a truly scalable multi-agent AI architecture.

Core Capabilities Explained

Human-AI Collaboration Security Control: The “Safety Valve” for Agents

As AI systems grow increasingly powerful, security considerations become paramount. This framework incorporates a built-in mechanism requiring human approval for sensitive tool execution, so that critical operations, access to sensitive data, and other potentially impactful actions undergo human review before they run.

This functions like a “safety valve” for the agent system—when sensitive operations are attempted, the system automatically pauses and waits for human confirmation. This real-time monitoring and intervention capability provides complete control over agent behavior, significantly reducing risks when deploying AI systems in real-world environments.

For example, when an agent attempts to access city GDP data, the system halts execution until explicit human approval is granted. This mechanism proves particularly valuable in sensitive fields like finance and healthcare.

True Distributed Architecture: Horizontally Scalable Agent Networks

Traditional multi-agent systems often run within single processes, suffering from single points of failure and scalability limitations. The LangGraph distributed agent framework adopts a horizontally scalable multi-agent system design where multiple agents operate independently across different processes or machines, communicating through Redis streams.

The advantages of this architecture are clear:

  • Each agent can be independently deployed, scaled, and managed
  • The overall system demonstrates stronger fault tolerance
  • Resource utilization is more efficient
  • Dynamic addition or removal of agents is supported

This design enables flexible adjustments based on load requirements, truly delivering the scalability and reliability needed for enterprise-level applications.

Hierarchical Agent Organization: Intelligent Workflow Coordination

Complex tasks often require layered processing, which is exactly where hierarchical agent organization excels. In this framework, agents can be organized into hierarchical structures where coordinating agents delegate tasks to specialized sub-agents.

[Figure: Hierarchical agent organization — a coordinating agent delegating to specialized sub-agents]

This architecture enables complex workflow orchestration with clear chains of responsibility and efficient task distribution. For instance, a main coordinating agent can receive user queries, analyze them, delegate weather-related questions to weather expert agents, assign economic analysis tasks to economics expert agents, and finally integrate results from all sub-agents before responding to the user.
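
To make this concrete, here is a minimal sketch of how such a coordinator might be assembled, using the AgentRunner constructor and add_subagent method covered later in this article (the agent names, prompt, and descriptions are illustrative):

import asyncio
import os
from langgraph_distributed_agent.agent_runner import AgentRunner

async def main():
    # The coordinator holds no domain tools itself; it only routes work.
    coordinator = AgentRunner(
        agent_name="main_agent",
        system_prompt="You coordinate specialist agents and integrate their answers.",
        redis_url=os.environ.get("REDIS_URL", ""),
        mysql_url=os.environ.get("CHECKPOINT_DB_URL", ""),
        openai_base_url=os.environ.get("OPENAI_BASE_URL", ""),
        openai_model=os.environ.get("OPENAI_MODEL", ""),
        openai_api_key=os.environ.get("OPENAI_API_KEY", "")
    )
    # Register specialists by name; the descriptions help the model
    # decide which sub-agent should receive a delegated task.
    coordinator.add_subagent("weather_agent", "Answers weather-related questions.")
    coordinator.add_subagent("economics_agent", "Analyzes economic data such as GDP.")
    await coordinator.start()

if __name__ == '__main__':
    asyncio.run(main())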

Additional Key Features

Beyond the three core capabilities, the framework offers several powerful features:

  • MCP Server Integration: Supports Model Context Protocol servers to extend agent capabilities
  • Persistent State Management: Uses MySQL/SQLite checkpoints to store conversation history, ensuring state preservation
  • Extensible Design: Enables horizontal scaling through Redis streams and consumer groups
  • Easy Integration: Simple client interfaces for interacting with agent systems

Deep Dive into System Architecture

To truly appreciate the framework’s value, we need to examine its internal architecture. The system comprises several key components, each with distinct responsibilities while working closely together:

Agent Workers

Agent workers form the core execution units of the system. Each worker represents an independent agent responsible for handling specific types of tasks. They communicate through Redis streams, remaining decoupled while collaborating effectively.

Agent Clients

Clients provide interfaces for interacting with the agent system. Users or applications send messages and receive responses through these clients. The client design emphasizes simplicity and ease of use, reducing integration complexity.

Agent Runners

Runners serve as high-level wrappers for creating and managing agents, simplifying initialization and configuration so developers can quickly deploy new agents.

Redis Streams

As message brokers for inter-agent communication, Redis streams provide high-throughput, low-latency message delivery, ensuring efficient collaboration between agents.
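
To illustrate the messaging pattern itself (not the framework's exact internals; the stream and group names below are hypothetical), here is roughly what producing and consuming with a Redis stream and consumer group looks like in redis-py:

import redis

r = redis.Redis.from_url("redis://localhost:6379/0")

# A sender appends a message to the agent's stream.
r.xadd("agent:demo_agent:inbox", {"payload": "hi"})

# Workers join a consumer group so multiple instances of the same agent
# can share one stream without processing a message twice.
try:
    r.xgroup_create("agent:demo_agent:inbox", "demo_agent_workers",
                    id="0", mkstream=True)
except redis.exceptions.ResponseError:
    pass  # Group already exists.

# Each worker reads and acknowledges its own share of the messages.
entries = r.xreadgroup("demo_agent_workers", "worker-1",
                       {"agent:demo_agent:inbox": ">"}, count=1, block=1000)
for stream, messages in entries:
    for message_id, fields in messages:
        print(fields)
        r.xack("agent:demo_agent:inbox", "demo_agent_workers", message_id)

Consumer groups are also what makes the horizontal scaling discussed earlier possible: several instances of the same agent can consume one stream, with Redis distributing messages among them.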

Checkpoint Storage

The persistent state management system based on MySQL or SQLite ensures agent states and conversation history survive system restarts, crucial for long-running agent applications.
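
How the framework wires this up from CHECKPOINT_DB_URL is internal to it, but as a standalone sketch of the underlying mechanism, a LangGraph SQLite checkpointer (from the langgraph-checkpoint-sqlite package) looks roughly like this:

import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

# One SQLite file holds every thread's checkpoints.
conn = sqlite3.connect("agent_checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

# A compiled LangGraph graph becomes durable once given a checkpointer:
#   graph = builder.compile(checkpointer=checkpointer)
# State is then keyed by thread_id, so conversations survive restarts.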

Practical Guide: Building Your First Distributed Agent System from Scratch

Enough theory—let’s get practical by building a real distributed agent system step by step.

Environment Preparation and Installation

First, ensure your system meets these requirements:

  • Python 3.10 or higher
  • Redis server (for message brokering)
  • MySQL or SQLite (for state persistence)

Install the LangGraph distributed agent package:

pip install langgraph_distributed_agent

Environment Variable Configuration

Create a .env file with necessary environment variables:

REDIS_URL=redis://:password@localhost:6379/0
CHECKPOINT_DB_URL=agent_checkpoints.db

OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-4
OPENAI_API_KEY=sk-your-api-key

These configurations cover three critical aspects: message brokering, state storage, and AI models.

Creating Your First Agent

Let’s create an agent capable of querying weather and economic data:

import asyncio
import os
from typing import Annotated

import dotenv
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool, InjectedToolCallId
from langgraph.runtime import get_runtime
from langgraph_distributed_agent.agent_runner import AgentRunner
from langgraph_distributed_agent.utils import human_approval_required

dotenv.load_dotenv()

@tool
def get_city_weather(city: str) -> str:
    """
    Get the weather for a specific city.

    Parameters:
        city (str): Name of the city, e.g., "London".

    Returns:
        str: Weather description for the given city.
    """
    print("current context", get_runtime().context)
    return f"It's always sunny in {city}!"

@tool
@human_approval_required
def get_city_gdp(city: str,
                 config: RunnableConfig,
                 injected_tool_call_id: Annotated[str, InjectedToolCallId]) -> str:
    """Get city gdp"""
    print(get_runtime())
    return f"The gdp of {city} is 500 billion yuan!"


async def main():
    runner = AgentRunner(
        agent_name="demo_agent",
        system_prompt="You are a helpful assistant.",
        redis_url=os.environ.get("REDIS_URL", ""),
        mysql_url=os.environ.get("CHECKPOINT_DB_URL", ""),
        openai_base_url=os.environ.get("OPENAI_BASE_URL", ""),
        openai_model=os.environ.get("OPENAI_MODEL", ""),
        openai_api_key=os.environ.get("OPENAI_API_KEY", "")
    )
    runner.add_tool(get_city_weather)
    runner.add_tool(get_city_gdp)
    await runner.start()

if __name__ == '__main__':
    asyncio.run(main())

This code demonstrates several important concepts:

  1. Tool Definition: Using the @tool decorator to define functions agents can call
  2. Human Approval: Using the @human_approval_required decorator to mark tools requiring human review
  3. Agent Runner: Using the AgentRunner class to create and manage agents

Interacting with Agents

After creating agents, we need a way to interact with them. Here’s a simple command-line client:

import asyncio
from langgraph_distributed_agent.agent_cli import AgentCLI
import os
import dotenv

dotenv.load_dotenv()

async def main():
    cli = AgentCLI(target_agent="demo_agent",
                   redis_url=os.environ.get("REDIS_URL", ""))
    await cli.run()

if __name__ == '__main__':
    asyncio.run(main())

Alternatively, you can use a more advanced web UI: agents-ui

Complete Example: Building a Multi-Expert Agent System

To demonstrate the framework’s true capabilities, let’s explore a more complex example—building a system with multiple expert agents.

System Architecture

This example system includes the following components:

  • Main Agent: Coordinating agent responsible for task distribution and result integration
  • Weather Agent: Specialist agent for weather-related queries
  • Economics Agent: Specialist agent for economic data analysis
  • MCP Server: Provides additional tools and capability extensions

Running the Complete Example

  1. Start the MCP Server:
python -m examples.agent_demo.demo_mcp_server
  2. Launch Individual Agents:
python -m examples.agent_demo.main_agent
python -m examples.agent_demo.weather_agent
python -m examples.agent_demo.economics_agent
  3. Run the Client for Interaction:
python -m examples.agent_demo.cli

Actual Interaction Flow

When a user sends a query to the main agent, the following internal process occurs:

  1. Main agent receives the user message
  2. Analyzes query type to determine if delegation to expert agents is needed
  3. If delegation is required, sends tasks to appropriate expert agents
  4. Expert agents process tasks, potentially calling tools or requiring human approval
  5. Results return to the main agent
  6. Main agent integrates results and responds to the user

This process is fully automated while maintaining necessary human control over sensitive operations.

In-Depth API Reference

To fully leverage the framework’s capabilities, understanding its core APIs is essential.

AgentRunner Class

AgentRunner is the primary class for creating and managing agents, offering these key methods:

class AgentRunner:
    def __init__(self, agent_name: str, system_prompt: str, ...)
    def add_tool(self, tool)  # Add tools (called synchronously in the example above)
    async def add_mcp_server(self, server_url: str)  # Integrate MCP servers
    def add_subagent(self, agent_name: str, description: str)  # Add sub-agents
    async def start(self)  # Start agents
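
As a hedged illustration of the MCP integration (the server URL below is hypothetical), a server would be registered before the agent starts:

# Inside the async main() from the earlier agent example:
await runner.add_mcp_server("http://localhost:8000/mcp")  # hypothetical URL
await runner.start()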

AgentClient Class

AgentClient provides a programmatic interface for interacting with agents:

import asyncio
import uuid
import os
from langgraph_distributed_agent.agent_client import AgentClient

import dotenv
dotenv.load_dotenv()

async def agent_client_test():
    client = AgentClient(
        target_agent="main_agent",
        redis_url=os.environ.get("REDIS_URL", "")
    )

    context_id = str(uuid.uuid4())

    await client.send_message("hi", context_id)

    async for event in client.progress_events(context_id):
        AgentClient.print_progress_event(event)

    last_event = await client.get_last_event(context_id)

    print("last_event.data.type=",last_event.data.type)

    if last_event.data.type == 'interrupt':
        await client.accept_tool_invocation(context_id)
    #     await client.reject_tool_invocation(context_id)

    # get chat history
    print("\n\n======= Get Chat History =======\n\n")
    his = await client.get_chat_history(context_id)

    for item in his:
        AgentClient.print_progress_event(item['data'])

This code demonstrates how to:

  • Send messages to agents
  • Monitor processing progress events
  • Handle tool invocations requiring human approval
  • Retrieve conversation history

Development and Contribution

Setting Up the Development Environment

If you want to explore deeper or contribute code, set up a development environment:

  1. Clone the Repository:
git clone https://github.com/SelfRefLab/langgraph_distributed_agent.git
cd langgraph_distributed_agent
  2. Install Development Dependencies:
pip install -e .
  3. Set Up Redis:
# Using Docker (recommended)
docker run -d -p 6379:6379 redis:latest
  4. Configure Environment:
cp .env.example .env
# Edit .env file for personalized configuration

Project Structure

Understanding the project structure helps better comprehend the codebase:

langgraph_distributed_agent/
├── langgraph_distributed_agent/    # Main package directory
│   ├── agent_client.py            # Client interfaces
│   ├── agent_runner.py            # High-level agent runners
│   ├── distributed_agent_worker.py # Core worker implementation
│   ├── redis_lock.py              # Redis-based distributed locks
│   └── utils.py                   # Utility functions
├── examples/                      # Example code
│   └── agent_demo/               # Complete demonstration system

Real-World Application Scenarios

The LangGraph distributed agent framework suits various complex scenarios:

Customer Service Systems

In customer service scenarios, multiple agents can divide responsibilities:

  • Reception agents: Initially understand customer issues
  • Technical support agents: Handle technical problems
  • Billing inquiry agents: Manage account and billing questions
  • Complaint handling agents: Specialize in complaints, potentially requiring more human review

Data Analysis Platforms

In data analysis scenarios:

  • Data query agents: Handle data extraction requests
  • Analysis agents: Perform complex data analysis
  • Visualization agents: Generate charts and reports
  • Sensitive data access requires human approval to ensure data security

Content Generation Systems

In content creation scenarios:

  • Research agents: Collect and organize information
  • Writing agents: Generate initial drafts
  • Editing agents: Optimize content quality
  • Publishing agents: Handle publishing operations, potentially requiring human approval

Frequently Asked Questions

How does this framework differ from regular LangGraph applications?

The main difference lies in the distributed architecture. A regular LangGraph application typically runs within a single process, whereas this framework supports multiple agents running across different processes or even different machines, communicating through Redis. This yields better scalability and fault tolerance.

How does human approval work?

When an agent invokes a tool marked with @human_approval_required, the system pauses execution and sends an interrupt event to the client. The client can present an approval interface to the user. After the user chooses to approve or reject, the client accordingly calls accept_tool_invocation or reject_tool_invocation, and the agent continues execution or adjusts behavior based on this input.
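
Using only the AgentClient calls shown earlier, a minimal command-line approval loop might look like this:

# Continuing from the AgentClient example above:
last_event = await client.get_last_event(context_id)
if last_event.data.type == 'interrupt':
    answer = input("Approve this tool call? [y/N] ")
    if answer.strip().lower() == "y":
        await client.accept_tool_invocation(context_id)
    else:
        await client.reject_tool_invocation(context_id)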

How is high availability ensured?

High availability is ensured through these mechanisms:

  • Agents can run and restart independently
  • Redis provides reliable message persistence
  • Checkpoint mechanisms ensure state preservation
  • Support for running multiple agents of the same type simultaneously enables load balancing

What about system performance?

Performance depends on several factors:

  • Redis server performance
  • AI model response times
  • Network latency
  • Number of agents and workload

In practical testing, the distributed architecture allows the system to horizontally scale by adding agent instances to handle numerous concurrent requests.

Are other message brokers supported?

The current version only supports Redis as the message broker because Redis offers powerful stream functionality and reliability guarantees. Future versions may support additional message brokers.

Conclusion

The LangGraph distributed agent framework represents a significant direction in multi-agent system development. It addresses the scalability limitations of traditional agent systems through distributed architecture, ensures security through human-AI collaboration mechanisms, and optimizes complex task handling through hierarchical organization.

Whether you’re building enterprise-level AI applications or researching multi-agent systems, this framework provides a powerful infrastructure foundation. Its design philosophy—distributed, secure, scalable—lays solid groundwork for next-generation AI applications.

Through this article, we hope you’ve not only understood the framework’s basic concepts and usage methods but also grasped its design philosophy to leverage its powerful capabilities in practical projects. The era of AI powered by multi-agent collaboration has arrived, and the LangGraph distributed agent framework represents a worthwhile starting point for exploration.