From Solo Agent to Agent Army: The Complete Hermes Multi-Agent Deployment Guide

Core question: When a single AI Agent is already powerful enough to handle research, writing, analysis, and coding, why build a multi-agent system at all? The answer lies in the compounding effect of specialization—multiple agents, each mastering one domain, working in parallel, will consistently outperform one generalist agent trying to do everything.

Modern AI agents can independently complete complex tasks like research, writing, analysis, and programming, saving significant time and effort. Yet every single agent has boundaries: it can excel in one area, but cannot maintain peak performance across all domains simultaneously. This is precisely where multi-agent architecture creates decisive advantage. By dividing labor among specialized agents, each focused on its own strength, the combined output exceeds what any solo “jack-of-all-trades” can deliver.

This guide covers the complete Hermes multi-agent system—from underlying mechanics to production deployment. We will walk through installation, configuration, role customization, message platform setup, operational commands, and visual monitoring. Whether you are building a small automation team or deploying a scalable agent cluster in production, this article provides actionable steps and validated configurations.


How Hermes Multi-Agent Works: Two Modes and Three Layers of Persistence

Core question: What fundamentally distinguishes Hermes multi-agent execution from other parallel processing frameworks?

Many tools implement ThreadPoolExecutor parallelism, context isolation, and tool restrictions. These are execution-layer mechanisms. Hermes is unique because its multi-agent system runs inside a learning, persistent environment. This persistence operates at three distinct layers.

Three Layers of Persistence

First layer: Skill auto-evolution. After a sub-agent completes a complex task, the system autonomously generates reusable Skill documents. The next time a similar task arises, the agent loads the existing skill directly—no need to re-reason from scratch. Response speed and quality improve continuously.

Second layer: Service-grade persistence. Hermes runs as a long-lived service process, not a transient Python library. It simultaneously mounts Telegram, Discord, Slack, and other platforms. You can send one message from your phone and trigger three parallel agents working on a VPS instantly.

Third layer: Execution granularity control. Beyond delegate_task, Hermes offers execute_code as a third execution tier. This lets you precisely control which operations consume LLM tokens and which run as zero-cost code execution—optimizing for both cost and efficiency.

Two Multi-Agent Modes Compared

Hermes provides two fundamentally different collaboration patterns for different scenarios:

| Mode | Architecture | Lifecycle | Memory & Config | Best For |
|---|---|---|---|---|
| delegate_task | Master-subordinate hierarchy | Temporary spawn, destroyed after use | No independent memory; inherits master context | One-time complex task decomposition |
| Profiles | Parallel independence | Long-running independent processes | Full independent config, persistent memory, distinct identity | Specialized team providing continuous service |

Critical constraints on delegate_task:

  • Sub-agents cannot spawn further sub-agents. Maximum depth is two levels (master → sub-agent).
  • Sub-agents are permanently barred from five capabilities: spawning further sub-agents (this is what prevents infinite recursion), asking the user questions, writing to shared memory, sending private messages, and calling execute_code.
  • If the master agent interrupts a task, all active sub-agents stop synchronously.

Author’s reflection: I initially saw delegate_task’s temporary nature as a limitation. In production, I realized this “use-and-discard” design prevents context pollution. For tasks requiring strict isolation—like batch code reviews—delegate_task is cleaner and safer than long-running Profiles. What looks like a restriction is actually a guardrail.


Deploying Agents with Profiles: Cloning and Setup

Core question: How do you quickly create and deploy multiple independent agents in Hermes?

Hermes makes creating sub-agents remarkably simple. One terminal command auto-generates directories, initializes configurations, and registers command aliases.

Creating Base Profiles

# Create a generic sub-agent (replace <name> with your custom name)
hermes profile create <name>

# Create a Xiaohongshu (Little Red Book) writing agent
hermes profile create xhswriter

# Create a research agent
hermes profile create researcher

After execution, Hermes creates a complete independent directory under ~/.hermes/profiles/<name>/ and registers a command alias in ~/.local/bin/. This means you can directly type xhswriter or researcher as command prefixes to operate the corresponding agent.

Verify creation immediately:

hermes profile list

This lists all Profiles and their current running status, confirming successful registration.
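
If you want a check that does not depend on the listing format, the two artifacts described above (the profile directory and the command alias) can be tested directly. The following is a minimal sketch, assuming the paths are exactly as this guide states; `profile_installed` is a hypothetical helper name, not a Hermes command.

```shell
# Hypothetical helper, not a Hermes command. It checks the two artifacts this
# guide says `hermes profile create <name>` leaves behind.
# Usage: profile_installed <name> [home-dir]
profile_installed() {
  pname=$1
  base=${2:-$HOME}                      # second argument is only for testing
  if [ ! -d "$base/.hermes/profiles/$pname" ]; then
    echo "$pname: profile directory missing"
    return 1
  fi
  if [ ! -e "$base/.local/bin/$pname" ]; then
    echo "$pname: command alias missing"
    return 1
  fi
  echo "$pname: ok"
}
```

A typical call would be `for p in xhswriter researcher; do profile_installed "$p"; done`. If the alias is reported missing, it is also worth confirming that ~/.local/bin is on your PATH.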

Cloning Existing Configurations

If you already have a well-configured agent, clone it to duplicate API keys and model settings without repeating setup:

# Clone from default configuration; memory and sessions remain independent
hermes profile create coder --clone

# Clone from a specific Profile
hermes profile create ops --clone-from coder

Application scenario: Suppose you have a master agent configured with Claude model and a complete Telegram gateway. Now you need a coder agent for programming tasks. Using --clone-from master, coder inherits the same model provider and authentication, but gets its own memory space and working directory. The two agents operate without interference.

Author’s reflection: I used to manually repeat API key configuration for every agent, which was time-consuming and error-prone. Cloning preserves authentication while isolating memory. For homogeneous teams (like multiple code-review agents), this is remarkably efficient.


Dedicated Models Per Agent: Full Configuration Isolation

Core question: How do you assign the best-suited language model to each agent without conflicts?

The core power of Profiles is model isolation. Each agent has entirely independent model configuration. Changing one agent’s model does not affect others. All configuration changes take effect in the next new session. After setting up, call /new to start a fresh session.

Three Configuration Methods

Method 1: Interactive wizard (recommended)

coder model

The system guides you through selecting model provider, specific model version, and authentication method—ideal for first-time setup or model changes without risk of typos.

Method 2: Specify OAuth type

hermes -p coder auth add anthropic --type oauth

Use this when you know the exact OAuth authentication flow required.

Method 3: Direct parameter configuration

# Set specific model
coder config set model.model "anthropic/claude-sonnet-4"

# Set model provider
coder config set model.provider "openrouter"

Application scenario: In my deployment, coder uses Claude Sonnet for code understanding and generation, researcher uses a lightweight model for quick search and summarization, and xhswriter calls a model strong in Chinese creative writing. This differentiated configuration lets each agent leverage its model’s strengths in its own domain, rather than forcing all tasks to adapt to one general-purpose model.
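
The per-agent assignment described above can be kept in one place so it is easy to review before applying. This is a rough sketch only: it prints the `config set` commands rather than running them, the helper names are hypothetical, and only the coder model id comes from this guide. The other two ids are placeholders you would need to fill in yourself.

```shell
# Hypothetical helpers: map each profile to a model id, then print (not run)
# the matching config command from Method 3.
model_for() {
  case $1 in
    coder)      echo "anthropic/claude-sonnet-4" ;;   # from this guide
    researcher) echo "<lightweight-model-id>" ;;      # placeholder, fill in
    xhswriter)  echo "<chinese-writing-model-id>" ;;  # placeholder, fill in
    *)          return 1 ;;
  esac
}

print_model_commands() {
  for p in "$@"; do
    if m=$(model_for "$p"); then
      printf '%s config set model.model "%s"\n' "$p" "$m"
    else
      printf '# no model mapping for %s\n' "$p"
    fi
  done
}

print_model_commands coder researcher xhswriter
```

Printing first is a deliberate choice: you can read the output, replace the placeholders, and only then paste the commands into a terminal.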

Author’s reflection: I once ran all agents on the same powerful model. Code tasks consumed tokens that should have gone to creative writing. After separating models, response quality improved and monthly API costs dropped roughly 40%. This confirms the basic principle: the right tool for the right job.


Customizing Agent Personality with SOUL.md

Core question: How do you keep each agent focused on its own domain without overstepping into others’ responsibilities?

SOUL.md is the core identity file for an agent, holding the highest priority in system prompts. Clearly defining duty boundaries and behavioral norms ensures the agent consistently “does what it should do.”

Using a Community Role Library

The community-maintained agency-agents-zh repository provides validated professional role definitions. Best practice: clone the repository to the Hermes root directory, then use symbolic links to connect role files to each Profile’s SOUL.md:

# Clone role library to Hermes root
git clone https://github.com/jnMetaCode/agency-agents-zh.git ~/.hermes/agency-agents-zh

Binding Roles via Symbolic Links

# Define variables for maintainability
REPO=~/.hermes/agency-agents-zh
PROFILES=~/.hermes/profiles

# coder → Backend Architect
ln -sf $REPO/engineering/engineering-backend-architect.md \
        $PROFILES/coder/SOUL.md

# master → Agent Orchestrator (master controller / global coordinator)
ln -sf $REPO/specialized/agents-orchestrator.md \
        $PROFILES/master/SOUL.md

# researcher → Trend Researcher
ln -sf $REPO/product/product-trend-researcher.md \
        $PROFILES/researcher/SOUL.md

Advantage of symbolic links: When agency-agents-zh updates, simply run git pull and all Profiles using that role auto-sync to the latest definition—no manual edits needed.

cd ~/.hermes/agency-agents-zh && git pull
# All symbolically linked Profiles update automatically
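
One caveat with symbolic links in general: if an upstream update renames or removes a role file, the link silently dangles. The sketch below reports the link state of every profile's SOUL.md so you can run it after each git pull. It assumes the ~/.hermes/profiles/<name>/SOUL.md layout used above; `check_soul_links` is a hypothetical helper name.

```shell
# Hypothetical helper: report each profile's SOUL.md link state.
# Returns non-zero if any link points at a file that no longer exists.
# Usage: check_soul_links ~/.hermes/profiles
check_soul_links() {
  profiles_dir=$1
  status=0
  for dir in "$profiles_dir"/*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    name=$(basename "$dir")
    soul="${dir}SOUL.md"
    if [ -L "$soul" ] && [ ! -e "$soul" ]; then
      echo "$name: dangling link"        # link exists, target does not
      status=1
    elif [ -L "$soul" ]; then
      echo "$name: linked"
    elif [ -e "$soul" ]; then
      echo "$name: regular file (not linked)"
    else
      echo "$name: no SOUL.md"
    fi
  done
  return $status
}
```

A dangling link can then be repaired by re-running the matching `ln -sf` command with the role file's new path.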

Verifying Personality Configuration

After configuration, verify the agent loaded its role correctly:

xhswriter chat -q "Please describe your responsibilities"

If the response matches the role definition in SOUL.md, configuration is successful.

Author’s reflection: I once copy-pasted role descriptions into every SOUL.md manually. When the repository updated, I spent an entire afternoon syncing changes. Switching to symbolic links reduced maintenance effort to near zero. This seemingly minor habit creates massive time compounding when managing more than five agents.


Independent Message Platform Entry Points: Telegram Setup Example

Core question: How do you give each agent its own communication channel for true parallel responsiveness?

For a multi-agent system to deliver value, entry isolation is essential. Below is a Telegram example showing how to configure independent bot entry points for each agent.

Step 1: Create Independent Bots

Each agent needs an independent Telegram Bot. Search for @BotFather in Telegram, send /newbot, and follow prompts to enter a bot name (e.g., xiaohongshu_Researchwang_bot). BotFather returns a token in this format:

7234567890:AAHxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Create separate bots for four agents, obtaining four independent tokens.
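
With four tokens to copy around, paste errors (a missing colon, a stray space) are easy to make. Here is a loose sanity check, assuming only the shape visible in the sample above: digits, a colon, then letters, digits, underscores, or hyphens. It is not Telegram's official validation rule, and `looks_like_bot_token` is a hypothetical name.

```shell
# Hypothetical helper: loose shape check for a bot token.
# Based only on the sample format shown above, not on an official rule.
looks_like_bot_token() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+:[A-Za-z0-9_-]+$'
}

# Usage:
#   looks_like_bot_token "$TOKEN" || echo "token looks malformed" >&2
```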

Step 2: Batch-Write Environment Variables

Write each token to the corresponding Profile’s .env file:

# Replace each <...-bot-token> placeholder with the token BotFather issued for that bot.
# Never paste real tokens into shared documents or version control.

# coder
echo "TELEGRAM_BOT_TOKEN=<coder-bot-token>" >> ~/.hermes/profiles/coder/.env

# master
echo "TELEGRAM_BOT_TOKEN=<master-bot-token>" >> ~/.hermes/profiles/master/.env

# researcher
echo "TELEGRAM_BOT_TOKEN=<researcher-bot-token>" >> ~/.hermes/profiles/researcher/.env

# xhswriter
echo "TELEGRAM_BOT_TOKEN=<xhswriter-bot-token>" >> ~/.hermes/profiles/xhswriter/.env
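
Because `>>` appends, running these commands a second time (for example after rotating a token) leaves two TELEGRAM_BOT_TOKEN lines in the file, and which one wins depends on how the file is loaded. A minimal sketch of an idempotent alternative follows; `set_env_var` is a hypothetical helper, and the only assumption about Hermes is the .env location and variable name given above.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any earlier
# line for the same key instead of appending a duplicate.
# Usage: set_env_var <file> <key> <value>
set_env_var() {
  envfile=$1
  key=$2
  value=$3
  touch "$envfile"
  chmod 600 "$envfile"                   # the file holds a secret
  scratch=$(mktemp)
  # Copy every line except the old entry; grep exits 1 if nothing is left.
  grep -v "^${key}=" "$envfile" > "$scratch" || true
  printf '%s=%s\n' "$key" "$value" >> "$scratch"
  cat "$scratch" > "$envfile"            # overwrite in place, keeping permissions
  rm -f "$scratch"
}

# Usage:
#   set_env_var ~/.hermes/profiles/coder/.env TELEGRAM_BOT_TOKEN "<coder-bot-token>"
```

The `chmod 600` step is a general precaution for files that contain credentials, not something the guide's original commands require.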

Step 3: Configure Gateway Access Permissions

Each Profile’s Gateway needs individual configuration to specify which users can message that agent. First obtain your Telegram User ID (send any message to @userinfobot to receive it).

Then launch the configuration wizard for each agent:

coder gateway setup
master gateway setup
researcher gateway setup
xhswriter gateway setup

The wizard asks for platform selection. Since tokens are already in .env, choose Done and confirm remaining options.

Step 4: Configure Pair Code and Verify

After Gateway setup, configure the Pair Code for the sub-agent. Once done, you can start chatting.

Check the sub-agent’s running status:

systemctl --user status hermes-gateway-xhswriter

If it shows active (running), the Gateway is successfully listening for Telegram messages.
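
To check all four gateways in one pass, the status command above can be wrapped in a loop. This sketch assumes the unit naming hermes-gateway-<name> and user-level units, as in that command. `gateway_states` and the `SYSTEMCTL` override variable are hypothetical names introduced for illustration; `systemctl is-active` is a standard systemd subcommand that prints the unit state.

```shell
# Hypothetical helper: print the systemd state of each profile's gateway unit.
# Set SYSTEMCTL="sudo systemctl" if your units are installed system-wide.
# Usage: gateway_states coder master researcher xhswriter
gateway_states() {
  for gw in "$@"; do
    # is-active exits non-zero for anything but "active", so ignore the status.
    state=$(${SYSTEMCTL:-systemctl --user} is-active "hermes-gateway-$gw" 2>/dev/null) || true
    printf '%s: %s\n' "$gw" "${state:-unknown}"
  done
}
```

Any agent that does not report `active` is the one to investigate with the full `status` command.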

Application scenario: In my production deployment, four agents correspond to four Telegram bots. When I spot a trending topic in an operations group, I directly message @researcher_bot for deep analysis; forward the analysis to @xhswriter_bot for Xiaohongshu copy; and simultaneously ask @coder_bot to prepare a data scraping script. Three agents work in parallel without blocking each other, compressing the entire content production cycle from two hours to twenty minutes.

Author’s reflection: I initially tried running multiple agents through one bot, routing by keywords. Context cross-contamination caused frequent misinterpretation. Independent bots add upfront configuration work, but the isolation and stability gained are worth it. This reinforces a recurring theme: architectural clarity matters more than saving a few lines of config.


Essential Command Reference for Multi-Agent Operations

Core question: How do you efficiently manage a running agent cluster—monitoring, hot-reloading configs, and troubleshooting?

Status Overview and Matrix Inspection

# List all Profiles and running status
hermes profile list

# View detailed info for a specific agent (model, skills, bound platforms)
hermes profile show coder

# View current agent's complete configuration
coder config

# View current agent, authentication, and platform connection status
hermes status

# Output complete shareable configuration summary (paste when troubleshooting)
hermes dump

Identity Switching and Targeted Operations

# Switch default agent; subsequent hermes commands target coder
hermes profile use coder

# Switch back to default master controller
hermes profile use default

# Without switching default, send one message to a specific agent
hermes -p coder chat -q "Hello"

Hot-Reloading Configuration (No Restart Required)

# Change coder's model
coder config set model.model "anthropic/claude-opus-4"

# Change working directory
ops config set terminal.cwd /home/ubuntu/projects

# Expand memory capacity
researcher config set memory.memory_char_limit 5000

# Batch-modify the same config item across all agents
for profile in coder ops researcher writer; do
  $profile config set model.provider openrouter
done

Skill Library Updates

# Update specific agent's skill library
coder update

# Sync all agents' skills in one command
hermes update

# Update installed skills to latest versions
hermes skills update

# Check which skills have upstream updates
hermes skills check

Note: After you modify SOUL.md, Hermes loads the new content at the start of the next session. Run /new in the current conversation to apply the change immediately, without restarting the process.

Deployment and Migration

# Package backup
hermes profile export coder

# Backup with date stamp
hermes profile export coder -o /backup/coder-$(date +%Y%m%d).tar.gz

# Restore from backup
hermes profile import coder.tar.gz

# Restore under a new name (avoid conflicts)
hermes profile import coder.tar.gz --name coder-prod

# Rename
hermes profile rename coder dev-assistant

# Delete (requires confirmation)
hermes profile delete writer

# Delete without confirmation
hermes profile delete writer --yes

System-level full backup:

# Full backup
hermes backup

# Quick backup (key state files only)
hermes backup --quick --label "pre-upgrade"

# Restore from system backup
hermes import hermes-backup-*.zip
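
The dated per-profile export shown earlier in this section can be repeated for several profiles at once. The sketch below only prints the commands, so nothing runs until you have reviewed them. It uses the `hermes profile export <name> -o <path>` form exactly as it appears in this guide; `print_export_commands` is a hypothetical helper name and the destination directory is your choice.

```shell
# Hypothetical helper: print one dated export command per profile.
# Usage: print_export_commands <dest-dir> <profile>...
print_export_commands() {
  dest=$1
  shift
  stamp=$(date +%Y%m%d)
  for p in "$@"; do
    printf 'hermes profile export %s -o %s/%s-%s.tar.gz\n' "$p" "$dest" "$p" "$stamp"
  done
}

print_export_commands /backup coder ops researcher
```

Once the output looks right, it can be piped to `sh` or pasted into a terminal.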

Log Monitoring

# Real-time tail of main agent logs
hermes logs -f

# Real-time error log stream only
hermes logs errors -f

# Real-time gateway logs (Telegram/Discord send/receive)
hermes logs gateway -f

# View last 50 log entries
hermes logs -n 50

# View last 20 error records
hermes logs errors -n 20

# WARNING and above from past hour
hermes logs --level WARNING --since 1h

# Errors from past 30 minutes
hermes logs --since 30m --level ERROR

# List all log files and sizes
hermes logs list

# View specific Profile logs
coder logs -f
ops logs errors -n 20

Concurrency Control: One-Command Start/Stop

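# These examples assume the gateways run as systemd user units, as in the
# `systemctl --user status` check from the Telegram setup. If yours were
# installed system-wide, use `sudo systemctl` in place of `systemctl --user`.

# View all gateway service statuses (quote the glob so the shell passes it through to systemctl)
systemctl --user status 'hermes-gateway-*'

# Start all installed gateway services
systemctl --user start hermes-gateway-coder hermes-gateway-ops hermes-gateway-researcher hermes-gateway-writer

# Stop all (release resources)
systemctl --user stop hermes-gateway-coder hermes-gateway-ops hermes-gateway-researcher hermes-gateway-writer

# Restart a specific agent's gateway
systemctl --user restart hermes-gateway-ops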

# Targeted operations via -p flag (no gateway dependency)
hermes -p coder chat -q "Check src/auth/ for SQL injection vulnerabilities"
hermes -p researcher chat -q "Search for latest AI developments this week"

Author’s reflection: Log commands are my first line of defense. When xhswriter suddenly stopped responding, hermes logs errors -f revealed Telegram API rate limiting—not an agent fault. This rapid diagnosis capability is critical when managing multi-agent clusters. I recommend aliasing frequently used log commands in your shell config.
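
As one way to follow that recommendation, the log commands from this section can be wrapped in short shell functions. Functions are used instead of aliases because some of them take arguments. The function names are purely illustrative; each body is a log command listed above.

```shell
# Hypothetical shortcuts for ~/.bashrc or ~/.zshrc. Names are illustrative;
# every wrapped command comes from the Log Monitoring list in this guide.
herr()  { hermes logs errors -f; }                           # live error stream
hgw()   { hermes logs gateway -f; }                          # live gateway traffic
hwarn() { hermes logs --level WARNING --since "${1:-1h}"; }  # warnings, default: past hour
plog()  { "$1" logs errors -n "${2:-20}"; }                  # per-profile errors, e.g. plog coder 50
```

After reloading your shell config, `hwarn 30m` shows warnings from the past thirty minutes and `plog xhswriter` shows that profile's last twenty error records.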


Visual Monitoring: Deploying the hermes-web-ui Dashboard

Core question: When multiple agents run in parallel, how do you escape the command line and manage everything through a graphical interface?

The command line suits fine-grained operations but makes it hard to watch several agents at once. The hermes-web-ui dashboard, created by contributor 路飞 (Luffy), is designed for multi-agent setups and provides:

  • Session grouping by source platform: Separate views for Telegram, Discord, and Slack conversations
  • Built-in web terminal: Execute agent commands directly in the browser
  • Skills and Memory management UI: Visually inspect and edit agent skill libraries and memory
  • Token usage statistics: Real-time monitoring of API consumption per agent

Deployment Method

The most convenient deployment path is letting an agent handle it:

# Simply tell an agent
Help me deploy https://github.com/EKKOLearnAI/hermes-web-ui

After deployment, the system generates an access token. Enter the token to log in to the web interface, where you can view real-time status, conversation history, and resource consumption for all sub-agents.

Application scenario: In my daily operations, hermes-web-ui stays open as a pinned browser tab. First thing each morning: check the token usage dashboard to confirm overnight batch jobs consumed normally. During the day: use the web terminal to quickly dispatch commands to specific agents. Evening: inspect Memory growth across agents and decide whether cleanup is needed. This visualization layer dramatically reduces the cognitive load of managing a multi-agent cluster.

Author’s reflection: I insisted on pure command-line management for six agents, believing “real engineers don’t need GUIs.” As the agent count grew and sessions began to overlap, context-switching costs rose steeply. Adding hermes-web-ui isn’t “getting softer”; it preserves mental bandwidth for the technical decisions that actually matter.


Practical Action Checklist

Quick-Start Verification Table

| Step | Action | Command or Method |
|---|---|---|
| 1 | Create Profile | hermes profile create <name> |
| 2 | Clone config (optional) | hermes profile create <name> --clone |
| 3 | Configure dedicated model | coder model or coder config set model.model "xxx" |
| 4 | Bind SOUL.md role | ln -sf <role.md> ~/.hermes/profiles/<name>/SOUL.md |
| 5 | Create Telegram Bot and get Token | Execute /newbot with @BotFather |
| 6 | Write Token to .env | echo "TELEGRAM_BOT_TOKEN=xxx" >> ~/.hermes/profiles/<name>/.env |
| 7 | Configure Gateway | <name> gateway setup |
| 8 | Configure Pair Code | Follow wizard prompts |
| 9 | Start Gateway service | systemctl --user start hermes-gateway-<name> (use sudo systemctl if installed system-wide) |
| 10 | Verify running status | systemctl --user status hermes-gateway-<name> |
| 11 | Deploy visual dashboard (recommended) | Ask agent to deploy hermes-web-ui |

One-Page Overview

Core principle: Multi-agent is not about quantity stacking—it is about specialization. Each agent independently configures its model, role, and communication entry. Through persistent memory and skill evolution, output quality improves continuously.

Two modes:

  • delegate_task: Temporary spawn for one-time complex task decomposition. Maximum depth: two levels.
  • Profiles: Long-running for specialized continuous service. Full independent configuration.

Key configurations:

  • Model isolation: Each Profile independently sets model.model and model.provider
  • Role definition: Bind professional roles via SOUL.md symbolic links for unified updates
  • Entry isolation: Each agent gets an independent Telegram Bot to prevent context cross-contamination
  • Hot reload: Config changes take effect in the next session; run /new to start one. No process restart is needed.

Frequently Asked Questions

Q1: Should I use delegate_task or Profiles?
A: Use delegate_task for temporary decomposition of one-time complex tasks (e.g., “analyze three files simultaneously and summarize”). Use Profiles for long-running specialized teams (e.g., “code assistant” + “copywriter” + “data analyst”).

Q2: Why doesn’t my agent immediately reflect SOUL.md changes?
A: SOUL.md changes load at the start of the next session. Type /new in the current conversation to apply them immediately, without restarting Hermes.

Q3: Does cloning a Profile copy its memory?
A: No. --clone only copies API keys and model configuration. Memory and conversation history remain isolated, preventing context contamination between agents.

Q4: Can multiple agents share one Telegram Bot?
A: Technically possible, but strongly discouraged. Shared bots cause context cross-contamination and routing confusion. Independent bots per agent are a prerequisite for stable multi-agent collaboration.

Q5: How do I check which model an agent is currently using?
A: Run hermes profile show <name> or <name> config. The output includes the currently bound model provider and specific model name.

Q6: During Gateway setup, the wizard asks about platform selection. What should I choose?
A: Since the token is already written to .env, select Done. Confirm remaining options with defaults.

Q7: Is there a shortcut to batch-modify the same config across all agents?
A: Use a shell loop: for profile in coder ops researcher writer; do $profile config set <key> <value>; done

Q8: What if I forget the hermes-web-ui token?
A: Redeploy or check the deployment log output for the token. I recommend recording the token in a password manager after first deployment.


Closing thought: Moving from solo agent operations to coordinated agent armies, Hermes provides a clear, actionable path. The key is not deploying many agents immediately, but understanding each agent’s boundaries, configuring isolation mechanisms properly, and establishing sustainable operational workflows. Start with one master controller, gradually add specialized members, and you will see nonlinear growth in overall output—this is systems thinking applied to AI engineering.