Building an Operating System for Your AI Agent: A Deep-Dive Comparison of Hermes-Agent vs. Self-Built OpenClaw Harness
Have you ever spent three hours working with Claude, meticulously refactoring an entire module and discussing every nuance of your project’s business logic—only to return the next day and find it remembers nothing? You’re forced to re-explain the project background, coding standards, and pitfalls you uncovered yesterday. It’s like having a brilliant colleague who suffers from complete amnesia every morning.
This isn’t a flaw in Claude specifically. It’s a systemic problem affecting all AI agents: context windows function as volatile memory, and when the session ends, the accumulated expertise evaporates. The solution requires building what I call a “Harness”—not an operating system that runs code, but a meta-system that governs memory, constrains behavior, and ensures quality.
Based on the three pillars of Harness Engineering—evaluation loops, architectural constraints, and memory governance—two mature technical paths have emerged. Nous Research released Hermes-Agent in March 2025, a self-evolving agent framework. Separately, my team built MemOS, a Harness system on top of OpenClaw that we internally call the “Phoenix Architecture.” Both implement Harness Engineering, but their design philosophies diverge dramatically. By the end of this comparison, you’ll know which path fits your needs.
Part 1: Core Design Philosophies—What Problem Is Each Approach Trying to Solve?
Summary: Hermes-Agent mimics human working memory with bounded hot memory and active evolution, while self-built Harness treats memory as unlimited external storage with system-enforced governance. Your choice depends on whether you prioritize quick personal assistance or long-term team collaboration.
What fundamentally different assumptions do these two systems make about agent memory?
Hermes-Agent operates on a neuroscientific analogy: agents should function like human brains, with limited but refined working memory and deeper retrieval systems for comprehensive information.
Key characteristics:
- Three-tier memory architecture: L1 hot memory (2,200 character limit, analogous to working memory), L2 external memory (8+ provider plugins including Honcho semantic search), L3 cold retrieval (SQLite FTS5 full-text search across sessions)
- Active management: The agent itself decides what deserves retention and what should crystallize into skills
- Self-learning loop: Background periodic triggers review and optimize skills without blocking conversations
- CLI-first: Terminal plus messaging platforms (Telegram/Discord/Slack) for lightweight, rapid interaction
- Cost control: Frozen snapshots protect prefix cache; auxiliary copilot model routing degrades expensive tasks to cheaper models
Ideal scenario: You need a personal AI assistant that remembers your preferences, launches quickly, and doesn’t require complex infrastructure.
The self-built OpenClaw Harness makes opposite assumptions: agent memory should function like external hard drives—unlimited capacity with automatic system management rather than agent self-regulation.
Key characteristics:
- Unbounded memory: Tens of thousands of memories plus hybrid search; old experiences never get squeezed out
- System enforcement: Hook mechanisms automatically capture trajectories without depending on agent volition
- Visual management: Web interface for observing memory evolution, skill phylogenetic trees, and task traces
- Team collaboration: Multiple resident agents plus worker bees (disposable specialists), with permission isolation and knowledge sharing
Ideal scenario: You need multiple agents collaborating long-term, accumulating substantial experience, with visual management and analysis capabilities.
Author’s reflection: I spent months oscillating between these philosophies. The human-brain analogy feels intuitively right—we understand forgetting and remembering. But watching our agents “forget” critical debugging sessions because they were “busy” made me appreciate the external-hard-drive approach. Sometimes mechanical reliability beats biological fidelity.
Part 2: Memory Governance—Solving Agent Amnesia
Summary: Hermes uses bounded hot memory with cold retrieval fallback, forcing the agent to refine what’s immediately available. The self-built system uses unbounded storage with automatic deduplication and hybrid semantic search, achieving 100% capture rates through mandatory hooks.
Why do agents keep forgetting everything important?
The context window limitation is merely the surface symptom. The root problem is lack of persistent memory governance. Traditional approaches depend on developers manually maintaining state or agents voluntarily calling memory tools. This creates three fatal flaws: low capture rates (agents “forget” to store information when busy), capacity anxiety (deleting old content risks discarding critical experience), and retrieval blind spots (keyword matching fails to understand semantic relationships).
How does Hermes-Agent handle memory with its three-tier system?
Hermes adopts a “human brain simulation” strategy that forces the agent to actively curate its memory.
L1 Hot Memory (Working Memory)
- Storage: ~/.hermes/memory/MEMORY.md (2,200 character limit) and USER.md (1,375 characters)
- Management: Agent uses the memory add, memory replace, and memory remove tools to actively maintain it
- When full: Agent must choose—delete old content or merge related memories
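To make the curation pressure concrete, here is a minimal sketch of a bounded hot memory. Only the 2,200-character cap comes from the article; the `HotMemory` class and its methods (loosely mirroring the `memory add`/`replace`/`remove` tools) are illustrative, not Hermes internals.

```python
# Hedged sketch of bounded hot memory; HOT_LIMIT is from the article,
# the class design is an assumption for illustration.
HOT_LIMIT = 2200

class HotMemory:
    def __init__(self):
        self.entries: list[str] = []

    def _size(self) -> int:
        return sum(len(e) for e in self.entries)

    def add(self, text: str) -> bool:
        """Refuse writes that would overflow: the agent must curate first."""
        if self._size() + len(text) > HOT_LIMIT:
            return False  # agent must remove or merge before retrying
        self.entries.append(text)
        return True

    def remove(self, index: int) -> None:
        del self.entries[index]

    def replace(self, index: int, text: str) -> None:
        self.entries[index] = text

mem = HotMemory()
mem.add("Project uses Postgres 17; migrations live in db/migrations.")
print(mem.add("x" * 2200))  # → False (would overflow, so it is rejected)
```

The key design point is the hard refusal on overflow: the system never silently drops memories, it forces an explicit curation decision.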
L2 External Memory (Long-term Storage)
Supports 8+ external providers (like Honcho semantic search), offering semantic retrieval beyond text file capacity.
L3 Historical Conversation Retrieval
SQLite FTS5 full-text search across sessions, latency under 50ms.
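Hermes's actual retrieval layer isn't shown in the article, but the L3 idea is easy to sketch with Python's built-in sqlite3 and an FTS5 virtual table. The `transcripts` table and its columns are assumptions for illustration.

```python
# Sketch of L3-style cold retrieval with SQLite FTS5 (illustrative;
# table and column names are not Hermes internals).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(session_id, content)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("2025-03-01", "Refactored the auth module; fixed token refresh bug"),
        ("2025-03-02", "Discussed Python error handling conventions"),
    ],
)

def search_history(query: str, limit: int = 5):
    """Full-text search across past sessions, best match first."""
    return conn.execute(
        "SELECT session_id, content FROM transcripts "
        "WHERE transcripts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()

print(search_history("token refresh"))
```

Because FTS5 ships with standard Python builds, this layer needs no external service, which is part of why it stays under 50ms.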
Practical example: When I configured Hermes for personal coding assistance, the 2,200 character limit forced surprisingly disciplined summarization. After a complex debugging session, the agent had to decide: keep the specific error pattern, or the general architectural lesson? This curation produced high-quality hot memory, though I occasionally needed to manually query L3 to recover dropped details.
Strengths:
- Forced refinement ensures high-quality immediate memory
- File storage is simple, readable, and portable
- FTS5 retrieval is fast
Limitations:
- L1 capacity constraints may prove insufficient for complex projects
- Depends on agent volition; critical information may be omitted
- L2 requires additional external service configuration
How does the self-built Harness achieve unbounded memory with perfect capture?
The custom system employs a “database plus mandatory capture” strategy that solves the capacity and capture-rate problems simultaneously.
Technical architecture:
PostgreSQL 17 + pgvector
├── memories table # 35,000+ memory entries
├── skills table # Hundreds of skills
├── task_skills table # Task-skill relationship mapping
└── memory_graph table # 65,524 relationship edges
Hook mechanism: the secret to 100% capture
Why do traditional schemes achieve only ~60% capture? They depend on agents “remembering” to store. The custom system exploits OpenClaw’s event-driven hooks:
- before_agent_start (pre-launch): Every agent launch automatically retrieves relevant memories and injects them into the system prompt. The agent cannot avoid seeing them.
- agent_end (post-completion): Every conversation round automatically captures the complete dialogue and processes intelligent deduplication.
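OpenClaw's real hook API isn't documented here, so the following is only a sketch of the event-driven pattern: a registry that fires `before_agent_start` and `agent_end` handlers the agent cannot skip. The two hook names come from the article; everything else is an assumption.

```python
# Illustrative hook registry; not OpenClaw's actual interface.
from collections import defaultdict

hooks = defaultdict(list)

def on(event):
    """Register a handler for a lifecycle event."""
    def register(fn):
        hooks[event].append(fn)
        return fn
    return register

def emit(event, ctx):
    for fn in hooks[event]:
        fn(ctx)

@on("before_agent_start")
def inject_memories(ctx):
    # The agent cannot skip this: retrieval happens before it runs.
    ctx["system_prompt"] += "\n[Relevant memories injected here]"

@on("agent_end")
def capture_trajectory(ctx):
    # Capture is infrastructure, not an API the agent chooses to call.
    ctx["store"].append(ctx["transcript"])

ctx = {"system_prompt": "You are an agent.", "transcript": "did a task", "store": []}
emit("before_agent_start", ctx)
emit("agent_end", ctx)
```

The point of the pattern: memory I/O lives in the runtime, outside the agent's decision loop, which is exactly what makes the capture rate 100%.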
Intelligent deduplication in three layers:
- Content hash exact deduplication: Identical conversations are skipped entirely
- Top-5 similarity detection: Threshold 0.75 identifies highly similar memories
- LLM final arbitration: Determines relationship—DUPLICATE (skip), UPDATE (merge into existing memory), or NEW (create fresh memory)
This means repeatedly discussing the same topic doesn’t inflate memory with duplicate entries; experiences automatically merge and evolve.
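The three-layer decision can be sketched as a single function. The 0.75 threshold and the DUPLICATE/UPDATE/NEW verdicts come from the article; the hashing detail and the stubbed similarity and LLM inputs are assumptions.

```python
# Hedged sketch of the three-layer deduplication decision.
import hashlib

seen_hashes = set()

def dedup_decision(text: str, top5_similarity: float, llm_verdict: str) -> str:
    """Return SKIP, UPDATE, or NEW for an incoming memory."""
    # Layer 1: content-hash exact dedup — identical conversations are skipped.
    h = hashlib.sha256(text.encode()).hexdigest()
    if h in seen_hashes:
        return "SKIP"
    seen_hashes.add(h)
    # Layer 2: similarity gate — only near-duplicates reach the LLM.
    if top5_similarity < 0.75:
        return "NEW"
    # Layer 3: LLM arbitration — DUPLICATE / UPDATE / NEW.
    return {"DUPLICATE": "SKIP", "UPDATE": "UPDATE", "NEW": "NEW"}[llm_verdict]

print(dedup_decision("fixed auth bug", 0.9, "UPDATE"))  # near-duplicate: merged
print(dedup_decision("fixed auth bug", 0.9, "UPDATE"))  # exact repeat: skipped
```

Layering matters for cost: the cheap hash check and the similarity gate run first, so the expensive LLM call fires only for genuinely ambiguous cases.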
Hybrid search: semantic understanding plus exact matching
Combining BM25 keyword + vector cosine similarity + RRF fusion ranking:
- BM25 handles exact matching (“Python error handling”)
- Vectors handle semantic understanding (“exception catching” matches too)
- RRF fuses both result sets
This exceeds Hermes’s FTS5 capability in semantic comprehension. FTS5 matches keywords only; it cannot recognize that “Python error handling” and “exception catching” describe identical concepts.
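RRF itself is a small, well-known formula: each document scores 1/(k + rank) per ranked list, summed across lists. A minimal sketch follows; k=60 is the conventional constant, and the real system's scoring details are not specified in the article.

```python
# Minimal Reciprocal Rank Fusion over two ranked lists.
def rrf_fuse(bm25_ranked, vector_ranked, k=60):
    """Merge two ranked doc-id lists; higher fused score = more relevant."""
    scores = {}
    for ranking in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25   = ["doc_exact_match", "doc_b", "doc_c"]  # keyword hits
vector = ["doc_semantic", "doc_exact_match"]    # semantic hits
print(rrf_fuse(bm25, vector))
```

A document ranked well by both retrievers accumulates two reciprocal-rank contributions, so it naturally floats to the top of the fused list.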
Two-tier sharing architecture
Traditional sharing faces three dilemmas: single shared libraries cause privacy leaks, independent libraries create knowledge silos, and periodic synchronization introduces complex conflicts. The custom system employs two-tier design:
Tier 1: Same-instance sharing (Local Scope)
Physically same database, logically isolated:
- All agents share one local database
- Each memory has an owner field identifying ownership
- Search scope parameters control visibility: private (owner only), shared (team), public (pushable to remote Hub)
Zero-copy sharing—no data copying from strategist database to technical consultant database; physically one dataset, logically visible on demand.
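The scope logic can be sketched as a visibility filter over a single store. The `owner`/`scope` fields and the private/shared/public values come from the article; the in-memory representation is an assumption.

```python
# Hedged sketch of scope-based visibility over one shared store.
memories = [
    {"owner": "strategist", "scope": "private", "text": "considering vendor X replacement"},
    {"owner": "strategist", "scope": "shared",  "text": "Python error-handling patterns"},
    {"owner": "consultant", "scope": "public",  "text": "deployment checklist"},
]

def visible_to(agent: str):
    """An agent sees its own memories plus anything shared or public."""
    return [
        m for m in memories
        if m["owner"] == agent or m["scope"] in ("shared", "public")
    ]

# A worker bee that owns nothing still sees shared knowledge,
# never the strategist's private deliberations.
print([m["text"] for m in visible_to("worker-bee")])
```

Note that no rows are copied anywhere: "zero-copy sharing" is just a query predicate over one physical dataset.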
Tier 2: Cross-instance Hub-Client
Hub is an HTTP server listening on gatewayPort + 11. Critical design: data sharing is active push, not automatic sync. Private data never leaves local instance; only actively shared content reaches Hub. Hub searches return summaries only; details require secondary requests.
Operational example: Our strategist agent maintains private deliberations (“considering vendor X replacement”) as private scope, while sharing Python error-handling patterns as shared scope. Worker bees searching see only shared memories, protecting strategic confidentiality while propagating technical knowledge.
Performance data:
- Total memories: 35,000+ (strategist 44%, Huangjia-1 31%, shared 10%)
- Token savings: 72% versus native memorySearch
- Retrieval latency: First ~8s (model loading), subsequent <100ms
- Graph relationship edges: 65,524
Strengths:
- No upper limit; old experience never squeezed out
- 100% capture rate, independent of agent volition
- Permission isolation suitable for multi-agent collaboration
Limitations:
- High setup cost (database + hook system)
- Slow first retrieval (vector model loading ~8s)
- Ongoing maintenance requirements
Author’s reflection: The 8-second cold start initially frustrated me. But watching our “strategist” agent automatically surface a debugging pattern from three months prior—without anyone remembering to query for it—converted me. The hook mechanism transforms memory from an API the agent might call into infrastructure the agent cannot escape. That’s the difference between optional and mandatory.
Part 3: Architectural Constraints—Defining Agent Responsibilities
Summary: Hermes uses a primary agent spawning parallel sub-agents with memory inheritance. The self-built system uses multiple resident specialists plus disposable worker bees with DNA-level shared knowledge and mandatory experience feedback before bee destruction.
Why do single agents fail at complex tasks?
When one agent attempts everything, it excels at nothing. Coding suddenly interrupted by tweet composition, tweeting interrupted by configuration tuning—eventually context explodes and nothing completes. Harness Engineering’s second pillar addresses this through architectural constraints that clarify responsibility boundaries.
How does Hermes structure agent responsibilities with primary and sub-agents?
Design philosophy: One primary agent coordinates, spawning sub-agents for specific tasks when needed, supporting parallel execution.
Technical implementation:
Primary Agent (resident)
├── Receives task
├── Determines if sub-agent spawning needed
├── Spawns multiple sub-agents (inherit primary's memory snapshot)
│ ├── Sub-agent A: Handle task 1
│ ├── Sub-agent B: Handle task 2 (parallel)
│ └── Sub-agent C: Handle task 3 (parallel)
├── Sub-agents execute independently (isolated environments)
├── Sub-agents return results
└── Primary integrates results
Key characteristics:
- Parallel execution: Multiple sub-agents run simultaneously
- Isolated environments: Each sub-agent has an independent session, conversation thread, and terminal
- Memory inheritance: Sub-agents receive the primary’s memory snapshot at birth, but operate independently thereafter
Personality and context files:
Personality defines agent roles; context files inject project-level knowledge (project.md, style.md, conventions.md).
Skills system:
Agent creates skills via skill_manage tool, stored in ~/.hermes/skills/. Progressive disclosure saves tokens: only skill description indexes inject at startup; full content loads on demand.
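Progressive disclosure reduces startup tokens by injecting only one-line descriptions, loading full bodies on request. A hedged sketch follows; the skill names and storage shape are invented for illustration, not taken from Hermes.

```python
# Illustrative sketch of progressive disclosure for skills.
skills = {
    "tweet-style": {
        "description": "Plain-language tweets, no jargon stacking",
        "body": "(full multi-paragraph skill instructions live here)",
    },
    "debug-loop": {
        "description": "Reproduce, bisect, then fix",
        "body": "(full multi-paragraph skill instructions live here)",
    },
}

def startup_index() -> str:
    """Cheap: one line per skill goes into the system prompt at launch."""
    return "\n".join(f"- {name}: {s['description']}" for name, s in skills.items())

def load_skill(name: str) -> str:
    """Expensive: the full body is fetched only when the agent asks for it."""
    return skills[name]["body"]

print(startup_index())
```

The trade-off is standard lazy loading: the agent always knows what skills exist, but pays the token cost of a skill's full text only when it actually uses it.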
Practical example: I used Hermes for a content creation workflow. The primary agent coordinated research, writing, and editing sub-agents simultaneously. The parallel execution was elegant, though I noticed all sub-agents shared the primary’s full memory—including sensitive client details I wished had been isolated.
Strengths:
- Simple architecture, easy to understand
- Primary maintains context continuity
- Sub-agents are destroyed after use, consuming no persistent resources
- Skills Hub offers extensive community skills
Limitations:
- All sub-agents share the primary’s memory (no permission isolation)
- Sub-agents are temporary, unsuitable for long-term collaboration
- Skill creation depends on agent volition, risking omissions
How does the self-built Harness organize multiple resident agents and worker bees?
Design philosophy: Multiple resident agents with clear division of labor spawn worker bees (disposable specialists) for specific tasks, with mandatory experience feedback before bee destruction.
Technical implementation:
4 Resident Agents (defined responsibilities)
├── Strategist: Task coordinator, quality auditor
├── Huangjia-1: Integration coordinator, external output
├── Technical Consultant: Engineer, coding and configuration
└── Creative Partner: Content creation, tweets and articles
Worker Bee System (spawned on demand)
├── Writing Bee: Specialized article composition
├── Illustration Bee: Specialized image generation
├── Crawler Bee: Specialized data extraction
└── ... (defined as needed)
DNA-level sharing:
All agents and bees share shared-knowledge/ directory containing:
-
AGENTS.md: System-level specifications (auto-injected into system prompts) -
config/: Model configurations, API keys -
knowledge/: Public knowledge base -
templates/recipes/: Task recipes (article.md, tweet.md) -
bees/: Bee definitions -
tools/: Tool usage guidelines
At birth, AGENTS.md automatically injects into every bee’s system prompt, containing tool parameter specifications, output language, writing style, forbidden word lists, and feedback specifications. This is DNA—every bee inherits the team’s complete genome without requiring “alignment” conversations.
Deathbed feedback and auto-recall:
After task completion, system hooks automatically capture complete trajectories. When the next bee launches, the system automatically retrieves and injects relevant skills—not an API call the bee chooses to make, but system-level mandatory injection.
Deep skill system comparison:
Hermes skills depend on agent autonomous skill_manage create calls with periodic background review. This produces known issues: skill duplication (frequently reported community problem), weak deduplication depending on agent volition plus background review, no traceability (“where did this skill originate?”).
The custom system’s skill generation is system-enforced:
- Rule filtering: Checks chunk count minimums and “non-trivial” content standards
- Hybrid retrieval: FTS + Vector + RRF finds existing related skills
- LLM decision: Confidence ≥0.7 updates the existing skill, <0.3 generates a new skill, the middle ground waits
- Quality scoring: 0-10 scale; ≥6 writes to the database with a task_skills relationship recording evolution
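The confidence gate and quality gate reduce to two small functions. The 0.7/0.3 thresholds and the ≥6-of-10 cutoff come from the article; the function names are mine.

```python
# Sketch of the confidence-gated skill decision (thresholds from the article).
def skill_decision(confidence: float) -> str:
    """Map retrieval-match confidence to an action on the skill store."""
    if confidence >= 0.7:
        return "UPDATE_EXISTING"  # strong match: refine the existing skill
    if confidence < 0.3:
        return "CREATE_NEW"       # no real match: mint a new skill
    return "WAIT"                 # ambiguous middle ground: defer

def passes_quality_gate(score: int) -> bool:
    """Only skills scoring >= 6 on the 0-10 scale reach the database."""
    return score >= 6

print(skill_decision(0.85), passes_quality_gate(7))
```

The WAIT branch is what prevents duplicate skills: an ambiguous match is deferred rather than forced into either bucket.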
Operational example: A writing bee previously erred by using excessive technical terminology; reader feedback indicated incomprehension. The skill system captured this: “Tweet writing avoids terminology stacking; every technical concept requires plain-language explanation.” Three days later, a new writing bee launching for a tweet task received automatic skill injection. It wrote in plain language from the first sentence because it “knew” this rule inherently. The previous bee’s failure became the next bee’s instinct.
Performance comparison:
- Capture rate: Hermes ~60% (volition-dependent), custom system 100% (system-enforced)
- Quality traceability: Custom system shows skill origins and usage counts
- Deduplication: Hermes relies on agent + background review; custom system uses LLM scoring + confidence thresholds
Visualization comparison:
Hermes uses TUI (terminal interface)—simple and direct, but cumbersome for browsing large memory volumes or relationship graphs. The custom system provides web visualization:
- Memory browser: owner filtering, timeline display, relevance search, graph relationship viewing
- Skill management: quality score viewing, task_skills relationship graphs, manual editing, skill phylogenetic trees
- Task trace visualization: complete lifecycle viewing, experience evolution trees, failure analysis
Author’s reflection: Watching the skill phylogenetic tree grow over weeks revealed patterns I’d never anticipated. One debugging skill evolved through seven iterations, each refinement traceable to specific failed tasks. This visibility transforms agent development from alchemy into engineering—you can see what’s working and why.
Part 4: Evaluation Loops—Preventing Quality Drift
Summary: Hermes provides cron scheduling with multi-platform messaging alerts. The self-built system implements multi-layer evaluation: heartbeat checks, TODO consistency verification, strategist agent review, and delayed failure analysis with automatic experience crystallization.
How do you know if your agent is getting worse over time?
Agents complete tasks, but correctness remains uncertain. Over time, quality degrades unnoticed. Harness Engineering’s third pillar establishes evaluation loops enabling self-checking and learning from failure.
What evaluation capabilities does Hermes-Agent provide?
Design philosophy: Scheduled tasks automatically check system health, pushing alerts through messaging platforms.
Technical implementation:
- Hourly system health checks
- Daily activity summaries
- Weekly statistical reports
- Exception pushes to Telegram/Discord/Slack
Strengths:
- Simple, direct, usable
- Multi-platform support, mobile-accessible
- No additional monitoring system required
Limitations:
- No explicit quality audit mechanism
- No failure reflection system
- Non-configurable evaluation standards
How does the self-built Harness implement comprehensive quality control?
Design philosophy: Multi-layer evaluation from system health through task quality to failure reflection, comprehensively covered.
Technical implementation:
1. Heartbeat checks (system-level self-monitoring)
Regular system health verification. Core principle: silent when normal, alerting only when abnormal.
2. TODO consistency checks (preventing state drift)
Verification logic:
- Scan TODO.md for all ⏳ (in-progress) status tasks
- Search MemOS memories for completion records
- If memory shows “completed” but TODO still marks ⏳ → auto-correct + Discord notification
Not updating TODO after completion = manufacturing false information. This is iron law.
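The consistency check is essentially a set difference between TODO state and memory state. A sketch follows, assuming a simple `- ⏳ task` line format and a plain set as a stand-in for the MemOS completion query.

```python
# Hedged sketch of the TODO drift check; file format and the
# completed_in_memory stand-in are assumptions for illustration.
import re

todo_md = """\
- ⏳ migrate auth module
- ✅ write weekly report
- ⏳ tune retrieval thresholds
"""

completed_in_memory = {"migrate auth module"}  # stand-in for a MemOS search

def find_drift(todo_text: str, completed: set) -> list:
    """Return tasks TODO.md calls in-progress but memory says are done."""
    in_progress = re.findall(r"^- ⏳ (.+)$", todo_text, flags=re.M)
    return [t for t in in_progress if t in completed]

print(find_drift(todo_md, completed_in_memory))
```

Any task the function returns is "false information" in the article's sense: the real system would flip its status and fire a Discord notification.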
3. Strategist review (agent-in-the-loop)
Worker bee outputs don’t publish directly; strategist audits first:
- Does the output include evidence (screenshots/logs/diffs)?
- Does it meet the recipe requirements?
- Does the quality meet the bar?
Unqualified → respawn the worker bee. Qualified → crystallize the experience and notify the next stage.
This is the simplest evaluation loop: agents cannot grade themselves; another agent must evaluate.
4. Reflection system (delayed analysis)
Core concept: delayed analysis rather than real-time reflection.
The specific pipeline:
- Hook records failures: Detected tool call failures, async JSONL log writing, date-sharded storage, automatic parameter desensitization
- Cron batch analysis: Early morning reads yesterday’s failure records, aggregates statistics (which tools fail most, which error patterns repeat), LLM analyzes root causes and patterns
- Generate report + fix: Structured report generation, automatic fixes where possible, human flagging where not
- Experience crystallization: Post-fix experience writes to shared-knowledge
Why delayed analysis?
- 10x cost reduction: Real-time reflection invokes the LLM per failure; batch analysis invokes it once daily
- Non-blocking main flow: Agents encountering failures during task execution need not pause for reflection
- Cross-session pattern detection: Single failures may be incidental, but the same tool failing at the same time for three consecutive days indicates systemic issues
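The batch pass can be sketched as one cheap aggregation over the day's JSONL log before any LLM call is made. The log schema (`tool`, `error`, `hour`) is an assumption for illustration.

```python
# Hedged sketch of the nightly aggregation step of delayed analysis.
import json
from collections import Counter

failure_log = "\n".join(json.dumps(r) for r in [
    {"tool": "crawler", "error": "HTTP 403", "hour": 9},
    {"tool": "crawler", "error": "HTTP 403", "hour": 10},
    {"tool": "crawler", "error": "HTTP 403", "hour": 9},
    {"tool": "shell",   "error": "timeout",  "hour": 14},
])

def aggregate_failures(jsonl_text: str):
    """One cheap pass over the day's failures; the LLM sees only this summary."""
    records = [json.loads(line) for line in jsonl_text.splitlines()]
    by_tool = Counter(r["tool"] for r in records)
    by_pattern = Counter((r["tool"], r["error"]) for r in records)
    return by_tool.most_common(1)[0], by_pattern.most_common(1)[0]

worst_tool, worst_pattern = aggregate_failures(failure_log)
print(worst_tool, worst_pattern)
```

Patterns like "crawler, HTTP 403, mornings only" surface here for free; only the root-cause reasoning over the aggregated summary needs an LLM, once per day.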
Operational example: Our crawler bee began failing on a specific news site in late February. Real-time reflection might have attributed this to temporary network issues. Delayed analysis revealed the pattern: failures occurred only during 9-11 AM, correlating with the site’s anti-scraping maintenance window. The crystallized skill now schedules crawler tasks outside this window, eliminating a class of failures permanently.
Part 5: Cost Analysis and Decision Framework
Summary: Hermes offers 10-minute setup and $10-50 monthly API costs for personal use. The self-built Harness takes weeks to deploy and runs $200-300 monthly for a 4-agent team, but its ROI turns positive after roughly 6 months as accumulated trajectory value outweighs development cost.
What does each approach actually cost to implement and run?
Hermes-Agent costs:
- Time cost: Installation 10 minutes, configuration 30 minutes, proficiency 1 hour
- Development cost: Nearly zero (unless writing custom plugins)
- Maintenance cost: Low, primarily version updates, occasional MEMORY.md cleanup
- API cost: Personal users typically $10-50 monthly
Self-built Harness costs:
- Time cost: Infrastructure setup 1-2 weeks, debugging optimization 1-2 weeks, proficiency 2-4 weeks
- Development cost: Medium, requiring hook plugins, hybrid search, visualization interface, evaluation mechanisms
- Maintenance cost: Medium, requiring system health monitoring, retrieval algorithm optimization, database management, skill threshold updates
- API cost: 4-agent team approximately $200-300 monthly (LLM calls + vector generation + skill evaluation)
When does each approach make financial sense?
Usage duration < 3 months: Hermes more cost-effective (low development cost, quick start)
Usage duration > 6 months: Self-built potentially more cost-effective (trajectory accumulation value exceeds development cost)
Critical question: Is your agent a “one-time project” or “long-term companion”?
Which scenario fits which solution?
Scenario 1: Personal AI Assistant
You need an AI handling daily tasks—answering questions, coding, scheduling, searching—that remembers your preferences without repeated explanation. You want minimal infrastructure.
→ Choose Hermes-Agent
Rationale: Ready out of the box, sufficient memory (3.5KB adequate for individuals), multi-platform support, community skill ecosystem.
Scenario 2: Multi-Agent Long-Term Collaboration
You have 3-5 agents forming a team, collaborating long-term on complex tasks. Each has defined responsibilities, requiring knowledge sharing with privacy isolation. You want continuous evolution with increasing experience accumulation.
→ Self-built Harness
Rationale: Unbounded memory, permission isolation, 100% trajectory capture, visual management.
Scenario 3: Standardized Pipeline Production
You need large-scale standardized content production—hundreds of articles daily, thousands of tweets. Fixed processes, clear quality requirements, high throughput needed.
→ OMO mode (predefined roles + routing distribution)
Rationale: Rigorous processes, stable output, high throughput routing. Neither Hermes nor self-built Harness suits “mass standardized production”; both fit “flexible adaptation and continuous evolution.”
Scenario 4: One-Time Problem Solving
You simply need rapid problem resolution—data analysis, report writing, decision support. No long-term memory or continuous evolution needed.
→ Agent Teams (like Claude Code’s debate room mode)
Rationale: Fast, multi-perspective, no setup required. No Harness needed because no “continuous accumulation” requirement exists.
Implementation Roadmap: From Zero to Operational Harness
If you choose the self-built path, phase implementation prevents overwhelming complexity:
Phase 1: MemOS Foundation (Weeks 1-2)
- PostgreSQL + pgvector (or SQLite + FTS5 for lighter weight)
- Implement before_agent_start hook (memory injection at launch)
- Implement agent_end hook (memory capture at completion)
- Simple deduplication (hash-based exact deduplication sufficient)
Goal: Agents remember conversations and recall them next launch.
Phase 2: Role Definition and Recipe Library (Weeks 3-4)
- Define resident agent count and responsibilities
- Define required bee types and templates
- Document AGENTS.md, recipes/, bees/ specifications
Goal: Agents know who they are, what they do, and how to do it.
Phase 3: Evaluation Mechanisms (Weeks 5-6)
- Heartbeat checks (cron-scheduled health verification)
- Agent mutual evaluation (core agent audits post-bee completion)
- TODO consistency checks
Goal: System self-checks, outputs pass quality gates.
Phase 4: Advanced Features (Month 2+)
- Automatic skill generation (experience extraction from task trajectories)
- Visualization interface (memory browsing, skill management)
- Reflection system (learning from failures)
Critical principle: Make it run first, then optimize. Don’t pursue perfection initially—that guarantees never launching.
Action Checklist / Implementation Steps
For Hermes-Agent (Immediate Start):
- [ ] Install Hermes-Agent (10 minutes)
- [ ] Configure MEMORY.md and USER.md templates
- [ ] Connect preferred messaging platform (Telegram/Discord/Slack)
- [ ] Install relevant skills from Skills Hub
- [ ] Begin daily use, monitoring MEMORY.md growth weekly
For Self-Built Harness (Planned Deployment):
- [ ] Provision PostgreSQL 17 + pgvector extension
- [ ] Deploy OpenClaw framework with hook support
- [ ] Implement before_agent_start memory retrieval hook
- [ ] Implement agent_end memory capture with 3-layer deduplication
- [ ] Configure hybrid search (BM25 + vector + RRF)
- [ ] Define resident agent architecture and responsibilities
- [ ] Create shared-knowledge directory structure with AGENTS.md
- [ ] Build worker bee templates with DNA injection
- [ ] Establish strategist review workflow
- [ ] Deploy heartbeat and TODO consistency monitoring
- [ ] Implement delayed failure analysis pipeline
- [ ] Build web visualization interface (memory browser, skill trees)
One-Page Overview
The Problem: AI agents suffer from amnesia because context windows are volatile. Every session restart erases accumulated expertise, forcing repetitive explanations and preventing skill accumulation.
Two Solutions:
| Aspect | Hermes-Agent | Self-Built OpenClaw Harness |
|---|---|---|
| Philosophy | Human-brain analogy: bounded working memory, active agent management | External-storage analogy: unbounded capacity, system-enforced governance |
| Memory | 3-tier (2.2KB hot + 8+ providers + FTS5 search), ~60% capture | PostgreSQL + pgvector, hybrid semantic search, 100% hook-based capture |
| Architecture | Primary + parallel sub-agents, memory inheritance | Multiple residents + worker bees, DNA-level sharing, deathbed feedback |
| Evaluation | Cron scheduling + messaging alerts | Heartbeat + TODO consistency + strategist review + delayed failure analysis |
| Setup Time | 1 hour | 2-4 weeks |
| Monthly Cost | $10-50 | $200-300 (4-agent team) |
| Best For | Personal assistants, quick start, <3 month projects | Team collaboration, long-term accumulation, >6 month commitments |
The Core Insight: Competitive advantage no longer lies in crafting better prompts, but in building Harnesses that capture and compound agent trajectories. Every success and failure becomes training data for the next generation.
The Recommendation: Start with Hermes for immediate value. Migrate to self-built Harness when trajectory accumulation becomes your primary asset.
Frequently Asked Questions
Q: My project lasts only two months. Which solution?
A: Choose Hermes-Agent. Self-built Harness requires 3-6 months for trajectory accumulation value to amortize setup costs. Short projects benefit from Hermes’s quick-start advantage.
Q: What exactly is “hybrid search” and how does it differ from ordinary search?
A: Hybrid search combines BM25 keyword matching with vector semantic search. Ordinary keyword search finds only exact matches (“Python error handling”). Hybrid search understands that “exception catching” and “try-except” are semantically related. RRF (Reciprocal Rank Fusion) merges both result sets by relevance.
Q: What are “hooks” and why do they achieve 100% capture rates?
A: Hooks are event-driven interceptors. before_agent_start automatically injects memories; agent_end automatically captures them. Unlike traditional approaches depending on agents voluntarily calling memory tools, hooks are system-enforced infrastructure—agents have no opportunity to “forget.”
Q: What’s the difference between skills and memories?
A: Memories are raw conversation trajectories (“what I did”). Skills are distilled experience rules (“what to do next time”). In Hermes, agents actively create skills. In the custom system, skills are automatically extracted from successful tasks, quality-scored by LLM (≥6/10), and relationship-mapped to their originating tasks.
Q: Does the 8-second first retrieval delay in the custom system affect usability?
A: The delay loads vector models (e.g., text2vec-base-chinese). Pre-loading or keeping models resident in memory eliminates this. Subsequent retrievals are <100ms, not affecting real-time conversation.
Q: I’m an independent developer needing multi-agent collaboration on limited budget. What options?
A: Use the lightweight variant: SQLite + FTS5 instead of PostgreSQL + pgvector, omitting vector search for keyword-only search with lower latency and no GPU requirement. Retain hook mechanisms for 100% capture—this is the core value investment.
Q: What is Hermes’s “frozen snapshot” and how much does it save?
A: Frozen Snapshot protects prefix cache, avoiding repeated system prompt computation. For long conversations, this saves 20-40% token consumption. Auxiliary copilot model routing (using cheaper models for simple tasks) provides additional savings.
Q: What is “worker bee feedback” and why does it suit long-term accumulation better than sub-agents?
A: Worker bees are temporary specialist agents that auto-destruct after task completion, but mandatorily capture experience to shared knowledge via hooks before destruction. Hermes sub-agents are also temporary, but experience capture depends on agent volition, and they share the primary’s memory without permission isolation. The bee system better supports team collaboration and knowledge consolidation.
Q: My five-person team uses one agent system. How do we prevent mutual interference?
A: The custom Harness’s scope mechanism solves this. Private-scope memories are owner-visible only; shared-scope memories are team-visible. Strategist deliberations (“considering vendor X”) remain private; technical specifications are shared—achieving zero-copy sharing without privacy leakage.
Q: Which solution handles skill duplication better?
A: The custom system’s 4-step generation (rule filtering → hybrid retrieval → LLM confidence decision → quality scoring) with 0.7/0.3 confidence thresholds and task_skills relationship tracking prevents duplication more effectively than Hermes’s background review process, which community feedback identifies as occasionally producing duplicate skills.
