AI Agent Teamwork: How the Four-Shrimp Array System Automates Productivity in 3 Days

高效码农

7 hours ago

The Four-Shrimp Array: A 3-Day Journey from Chatbots to a Productivity System

Have you ever imagined how multiple AI assistants could work together like a team, automatically handling everything from task breakdown and content creation to code writing? This article provides a detailed breakdown of how an AI Agent system called the “Four-Shrimp Array” evolved from a concept into a runnable system over just three days, sharing the key steps, challenges encountered, and valuable lessons learned.

What is the Four-Shrimp Array System?

The Four-Shrimp Array is a collaborative system composed of four AI Agents, each with a specialized role, working together to complete complex tasks. The core goal is to automate the previously manual coordination of AI collaboration, forming an efficient productivity system.

System Architecture Overview

The Four-Shrimp Array uses a master-slave architecture, consisting of a coordinator (the “Lobster”) and three specialized Agents (“Coder,” “Writer,” and “Strategist”).

Agent Name	Model	Responsibility	Cost Strategy
Lobster (Coordinator)	Claude Opus 4-6	Task scheduling and coordination	High-cost model
Coder	GPT-5.3-Codex	Full-stack coding and development	Free tier
Writer	Claude Opus 4-6	Content creation and writing	High-cost model
Strategist	GPT-5.4	Strategic analysis and review	Free tier

The system runs on a single VPS and communicates via the OpenClaw Gateway. This design optimizes costs—expensive models are used only for coordination and content creation, while coding and analysis utilize free tiers.

Day 1: Can Talk, But Not Reliable

On the first day after setup, the four Agents could already receive tasks, execute work, and return results. However, the system had several critical flaws.

Key Problems Encountered

Lack of Behavioral Constraints
- The Writer’s quality was inconsistent when generating tweets.
- The Strategist’s reviews varied in thoroughness, sometimes verbose, sometimes too brief.
- The Coder’s exception handling was arbitrary and lacked consistency.
No Collaboration Standards
- Who should send messages to whom?
- Who should be notified upon task completion?
- What happens if a task times out?
- All of these relied on the Agents’ own reasoning and guessing.
Missing Task Tracking
- Once a task was dispatched, it felt like it disappeared into a black hole.
- The Coordinator had no visibility into the Coder’s progress.
- There was no way to confirm if the Writer’s draft had been reviewed by the Strategist.

Day 1 Achievements

Despite the problems, Day 1 still validated the system’s basic capabilities:

The Writer drafted a 2,800-word article for the X platform.
The Strategist reviewed the article and gave a score of 8.2/10.
Based on the Strategist’s feedback, the Writer produced a second version.
The Coder used Claude Code to build a simple accounting CLI tool.

Conclusion: The basic capabilities were in place, but the system lacked a management layer.

Day 2: Equipping Each “Shrimp” with a Harness

On the second day, the team turned to coordination and standards, studying six methodology documents from the revfactory/harness project. The core findings included:

Key Principles Learned

Explicit Communication Protocols: Each Agent must clearly define “who to receive messages from, who to send to, and what to do upon completion.”
Error Handling Mechanisms: Timeouts, failures, and unclear requirements cannot be decided by the Agents themselves.
Progressive Disclosure: Do not cram all rules into one file; load them as needed.
Assertion-Based Review: Instead of an open-ended “What do you think?”, use a checklist of “passed/failed” items.
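
As a rough illustration of the assertion-based review principle, the sketch below reduces a review to pass/fail items and a verdict. The item names are borrowed from the Strategist checklist this article describes under System Improvements; the Assertion class, the summarize function, and the "pass"/"revise" verdict labels are assumptions made for the example, not the project's actual format.

```python
from dataclasses import dataclass
from typing import Dict, List

# Item names follow the article's Strategist checklist; everything else is hypothetical.
CHECKLIST = [
    "Hook Strength",
    "Structure & Rhythm",
    "Data Support",
    "Terminology Consistency",
    "Conclusion Strength",
    "Factual Accuracy",
    "Platform Adaptation",
]


@dataclass
class Assertion:
    item: str
    passed: bool
    note: str = ""


def summarize(results: List[Assertion]) -> Dict[str, object]:
    """Reduce a review to a verdict plus the list of failed items."""
    seen = {r.item for r in results}
    missing = [item for item in CHECKLIST if item not in seen]
    if missing:
        # An incomplete review is rejected rather than silently accepted.
        raise ValueError(f"review is missing items: {missing}")
    failed = [r.item for r in results if not r.passed]
    return {"verdict": "pass" if not failed else "revise", "failed": failed}
```

Because every item must be answered, two reviews of the same draft are directly comparable, which an open-ended opinion is not.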

System Improvements

1. Rewritten SOUL.md Files

All four Agents’ SOUL.md files were completely rewritten with explicit behavioral guidelines:

Lobster (Coordinator): Added an explicit routing table:
- Writing, copy, tweets, articles → Writer
- Code, tools, API, bugs → Coder
- Review, analysis, evaluation, strategy → Strategist
  This eliminated the need for reasoning and guessing; tasks are assigned by direct lookup.
Strategist: Added an assertion-based review format with 7 checklist items:
1. Hook Strength → Pass/Fail
2. Structure & Rhythm → Pass/Fail
3. Data Support → Pass/Fail
4. Terminology Consistency → Pass/Fail
5. Conclusion Strength → Pass/Fail
6. Factual Accuracy → Pass/Fail
7. Platform Adaptation → Pass/Fail
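
Returning to the Lobster's routing table above, the "direct lookup" idea can be sketched as a keyword map. The keyword lists mirror the table in this article; the ROUTES name, the route function, and the naive substring matching are assumptions for illustration, since the real routing lives in SOUL.md as instructions rather than code.

```python
from typing import Optional

# Keywords mirror the article's routing table; the matching logic is an assumption.
ROUTES = {
    "writer": ["writing", "copy", "tweet", "article"],
    "coder": ["code", "tool", "api", "bug"],
    "strategist": ["review", "analysis", "evaluation", "strategy"],
}


def route(task_title: str) -> Optional[str]:
    """Return the first agent whose keyword appears in the title, else None."""
    text = task_title.lower()
    for agent, keywords in ROUTES.items():
        # Plain substring matching: simple, but "api" would also match "rapid".
        if any(keyword in text for keyword in keywords):
            return agent
    return None
```

Returning None for an unmatched task keeps the fallback decision explicit instead of leaving it to a guess.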

Writer: Implemented Progressive Disclosure by splitting platform-specific rules into separate files:

workspace-writer/references/
├── x-platform.md       # X Platform Rules
├── wechat-platform.md  # WeChat Rules
└── rednote-platform.md # RedNote (Xiaohongshu) Rules

When writing an X tweet, only x-platform.md is loaded, saving roughly two-thirds of the tokens that loading all three rule files would cost.
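
A minimal sketch of this loading step, assuming the three rule files sit in one directory as listed above. The file names come from the article; the PLATFORM_RULES mapping, its keys, and the load_rules function are hypothetical.

```python
from pathlib import Path

# File names come from the article's directory listing; the keys are hypothetical.
PLATFORM_RULES = {
    "x": "x-platform.md",
    "wechat": "wechat-platform.md",
    "rednote": "rednote-platform.md",
}


def load_rules(platform: str, references_dir: Path) -> str:
    """Read only the rule file for the requested platform."""
    if platform not in PLATFORM_RULES:
        raise ValueError(f"no rule file registered for platform {platform!r}")
    return (references_dir / PLATFORM_RULES[platform]).read_text(encoding="utf-8")
```

The other two files are never opened, so their contents never reach the prompt.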

2. Toolchain Installation

Codex CLI v0.117.0
Codex Plugin for Claude Code (/codex:review + /codex:rescue)
The Coder then used Codex to deliver three working tools in a row: an accounting CLI, a to-do CLI, and a web page title scraper.

Day 2’s Takeaway: The system evolved from “can work” to “can work methodically.”

Day 3: From “Working Methodically” to “Working Systematically”

Day 3 saw the most significant changes, with the team pushing six versions in a single day, completing a qualitative leap in the system.

Version Iteration Process

v1: Shared Task Board

Adopted file-based task management using board.json + queue.md:

No database, no external services needed.
A single JSON file acts as the master task table.
A Markdown file provides a human-readable view.
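
The relationship between the two files can be sketched as follows: the JSON file is the master table and the Markdown view is regenerated from it. The article does not show the schema, so the "tasks" list and its id, title, owner, and status fields are assumptions, as is the render_queue function.

```python
import json
from pathlib import Path


def render_queue(board_path: Path, queue_path: Path) -> str:
    """Regenerate the human-readable Markdown view from the JSON master table."""
    board = json.loads(board_path.read_text(encoding="utf-8"))
    lines = ["# Task Queue", ""]
    for task in board["tasks"]:  # hypothetical schema: a flat list of task dicts
        lines.append(
            f"- [{task['status']}] {task['id']}: {task['title']} (owner: {task['owner']})"
        )
    text = "\n".join(lines) + "\n"
    # The Markdown file is always overwritten, never edited by hand.
    queue_path.write_text(text, encoding="utf-8")
    return text
```

Since queue.md is derived on every call, it cannot drift from board.json.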

v2: Executor CLI

Developed task_board.py, a command-line tool for creating, updating, deleting, and querying tasks:

python3 task_board.py create --title "Write Tweet" --owner writer
python3 task_board.py update task-001 --status done
python3 task_board.py list
python3 task_board.py check-overdue --mark
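
The source of task_board.py is not shown, so the following is only a guess at what a check-overdue pass with a --mark option might do. The due_at field (an ISO 8601 timestamp), the overdue flag, and the check_overdue function are all hypothetical.

```python
from datetime import datetime


def check_overdue(tasks, now, mark=False):
    """Return ids of unfinished tasks past their due time; optionally flag them."""
    overdue = []
    for task in tasks:
        due = task.get("due_at")  # hypothetical field holding an ISO 8601 timestamp
        if due is None or task["status"] == "done":
            continue
        if datetime.fromisoformat(due) < now:
            overdue.append(task["id"])
            if mark:
                # Mirrors the idea of --mark: record the finding on the task itself.
                task["overdue"] = True
    return overdue
```

Passing now as an argument, rather than reading the clock inside the function, keeps the check easy to test.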

v3: Scheduler Wrapper

Developed lobster_ops.py. The Lobster automatically creates a task upon dispatch and binds long-running tasks to a runtime:

python3 lobster_ops.py dispatch \
  --title "Write a Four-Shrimp Array Tweet" \
  --brief "X platform, Jason AI Overseas style" \
  --agent writer

v3.1: True OpenClaw Process Session Binding

Bound OpenClaw’s native exec background session directly to the task:

Dispatch → Exec Background → Bind Process → Finalize Runtime → Done
The task now records the sessionId, workdir, command, and status directly. Instead of “I think the Coder finished,” the board.json explicitly states “session nimble-cove has finished.”

v3.1.1: Automatic Finalization (The Key Step)

Previous Chain: Run → Manually call finalize-runtime → Update status.
New Chain: Generate a wrapper command that automatically calls finalize-runtime upon completion. A task is marked done on success or blocked on failure—no manual intervention needed.
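
The essence of automatic finalization is that the exit code, not a human, decides the final status. The sketch below shows that idea in-process with Python's subprocess module; it is a simplification, since the article's version generates a wrapper command that calls finalize-runtime. The run_and_finalize function and the exit_code field are assumptions.

```python
import subprocess


def run_and_finalize(task, command):
    """Run the command, then set the task status from its exit code."""
    try:
        exit_code = subprocess.run(command).returncode
    except OSError:
        # The command could not even start (for example, a missing binary).
        exit_code = None
    task["exit_code"] = exit_code
    # Success marks the task done; any failure marks it blocked.
    task["status"] = "done" if exit_code == 0 else "blocked"
    return task
```

A task that fails to launch ends up blocked as well, so no task is left in a running state by accident.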

v3.1.2: ACP Run Mode

Integrated external coding Agents like Codex:

Dispatch → sessions_spawn(runtime=acp, agentId=codex, mode=run) → Bind ACP → Finalize → Done
Lesson Learned: A critical pitfall was confusing agentId (the ACP harness’s codex) with the Four-Shrimp Array’s Coder. The owner is a business role, while the runtime agent is the execution engine—they are two distinct concepts.
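
One way to keep the two concepts apart is to give them separate fields in the task record. The sketch below is illustrative only: the Task and Runtime classes and their field names are assumptions, loosely modeled on the sessionId and status values the article says the board records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Runtime:
    kind: str                          # e.g. "process" or "acp"
    agent_id: Optional[str] = None     # execution engine, e.g. "codex" for an ACP harness
    session_id: Optional[str] = None
    status: str = "running"


@dataclass
class Task:
    id: str
    title: str
    owner: str                         # business role: "coder", "writer", or "strategist"
    runtime: Optional[Runtime] = None  # bound only once the task is actually executing
```

With this shape, a task owned by the Coder and executed by the codex harness stores "coder" and "codex" in different places, so one can never overwrite the other.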

Final: Control Center

Installed openclaw-control-center, a web dashboard:

One-way sync: board.json → Control Center.
board.json is the single source of truth; the Web UI is a read-only mirror.
Automatic pushes occur on every dispatch and finalize. Sync failures do not block the main workflow.
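
The "sync failures do not block the main workflow" rule can be sketched as a push that reports failure instead of raising. The push callable stands in for whatever actually sends data to the Control Center, whose API is not described in this article; sync_to_control_center and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("board_sync")


def sync_to_control_center(board, push):
    """Push a board snapshot one way; report failure without raising."""
    try:
        push(board)  # hypothetical callable that delivers the snapshot
        return True
    except Exception as exc:  # broad on purpose: sync must never break dispatch
        logger.warning("control center sync failed: %s", exc)
        return False
```

Because board.json stays the single source of truth, a missed push costs only a stale dashboard until the next dispatch or finalize.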

Day 3’s Takeaway: The system evolved from “working methodically” to “working systematically.”

Three-Day Evolution Summary

Day	System State	Key Output
1	Can Talk	4-Agent communication, Write→Review→Revise workflow, accounting CLI
2	Methodical	Harness methodology, SOUL.md upgrades, Codex toolchain, 3 tools
3	Systematic	Task board (6 versions), 4 execution pipelines, Web control center

The 4 Validated Execution Pipelines

Pipeline A: tmux Long-Running Task
dispatch → tmux → tail-log → finalize → done
Pipeline B: OpenClaw Background Process
dispatch → exec background → bind-process → finalize → done
Pipeline C: Automatic Finalization
dispatch → render-process-wrapper → exec background → auto-finalize → done
Pipeline D: ACP Run (External Engines)
dispatch → sessions_spawn(codex, mode=run) → bind-acp → finalize → done

Lessons from Claude Code Source Code: Next-Phase Planning

An analysis of the Claude Code CLI source code (~1,884 TypeScript files) identified several designs worth adopting:

1. TaskTool for Multi-Agent Collaboration

Claude Code has a TaskTool specifically for task decomposition and parallel execution. Inspiration for Four-Shrimp Array: Our current workflow is serial (Writer → Strategist → Writer). The next step is to support parallel Fan-out, where the Lobster decomposes a task, and Coder and Writer work simultaneously, then consolidate results.
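
The planned fan-out could look roughly like the sketch below, which uses Python's standard ThreadPoolExecutor. It is a sketch of the general pattern, not of Claude Code's TaskTool or of any existing Four-Shrimp Array code; fan_out and the run_agent callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(subtasks, run_agent):
    """Run each (agent, brief) pair concurrently and collect results by agent."""
    # max_workers must be at least 1, even when there is nothing to run.
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        futures = {
            agent: pool.submit(run_agent, agent, brief)
            for agent, brief in subtasks.items()
        }
        # result() blocks until that agent finishes and re-raises its exception, if any.
        return {agent: future.result() for agent, future in futures.items()}
```

The Lobster would then consolidate the returned dictionary, for example by handing both outputs to the Strategist for review.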

2. /compact for Context Compression

Claude Code has a built-in /compact command to automatically compress dialogue context. Inspiration: In long tasks, an Agent’s context grows larger, consuming more tokens. Context management should be handled at the task board level—automatically compacting after task completion, retaining only key conclusions and output paths.
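
At the task board level, compaction could be as simple as keeping a fixed set of fields once a task completes. This is a sketch of the idea only and is unrelated to how /compact works internally; the field names, including conclusions and output_paths, are assumptions.

```python
# Hypothetical field names; the article only says to keep conclusions and output paths.
KEEP_FIELDS = ("id", "title", "owner", "status", "conclusions", "output_paths")


def compact_task(task):
    """Return a copy of a finished task without its transcript or scratch data."""
    return {key: task[key] for key in KEEP_FIELDS if key in task}
```

Everything outside the allow-list is dropped, so a new verbose field cannot leak into later prompts by default.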

3. /review + /commit Code Loop

In Claude Code, code review and submission are integrated: review leads directly to commit. Inspiration: After the Coder writes code, it should automatically undergo a review (by the Strategist or via Codex’s /codex:adversarial-review). Once approved, it commits directly, creating a “write-review-commit” end-to-end flow.

4. MCP Protocol

Claude Code fully implements the Model Context Protocol (Stdio + SSE), supporting tool calls, resource management, and external service integration. Inspiration: Currently, the Four-Shrimp Array’s tools rely solely on OpenClaw’s built-ins. If MCP is integrated, Agents could directly call external services (e.g., Feishu API, GitHub API, database queries) without writing wrapper scripts.

5. Plugin System

Claude Code’s /plugin system supports hot-swappable skills. Inspiration: The Writer’s platform rules are currently reference files. If converted into skill plugins, writing a RedNote post could automatically load the RedNote skill, including templates, examples, and sensitive word filtering—far more powerful than file references.

Next-Phase Roadmap

Phase 1: Automation Loop (1-2 Weeks)

Full Write→Review→Revise workflow via task board, no manual sessions_send.
ACP Run automatic finalization (similar to process wrapper).
Complete Codex authentication so the Coder can actually code via Codex.
Implement heartbeat checks (regularly check email, calendar, task status).

Phase 2: Efficiency Boost (2-4 Weeks)

Parallel Fan-out: Decompose complex tasks for simultaneous multi-Agent execution.
Model Hot-Switching: Use free models for simple tasks, automatically switch to Opus for complex ones.
Review Result Persistence: Support cross-task comparison of assertion data.
Context Management: Automatic compaction for long tasks.

Phase 3: Ecosystem Integration (1-2 Months)

Feishu Multi-Dimensional Table sync (dual-write task board to Bitable).
Cron scheduled tasks (daily stand-ups, weekly retros, timed content production).
More ACP harnesses (Claude Code, Gemini CLI, OpenCode).
MCP protocol integration (let Agents call external APIs directly).

Phase 4: Productization (2-3 Months)

Open-source Four-Shrimp Array starter kit.
Cost tracking dashboard (statistics by Agent/task type).
Quality dashboard (Strategist score trends, revision rounds, pass rates).
Agent self-evolution (automatically tune SOUL.md based on historical task data).

FAQ: Frequently Asked Questions

Q1: What scenarios is the Four-Shrimp Array best suited for?

The system is ideal for complex tasks requiring multi-AI collaboration, such as content creation workflows (writing→review→revision), software development workflows (requirement analysis→coding→testing), and market analysis workflows (data collection→analysis→report generation).

Q2: How do I get started with the Four-Shrimp Array?

First, set up the basic environment: provision a VPS, install the OpenClaw Gateway, and configure the four Agents’ SOUL.md files. Then start with simple test tasks and gradually increase complexity.

Q3: How does the system handle task timeouts?

Timeout handling is defined in SOUL.md. When a task times out, the Coordinator is notified automatically and decides whether to retry, reassign, or mark the task as failed.
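
The article does not spell out how the Coordinator chooses among the three outcomes. One possible policy, shown purely as an assumption, is to retry a limited number of times, then reassign if another owner is available, and otherwise fail. The on_timeout function, the attempts counter, and the default limit of two retries are all hypothetical.

```python
def on_timeout(task, max_retries=2, fallback_owner=None):
    """Pick one of "retry", "reassign", or "fail" for a timed-out task."""
    if task.get("attempts", 0) < max_retries:
        return "retry"
    if fallback_owner is not None and fallback_owner != task["owner"]:
        return "reassign"
    return "fail"
```

Writing the policy down as a fixed rule follows the Day 2 principle that timeouts should not be left to each Agent's own judgment.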

Q4: How can I monitor the system’s status?

The web dashboard provides real-time views of task status, Agent activity, and system performance metrics. All states are synchronized from board.json, ensuring information consistency.

Q5: How is system cost controlled?

Costs are controlled through model allocation strategies: high-cost models for coordination and content creation, free tiers for coding and analysis. Additionally, progressive disclosure reduces unnecessary token consumption.

Value and Future Outlook

The core value of the Four-Shrimp Array lies in elevating AI collaboration from “can chat” to “can work systematically.” Through three days of iteration, the system completed the transformation from concept validation to a runnable system, proving the feasibility of AI Agent collaboration.

Core Value Propositions

Reliability First: Making AI more reliable is more important than making it smarter.
Task Traceability: Every task has a clear status and execution record.
Cost Control: Intelligent allocation optimizes model usage costs.
Scalability: Supports adding new Agents and tools.

Future Development Direction

Judged against Claude Code’s maturity, a complete AI coding tool takes ~1,884 files, ~50 commands, and ~30 tools. The Four-Shrimp Array is currently only about 1% of that scale, but the direction is clear:

The goal for the next three days is to move from system to product. With continued iteration, the Four-Shrimp Array has the potential to become a truly usable Agent Ops system, allowing more people to benefit from the productivity gains of AI collaboration.

Across three days of iteration, the Four-Shrimp Array showed both the potential of AI Agent collaboration and a concrete path to implementing it. Each step, from basic communication to systematic work, solved a practical problem and yielded lessons for building reliable AI productivity systems. Whether you are a developer or an AI application user, the case is worth studying closely.