The Four-Shrimp Array: A 3-Day Journey from Chatbots to a Productivity System
Have you ever imagined how multiple AI assistants could work together like a team, automatically handling everything from task breakdown and content creation to code writing? This article provides a detailed breakdown of how an AI Agent system called the “Four-Shrimp Array” evolved from a concept into a runnable system over just three days, sharing the key steps, challenges encountered, and valuable lessons learned.
What is the Four-Shrimp Array System?
The Four-Shrimp Array is a collaborative system composed of four AI Agents, each with a specialized role, working together to complete complex tasks. The core goal is to automate the previously manual coordination of AI collaboration, forming an efficient productivity system.
System Architecture Overview
The Four-Shrimp Array uses a master-slave architecture, consisting of a coordinator (the “Lobster”) and three specialized Agents (“Coder,” “Writer,” and “Strategist”).
| Agent Name | Model | Responsibility | Cost Strategy |
|---|---|---|---|
| Lobster (Coordinator) | Claude Opus 4-6 | Task scheduling and coordination | High-cost model |
| Coder | GPT-5.3-Codex | Full-stack coding and development | Free tier |
| Writer | Claude Opus 4-6 | Content creation and writing | High-cost model |
| Strategist | GPT-5.4 | Strategic analysis and review | Free tier |
The system runs on a single VPS and communicates via the OpenClaw Gateway. This design optimizes costs—expensive models are used only for coordination and content creation, while coding and analysis utilize free tiers.
Day 1: Can Talk, But Not Reliable
On the first day after setup, the four Agents could already receive tasks, execute work, and return results. However, the system had several critical flaws.
Key Problems Encountered
- Lack of Behavioral Constraints
  - The Writer’s quality was inconsistent when generating tweets.
  - The Strategist’s reviews varied in thoroughness, sometimes verbose, sometimes too brief.
  - The Coder’s exception handling was arbitrary and lacked consistency.
- No Collaboration Standards
  - Who should send messages to whom?
  - Who should be notified upon task completion?
  - What happens if a task times out?

  All of these relied on the Agents’ own reasoning and guessing.
- Missing Task Tracking
  - Once a task was dispatched, it felt like it disappeared into a black hole.
  - The Coordinator had no visibility into the Coder’s progress.
  - There was no way to confirm if the Writer’s draft had been reviewed by the Strategist.
Day 1 Achievements
Despite the problems, Day 1 still validated the system’s basic capabilities:
- The Writer drafted a 2,800-word article for the X platform.
- The Strategist reviewed the article and gave a score of 8.2/10.
- Based on the Strategist’s feedback, the Writer produced a second version.
- The Coder used Claude Code to build a simple accounting CLI tool.
Conclusion: The basic capabilities were in place, but the system lacked a management layer.
Day 2: Equipping Each “Shrimp” with a Harness
On the second day, to address the missing coordination and standards, the development team studied six methodology documents from the revfactory/harness project. The core findings included:
Key Principles Learned
- Explicit Communication Protocols: Each Agent must clearly define “who to receive messages from, who to send to, and what to do upon completion.”
- Error Handling Mechanisms: Timeouts, failures, and unclear requirements cannot be decided by the Agents themselves.
- Progressive Disclosure: Do not cram all rules into one file; load them as needed.
- Assertion-Based Review: Instead of an open-ended “What do you think?”, use a checklist of “passed/failed” items.
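To make the assertion-based review principle concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the project's actual review format: the `Assertion` and `ReviewReport` names and the "every item must pass" approval rule are hypothetical, while the checklist item names are the seven that the Strategist adopts later in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Checklist items as listed for the Strategist in this article.
CHECKLIST = [
    "Hook Strength",
    "Structure & Rhythm",
    "Data Support",
    "Terminology Consistency",
    "Conclusion Strength",
    "Factual Accuracy",
    "Platform Adaptation",
]


@dataclass
class Assertion:
    """One checklist item with a binary verdict (hypothetical structure)."""
    name: str
    passed: bool


@dataclass
class ReviewReport:
    assertions: List[Assertion] = field(default_factory=list)

    def failed(self) -> List[str]:
        """Names of the checklist items that did not pass."""
        return [a.name for a in self.assertions if not a.passed]

    def approved(self) -> bool:
        """Assumed policy: approve only if every item was checked and passed."""
        checked = {a.name for a in self.assertions}
        return checked == set(CHECKLIST) and not self.failed()


def build_report(verdicts: Dict[str, bool]) -> ReviewReport:
    """Turn an {item: passed} mapping into a report, rejecting unknown items."""
    unknown = set(verdicts) - set(CHECKLIST)
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")
    return ReviewReport([Assertion(n, verdicts[n]) for n in CHECKLIST if n in verdicts])
```

The point of the structure is that a review can no longer be "roughly fine": each item is either passed or failed, and a skipped item blocks approval just like a failed one.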
System Improvements
1. Rewritten SOUL.md Files
All four Agents’ SOUL.md files were completely rewritten with explicit behavioral guidelines:
- Lobster (Coordinator): Added an explicit routing table:
  - Writing, copy, tweets, articles → Writer
  - Code, tools, API, bugs → Coder
  - Review, analysis, evaluation, strategy → Strategist

  This eliminated the need for reasoning and guessing; tasks are assigned by direct lookup.
- Strategist: Added an assertion-based review format with 7 checklist items:
  - Hook Strength → Pass/Fail
  - Structure & Rhythm → Pass/Fail
  - Data Support → Pass/Fail
  - Terminology Consistency → Pass/Fail
  - Conclusion Strength → Pass/Fail
  - Factual Accuracy → Pass/Fail
  - Platform Adaptation → Pass/Fail
- Writer: Implemented Progressive Disclosure by splitting platform-specific rules into separate files:

      workspace-writer/references/
      ├── x-platform.md          # X Platform Rules
      ├── wechat-platform.md     # WeChat Rules
      └── rednote-platform.md    # RedNote (Xiaohongshu) Rules

  When writing an X tweet, only x-platform.md is loaded, saving ~2/3 of the token cost.
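The Lobster's routing table above can be sketched as a plain lookup. This is a hedged illustration: the article only gives the keyword groups, so the `ROUTING_TABLE` layout, the `route` function, and the case-insensitive substring matching are all assumptions about how such a lookup might be written.

```python
from typing import Optional

# Keyword groups mirror the Lobster's routing rules in this article.
# Singular forms are used so that "tweets" and "tweet" both match.
ROUTING_TABLE = {
    "writer": ("writing", "copy", "tweet", "article"),
    "coder": ("code", "tool", "api", "bug"),
    "strategist": ("review", "analysis", "evaluation", "strategy"),
}


def route(task_description: str) -> Optional[str]:
    """Return the first agent whose keyword appears in the description.

    Matching is a plain case-insensitive substring check (an assumption).
    None means no rule matched, so the coordinator has to decide explicitly.
    """
    text = task_description.lower()
    for agent, keywords in ROUTING_TABLE.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return None
```

Substring matching is deliberately crude: a short keyword such as "api" would also match inside an unrelated word like "rapid", so a real table would likely need word-boundary matching or explicit task types.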
2. Toolchain Installation
- Codex CLI v0.117.0
- Codex Plugin for Claude Code (/codex:review + /codex:rescue)
- The Coder used Codex to deliver three working tools in succession: an accounting CLI, a to-do CLI, and a web page title scraper.
Day 2’s Takeaway: The system evolved from “can work” to “can work methodically.”
Day 3: From “Working Methodically” to “Working Systematically”
Day 3 saw the most significant changes, with the team pushing six versions in a single day, completing a qualitative leap in the system.
Version Iteration Process
v1: Shared Task Board
Adopted file-based task management using board.json + queue.md:
- No database, no external services needed.
- A single JSON file acts as the master task table.
- A Markdown file provides a human-readable view.
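As a rough sketch of the file-based board, the Markdown view can be regenerated from the JSON master table on every change. The article does not show the actual schema, so the top-level `tasks` key and the `id`, `title`, `owner`, and `status` fields below are assumptions, as are the function names.

```python
import json
from pathlib import Path
from typing import Dict, List


def render_queue(tasks: List[Dict[str, str]]) -> str:
    """Render tasks as a Markdown table (field names are assumed, not from the article)."""
    lines = ["| ID | Title | Owner | Status |", "|---|---|---|---|"]
    for task in tasks:
        lines.append(
            f"| {task['id']} | {task['title']} | {task['owner']} | {task['status']} |"
        )
    return "\n".join(lines) + "\n"


def sync_queue(board_path: Path, queue_path: Path) -> None:
    """Regenerate the human-readable view from the JSON master table."""
    board = json.loads(board_path.read_text(encoding="utf-8"))
    queue_path.write_text(render_queue(board["tasks"]), encoding="utf-8")
```

Because queue.md is always derived from board.json and never edited by hand, the two files cannot drift apart.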
v2: Executor CLI
Developed task_board.py, a command-line tool for creating, updating, deleting, and querying tasks:
python3 task_board.py create --title "Write Tweet" --owner writer
python3 task_board.py update task-001 --status done
python3 task_board.py list
python3 task_board.py check-overdue --mark
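For readers who want to build a similar tool, the commands above map naturally onto argparse subcommands. The sketch below is not the real task_board.py: it accepts only the flags shown in the examples, and the in-memory `apply_command` helper, the `todo` default status, and the ID numbering scheme are assumptions.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Parser for the four subcommands shown in the examples above."""
    parser = argparse.ArgumentParser(prog="task_board.py")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create")
    create.add_argument("--title", required=True)
    create.add_argument("--owner", required=True)

    update = sub.add_parser("update")
    update.add_argument("task_id")
    update.add_argument("--status", required=True)

    sub.add_parser("list")

    overdue = sub.add_parser("check-overdue")
    overdue.add_argument("--mark", action="store_true")
    return parser


def apply_command(args: argparse.Namespace, tasks: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Apply create/update to an in-memory task list (persistence is left out)."""
    if args.command == "create":
        # IDs follow the task-001 pattern from the example; the numbering rule is assumed.
        tasks.append({"id": f"task-{len(tasks) + 1:03d}", "title": args.title,
                      "owner": args.owner, "status": "todo"})
    elif args.command == "update":
        for task in tasks:
            if task["id"] == args.task_id:
                task["status"] = args.status
                break
        else:
            raise KeyError(args.task_id)
    return tasks
```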
v3: Scheduler Wrapper
Developed lobster_ops.py. The Lobster automatically creates a task upon dispatch and binds long-running tasks to a runtime:
python3 lobster_ops.py dispatch \
--title "Write a Four-Shrimp Array Tweet" \
--brief "X platform, Jason AI Overseas style" \
--agent writer
v3.1: True OpenClaw Process Session Binding
Bound OpenClaw’s native exec background session directly to the task:
- Dispatch → Exec Background → Bind Process → Finalize Runtime → Done
- The task now records the sessionId, workdir, command, and status directly. Instead of “I think the Coder finished,” the board.json explicitly states “session nimble-cove has finished.”
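A binding of this kind might look like the sketch below. The four recorded fields come from the description above; the `runtime` key, the `running` status value, and both function names are assumptions made for illustration.

```python
from typing import Any, Dict


def bind_process(task: Dict[str, Any], session_id: str, workdir: str, command: str) -> Dict[str, Any]:
    """Record the background session on the task itself (the 'runtime' key is assumed)."""
    task["runtime"] = {
        "sessionId": session_id,
        "workdir": workdir,
        "command": command,
        "status": "running",
    }
    return task


def runtime_summary(task: Dict[str, Any]) -> str:
    """State what the board knows, rather than what someone believes happened."""
    runtime = task.get("runtime")
    if runtime is None:
        return f"{task['id']}: no runtime bound"
    return f"{task['id']}: session {runtime['sessionId']} is {runtime['status']}"
```

With the session recorded on the task, any later status question is answered by reading the board instead of asking an Agent.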
v3.1.1: Automatic Finalization (The Key Step)
Previous Chain: Run → Manually call finalize-runtime → Update status.
New Chain: Generate a wrapper command that automatically calls finalize-runtime upon completion. A task is marked done on success or blocked on failure—no manual intervention needed.
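The idea behind automatic finalization can be shown with a small in-process sketch. Note the swap: the article describes generating a wrapper command, whereas this example wraps the run in a Python function instead. `finalize_runtime` here is a stand-in for the real finalize-runtime step, and both function names are hypothetical.

```python
import subprocess
import sys
from typing import Any, Dict, List


def finalize_runtime(task: Dict[str, Any], succeeded: bool) -> Dict[str, Any]:
    """Stand-in for the finalize-runtime step: done on success, blocked on failure."""
    task["status"] = "done" if succeeded else "blocked"
    return task


def run_and_finalize(task: Dict[str, Any], command: List[str]) -> Dict[str, Any]:
    """Run the command and finalize the task from its exit code, with no manual step."""
    try:
        succeeded = subprocess.run(command).returncode == 0
    except OSError:
        succeeded = False  # the command could not even be started
    return finalize_runtime(task, succeeded)


if __name__ == "__main__":
    # A child Python process stands in for real agent work.
    print(run_and_finalize({"id": "task-001"}, [sys.executable, "-c", "print('ok')"]))
```

Finalization happens on every path out of the run, including the case where the command never starts, so no task is left waiting for a manual status update.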
v3.1.2: ACP Run Mode
Integrated external coding Agents like Codex:
- Dispatch → sessions_spawn(runtime=acp, agentId=codex, mode=run) → Bind ACP → Finalize → Done
- Lesson Learned: A critical pitfall was confusing agentId (the ACP harness’s codex) with the Four-Shrimp Array’s Coder. The owner is a business role, while the runtime agent is the execution engine; they are two distinct concepts.
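One way to keep the two concepts from being mixed up is to store them in separate, validated fields. The sketch below is an assumption about how that could be enforced; the `AcpRun` class and the two name sets are hypothetical, and only `codex` is listed as a runtime agent because it is the only harness this article names as integrated.

```python
from dataclasses import dataclass

BUSINESS_ROLES = {"lobster", "coder", "writer", "strategist"}
RUNTIME_AGENTS = {"codex"}  # ACP harness IDs; only codex is named as integrated here


@dataclass(frozen=True)
class AcpRun:
    owner: str          # business role accountable for the task
    runtime_agent: str  # execution engine that actually runs it

    def __post_init__(self) -> None:
        if self.owner not in BUSINESS_ROLES:
            raise ValueError(f"{self.owner!r} is not a business role")
        if self.runtime_agent not in RUNTIME_AGENTS:
            raise ValueError(f"{self.runtime_agent!r} is not a runtime agent")
```

Passing `codex` as the owner now fails immediately instead of silently creating a task that no business role is accountable for.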
Final: Control Center
Installed openclaw-control-center, a web dashboard:
- One-way sync: board.json → Control Center.
- board.json is the single source of truth; the Web UI is a read-only mirror.
- Automatic pushes occur on every dispatch and finalize. Sync failures do not block the main workflow.
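The "sync failures do not block the main workflow" rule is essentially a best-effort push. A minimal sketch, assuming the push itself is supplied as a callable (the real Control Center API is not shown in this article, so nothing about it is assumed here):

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("board_sync")


def push_best_effort(board: Dict[str, Any], push: Callable[[Dict[str, Any]], Any]) -> bool:
    """Try to mirror the board to the dashboard; never let a failure propagate.

    `push` is whatever function talks to the dashboard (hypothetical here).
    Returns True if the push succeeded and False if it was skipped.
    """
    try:
        push(board)
        return True
    except Exception as exc:  # broad on purpose: the mirror must not break dispatch
        logger.warning("control center sync failed: %s", exc)
        return False
```

Since board.json stays the source of truth, a failed push only means the mirror is stale until the next dispatch or finalize.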
Day 3’s Takeaway: The system evolved from “working methodically” to “working systematically.”
Three-Day Evolution Summary
| Day | System State | Key Output |
|---|---|---|
| 1 | Can Talk | 4-Agent communication, Write→Review→Revise workflow, accounting CLI |
| 2 | Methodical | Harness methodology, SOUL.md upgrades, Codex toolchain, 3 tools |
| 3 | Systematic | Task board v6, 4 execution pipelines, Web control center |
The 4 Validated Execution Pipelines
- Pipeline A (tmux Long-Running Task): dispatch → tmux → tail-log → finalize → done
- Pipeline B (OpenClaw Background Process): dispatch → exec background → bind-process → finalize → done
- Pipeline C (Automatic Finalization): dispatch → render-process-wrapper → exec background → auto-finalize → done
- Pipeline D (ACP Run, External Engines): dispatch → sessions_spawn(codex, mode=run) → bind-acp → finalize → done
Lessons from Claude Code Source Code: Next-Phase Planning
An analysis of the Claude Code CLI source code (~1,884 TypeScript files) turned up several designs worth adopting:
1. TaskTool for Multi-Agent Collaboration
Claude Code has a TaskTool specifically for task decomposition and parallel execution. Inspiration for Four-Shrimp Array: Our current workflow is serial (Writer → Strategist → Writer). The next step is to support parallel Fan-out, where the Lobster decomposes a task, and Coder and Writer work simultaneously, then consolidate results.
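A parallel fan-out of this kind could be prototyped with a thread pool. This is a sketch of the planned behavior, not of Claude Code's TaskTool or of any existing Four-Shrimp Array code: the `fan_out` function and the idea of passing each Agent as a plain callable are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fan_out(
    subtasks: Dict[str, str],
    workers: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Run each agent's subtask concurrently and collect results keyed by agent.

    `subtasks` maps agent name to brief; `workers` maps agent name to a
    callable that performs the work (both hypothetical interfaces).
    """
    with ThreadPoolExecutor(max_workers=len(subtasks) or 1) as pool:
        futures = {
            agent: pool.submit(workers[agent], brief)
            for agent, brief in subtasks.items()
        }
        # result() blocks until that agent finishes and re-raises its exception, if any.
        return {agent: future.result() for agent, future in futures.items()}
```

Threads fit here because each worker mostly waits on a model or a subprocess; the consolidation step is then a matter of handing the returned dictionary back to the Lobster.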
2. /compact for Context Compression
Claude Code has a built-in /compact command to automatically compress dialogue context. Inspiration: In long tasks, an Agent’s context grows larger, consuming more tokens. Context management should be handled at the task board level—automatically compacting after task completion, retaining only key conclusions and output paths.
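At the task board level, compaction could be as simple as trimming a finished task down to its summary fields. The sketch below is only one possible reading of that plan: the `conclusion` and `outputs` field names and the rule that only `done` tasks are compacted are assumptions.

```python
from typing import Any, Dict

# Fields kept after compaction; "conclusion" and "outputs" are assumed field names.
KEEP_FIELDS = ("id", "title", "owner", "status", "conclusion", "outputs")


def compact_task(task: Dict[str, Any]) -> Dict[str, Any]:
    """Drop everything except the summary fields once a task is done."""
    if task.get("status") != "done":
        return task  # unfinished tasks keep their full context
    return {key: task[key] for key in KEEP_FIELDS if key in task}
```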
3. /review + /commit Code Loop
In Claude Code, code review and submission are integrated: review leads directly to commit. Inspiration: After the Coder writes code, it should automatically undergo a review (by the Strategist or via Codex’s /codex:adversarial-review). Once approved, it commits directly, creating a “write-review-commit” end-to-end flow.
4. MCP Protocol
Claude Code fully implements the Model Context Protocol (Stdio + SSE), supporting tool calls, resource management, and external service integration. Inspiration: Currently, the Four-Shrimp Array’s tools rely solely on OpenClaw’s built-ins. If MCP is integrated, Agents could directly call external services (e.g., Feishu API, GitHub API, database queries) without writing wrapper scripts.
5. Plugin System
Claude Code’s /plugin system supports hot-swappable skills. Inspiration: The Writer’s platform rules are currently reference files. If converted into skill plugins, writing a RedNote post could automatically load the RedNote skill, including templates, examples, and sensitive word filtering—far more powerful than file references.
Next-Phase Roadmap
Phase 1: Automation Loop (1-2 Weeks)
- Full Write→Review→Revise workflow via task board, no manual sessions_send.
- ACP Run automatic finalization (similar to process wrapper).
- Complete Codex authentication; let Coder truly code via Codex.
- Implement heartbeat checks (regularly check email, calendar, task status).
Phase 2: Efficiency Boost (2-4 Weeks)
- Parallel Fan-out: Decompose complex tasks for simultaneous multi-Agent execution.
- Model Hot-Switching: Use free models for simple tasks, automatically switch to Opus for complex ones.
- Review Result Persistence: Support cross-task comparison of assertion data.
- Context Management: Automatic compaction for long tasks.
Phase 3: Ecosystem Integration (1-2 Months)
- Feishu Multi-Dimensional Table sync (dual-write task board to Bitable).
- Cron scheduled tasks (daily stand-ups, weekly retros, timed content production).
- More ACP harnesses (Claude Code, Gemini CLI, OpenCode).
- MCP protocol integration (let Agents call external APIs directly).
Phase 4: Productization (2-3 Months)
- Open-source Four-Shrimp Array starter kit.
- Cost tracking dashboard (statistics by Agent/task type).
- Quality dashboard (Strategist score trends, revision rounds, pass rates).
- Agent self-evolution (automatically tune SOUL.md based on historical task data).
FAQ: Frequently Asked Questions
Q1: What scenarios is the Four-Shrimp Array best suited for?
The system is ideal for complex tasks requiring multi-AI collaboration, such as content creation workflows (writing→review→revision), software development workflows (requirement analysis→coding→testing), and market analysis workflows (data collection→analysis→report generation).
Q2: How do I get started with the Four-Shrimp Array?
First, set up the basic environment: a VPS, OpenClaw Gateway, and configure the four Agents’ SOUL.md files. Start with simple tasks for testing and gradually increase complexity.
Q3: How does the system handle task timeouts?
Timeout handling is defined in each Agent’s SOUL.md. When a task times out, the Coordinator is notified automatically and decides whether to retry, reassign, or mark the task as failed.
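The decision itself can be written down as an explicit rule instead of being left to an Agent's judgment. The sketch below is a hypothetical policy, not the one in the actual SOUL.md files: the retry limit, the fallback owner, and both function names are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_overdue(started_at: datetime, timeout: timedelta, now: datetime) -> bool:
    """A task is overdue once it has run strictly longer than its timeout."""
    return now - started_at > timeout


def decide_on_timeout(
    retries: int,
    max_retries: int = 1,
    fallback_owner: Optional[str] = None,
) -> str:
    """Assumed escalation order: retry first, then reassign, then fail."""
    if retries < max_retries:
        return "retry"
    if fallback_owner is not None:
        return "reassign"
    return "failed"
```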
Q4: How can I monitor the system’s status?
The web dashboard provides real-time views of task status, Agent activity, and system performance metrics. All states are synchronized from board.json, ensuring information consistency.
Q5: How is system cost controlled?
Costs are controlled through model allocation strategies: high-cost models for coordination and content creation, free tiers for coding and analysis. Additionally, progressive disclosure reduces unnecessary token consumption.
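The token saving from progressive disclosure comes from loading one rules file instead of all three. A minimal sketch, assuming the file names from the Writer's references/ directory described earlier; the short platform keys and both function names are hypothetical.

```python
from pathlib import Path

# File names come from the references/ layout described in this article;
# the short platform keys are assumptions.
PLATFORM_RULES = {
    "x": "x-platform.md",
    "wechat": "wechat-platform.md",
    "rednote": "rednote-platform.md",
}


def rules_file(platform: str) -> str:
    """Name of the single rules file needed for this platform."""
    try:
        return PLATFORM_RULES[platform]
    except KeyError:
        raise ValueError(f"no rules file registered for platform {platform!r}") from None


def build_prompt(base_rules: str, platform: str, references_dir: Path) -> str:
    """Append only the requested platform's rules to the always-loaded base rules."""
    platform_rules = (references_dir / rules_file(platform)).read_text(encoding="utf-8")
    return f"{base_rules}\n\n{platform_rules}"
```

With three platform files of similar size, loading one of them is where the roughly two-thirds saving quoted earlier would come from.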
Value and Future Outlook
The core value of the Four-Shrimp Array lies in elevating AI collaboration from “can chat” to “can work systematically.” Through three days of iteration, the system completed the transformation from concept validation to a runnable system, proving the feasibility of AI Agent collaboration.
Core Value Propositions
- Reliability First: Making AI more reliable is more important than making it smarter.
- Task Traceability: Every task has a clear status and execution record.
- Cost Control: Intelligent allocation optimizes model usage costs.
- Scalability: Supports adding new Agents and tools.
Future Development Direction
Judging by Claude Code’s maturity, a complete AI coding tool takes ~1,884 files, ~50 commands, and ~30 tools. The Four-Shrimp Array is currently only about 1% of that scale, but the direction is clear: the next 3 days are about moving from system to product.
Through continuous iteration, the Four-Shrimp Array has the potential to become a truly usable Agent Ops system, allowing more people to benefit from the productivity gains of AI collaboration.
Over three days of iteration, the Four-Shrimp Array showed both the potential of AI Agent collaboration and a concrete path to implementing it. From basic communication to systematic work, each step solved a practical problem and left behind a reusable lesson for building reliable AI productivity systems. Whether you are a developer or an AI application user, the case is worth studying.
