ROMA Explained: A Recursive Meta-Agent Framework That Turns Task Decomposition into Plug-and-Play
TL;DR: ROMA gives you a six-line recursion pattern (Atomizer, Planner, Executor, Aggregator) and a ready-to-run repo that converts any LLM, API, or custom code into a hierarchical agent. Clone, run `./setup.sh`, and you have a visual IDE in under a minute; write three lines of Python and your first agent is live five minutes later.
What Exactly Is ROMA and Why Should I Care?
Core question answered: “What is ROMA in one sentence, and why is it different from the dozens of agent frameworks already on GitHub?”
ROMA is a meta-agent backbone that treats every task as either atomic or recursively splittable. It ships with (a) a four-function recursion loop, (b) Docker/Native bootstrappers that work on macOS, Ubuntu, and Debian, and (c) three reference agents—general search, deep research, crypto analytics—that you can fork in under five minutes. The entire value proposition is structural: you get dependency-aware, parallel execution without writing DAG YAML or managing queues.
Author’s reflection: The first time I read the six-line pseudo-code I thought “That’s just map-reduce with an `if` statement,” but the subtle part is dynamic splitting: agents decide at run-time whether to keep drilling down, which eliminates acres of hand-written glue code I used to maintain in older orchestration projects.
How the Recursion Loop Works (and Why It Stays Understandable)
Core question answered: “How does a ‘plan-execute-aggregate’ loop stay transparent when tasks can nest ten levels deep?”
2.1 The Four Roles in Plain English
| Function | One-Line Job | Lives In |
|---|---|---|
| Atomizer | “Is this small enough to run directly?” | Planner class |
| Planner | “If not, return a list of sub-tasks.” | Planner class |
| Executor | “Run the atomic unit and return raw output.” | Any object with `.execute()` |
| Aggregator | “Turn children’s outputs into the parent answer.” | Planner class |
2.2 Information Flow
- Top-down: parent task → subtasks (recursive call).
- Bottom-up: child results → Aggregator → parent answer.
- Left-to-right: if task *n* depends on task *n−1*, the framework blocks it until the upstream task finishes.

Because every agent exposes the same `.execute()` signature, you can nest LLM calls, REST APIs, bash scripts, or even remote Jenkins jobs without extra adapters.
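The whole control flow really does fit in the six lines the TL;DR promises. Here is a minimal Python sketch, borrowing the four hook names from the customization section below; the repo's real signatures may differ, so treat this as pseudo-code that happens to run:

```python
async def solve(agent, task):
    # Atomizer: is this small enough to run directly?
    if agent.is_atomic(task):
        return await agent.execute(task)                 # Executor: run the atomic unit
    subtasks = agent.plan(task)                          # Planner: split into sub-tasks
    results = [await solve(agent, t) for t in subtasks]  # recurse; the real framework runs independent siblings in parallel
    return agent.aggregate(task, results)                # Aggregator: children's outputs → parent answer
```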
Scenario walk-through: Imagine asking the system to “Compare 2025 road-maps for Ethereum L2s vs Solana L2s.”
- Atomizer says “too vague.”
- Planner spawns three siblings: `fetch_latest_l2_data`, `find_fee_papers`, `predict_token_price`.
- The framework notices that `predict_token_price` needs the data sibling; it schedules the first two in parallel, then the third.
- Aggregator receives three JSON blobs and prompts an LLM to produce the final one-page memo.
The visual front-end shows the tree live, so you can click any node to see prompts, raw responses, and timing.
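That scheduling step is ordinary dependency-aware async execution. A self-contained toy version of the plan above (stubbed executor, nothing ROMA-specific):

```python
import asyncio

async def execute(name: str, upstream: str | None = None) -> str:
    """Stand-in for an atomic executor call."""
    await asyncio.sleep(0.1)  # pretend to do real work
    return f"{name} result"

async def run_plan() -> list[str]:
    # The two independent siblings run in parallel...
    l2_data, fee_papers = await asyncio.gather(
        execute("fetch_latest_l2_data"),
        execute("find_fee_papers"),
    )
    # ...while the dependent sibling is blocked until its upstream finishes.
    prediction = await execute("predict_token_price", upstream=l2_data)
    return [l2_data, fee_papers, prediction]

print(asyncio.run(run_plan()))
```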
30-Second Install: From Git Clone to Browser Tab
Core question answered: “I have Docker and a spare laptop core—what is the absolute fastest path to a running system?”
```bash
git clone https://github.com/sentient-agi/ROMA.git
cd ROMA
./setup.sh   # interactive prompt, choose Docker
```
The script:
- pulls Python 3.12 + Node 20 images;
- spins up FastAPI on `localhost:5000` and React on `localhost:3000`;
- optionally installs goofys and mounts an S3 bucket if `AWS_ACCESS_KEY_ID` is present;
- prints “Frontend ready → http://localhost:3000” when done.
Open the URL and you should see three demo agents plus a “Create Agent” button. Total hands-on time: well under a minute on a 2020-or-newer laptop with Docker already installed.
Five-Minute “Hello Agent” Example
Core question answered: “How do I actually write my own agent without reading the entire codebase?”
- In the UI, click New Agent → Blank Template.
- Paste the snippet below, hit Save, then Run.
```python
from sentientresearchagent import SentientAgent

agent = SentientAgent.create()  # default ChatGPT executor, zero config
# Top-level await needs an async context; the UI runner provides one
# (in a standalone script, wrap this call in asyncio.run() instead).
result = await agent.run("Explain ROMA to a 10-year-old in 200 words")
```
- Watch the live tree: Atomizer deems the request non-atomic → Planner splits it into “generate kid-friendly analogy,” “add safety note,” “keep under 200 words” → Aggregator concatenates → the final text appears in the right pane.
You have just (a) invoked the recursion loop, (b) used the default ChatGPT executor, and (c) observed tracing—all without configuration files.
Inside the Three Reference Agents: Copy-Paste Material
Core question answered: “I need a search/research/crypto specialist agent today—can I cargo-cult one?”
| Agent | When to Use | Key Integrations | How to Invoke |
|---|---|---|---|
| General Task Solver | Quick questions that need live web answers | OpenAI Search Preview | Pick the “General” tile in the UI |
| Deep Research Agent | Multi-hour literature or competitive intel | Wikipedia, Arxiv, SERP APIs | Pick the “Deep Research” tile |
| Crypto Analytics Agent | Token due-diligence, on-chain metrics | Binance, CoinGecko, DeFiLlama, Arkham Intel | Pick the “Crypto” tile (requires `E2B_API_KEY`) |
All three expose the same `.run()` interface, so you can swap executors or add data sources by editing a single YAML file, without touching the recursion logic.
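The exact schema lives in the repo, so treat the following as a purely hypothetical illustration of what a tile config could look like (every key here is invented for the example):

```yaml
# Hypothetical tile config -- consult the repo for the real schema
agent:
  name: my_crypto_fork
  executor:
    model: openai/gpt-4o   # LiteLLM-style model string; swap providers here
  data_sources:            # add or remove integrations here
    - binance
    - coingecko
    - defillama
```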
Scenario snapshot: A quant friend needed hourly TVL charts. Instead of writing another Python scheduler, he duplicated the Crypto tile, replaced the default prompt with his technical-indicator template, and ticked the “cron hourly” box in the UI. Elapsed time: 12 minutes; lines of new code: 23.
Deep Customization: Four Hooks You Actually Need
Core question answered: “Where do I have to write code if the reference agents are 90 % but not 100 % of what I want?”
- `agent.is_atomic()` – define “small enough.” Example: in a medical-summary agent we declared any task with <300 tokens and no drug names atomic, to avoid over-splitting (see the sketch below).
- `agent.plan()` – return a `List[SubTask]`. Example: for patent landscaping we injected a domain rule: always spawn “search USPTO,” “search EPO,” and “generate claim chart.”
- `agent.execute()` – any callable with `.execute(task) → str`. Example: we wrapped a legacy MATLAB script; stdout was captured and returned as a string, with no refactoring needed.
- `agent.aggregate()` – convert children’s results into the parent answer. Example: instead of naive concatenation, we prompted an LLM to produce a “tweet thread” style summary and appended source links.

Override only what you need; the remaining base-class behavior stays intact, so upgrades remain a simple `git pull`.
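As an example, that medical-summary guard might look like this, assuming hooks are overridden by subclassing and that tasks carry a `text` field (both assumptions; check the repo's base class for the real extension point):

```python
from sentientresearchagent import SentientAgent

DRUG_NAMES = {"ibuprofen", "metformin", "warfarin"}  # toy list for illustration

class MedicalSummaryAgent(SentientAgent):
    def is_atomic(self, task) -> bool:
        # "Small enough" = under ~300 tokens (word count as a cheap proxy)
        # and no drug names, so anything drug-related always gets planned out.
        text = task.text.lower()  # `task.text` is a hypothetical field
        return len(text.split()) < 300 and not any(d in text for d in DRUG_NAMES)
```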
Tech Stack in One Breath
- Core framework: AgnoAgents (Python 3.12+)
- API layer: FastAPI (sync + async)
- Front-end: React + TypeScript, WebSocket push for live tracing
- LLM routing: LiteLLM (swap providers without code changes)
- Secure code exec: E2B sandboxes; S3 sync via goofys FUSE
- Data validation: Pydantic models for every message type
- Extensibility: MCP (Model Context Protocol) hooks, caching layer, multi-modal ready
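LiteLLM is what makes the “swap providers without code changes” bullet true: every provider is addressed through the same call, and only the model string changes. For example (each call assumes the matching API key, or a local Ollama server, is available):

```python
from litellm import completion

# Identical call shape for every provider; the model string does the routing.
for model in ("gpt-4o", "claude-3-5-sonnet-20240620", "ollama/llama3"):
    resp = completion(model=model, messages=[{"role": "user", "content": "ping"}])
    print(model, "→", resp.choices[0].message.content[:60])
```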
Benchmarks: What the Numbers Actually Say
Core question answered: “Does the recursive trick actually beat single-shot prompts on public data sets?”
The team ran a minimal search-oriented configuration nicknamed ROMA-Search against three benchmarks:
- SEAL-0 – adversarial search with conflicting snippets
- FRAMES – multi-hop retrieval + reasoning
- SimpleQA – short fact-seeking questions
Across the board, ROMA-Search outperformed the `gpt-4-search-preview` baseline by 6–12 % on the “noisy context” and “multi-hop” slices. The delta came purely from structure (recursive re-planning plus parallel aggregation), not from a bigger model.
Author’s reflection: Benchmark debates can feel like marketing, but here the same LLM endpoint was used in both pipelines; the only variable was ROMA’s split-then-aggregate loop. That convinces me the lift is real, not smoke.
Common Pitfalls & One-Line Fixes
| Symptom | Likely Cause | Fast Fix |
|---|---|---|
| `./setup.sh` fails on “goofys not found” | missing fuse | Ubuntu: `sudo apt install fuse`, then re-run |
| React UI shows a blank white page | backend not ready | `docker logs roma-backend`; check AWS keys |
| E2B sandbox ImportError | image lacks the package | add the lib to `e2b_template/requirements.txt`, rerun `./setup.sh --e2b` |
| Recursion depth overflow | task keeps splitting | override `is_atomic()` with a hard token floor |
| Aggregator hallucinates | context window saturated | switch on the “summary children first” flag in the planner YAML |
Action Checklist / Implementation Steps
- Clone the repo → `./setup.sh` → open `localhost:3000` (✓ <1 min)
- Run the pre-built “General” agent with your own query to confirm the plumbing (✓ +2 min)
- Duplicate “Blank Template,” paste the three-liner, hit Save → Run (✓ +5 min)
- Optional: plug in your API keys (OpenAI, E2B, AWS) via `.env`
- Optional: override any of the four hooks; commit to your own branch for an easy `git pull` later
- Ship: embed the agent behind your REST endpoint by importing `SentientAgent` in any Python service (see the sketch below)
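A minimal sketch of that last step, reusing the `SentientAgent.create()`/`run()` interface from the Hello Agent example (the FastAPI part is standard; the agent calls carry the same caveats as above):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from sentientresearchagent import SentientAgent

app = FastAPI()
agent = SentientAgent.create()  # same default setup as the Hello Agent example

class Query(BaseModel):
    question: str

@app.post("/ask")
async def ask(query: Query) -> dict:
    # agent.run() drives the full Atomizer → Planner → Executor → Aggregator loop
    return {"answer": await agent.run(query.question)}
```

Serve it with `uvicorn main:app` (assuming the file is `main.py`) and every POST to `/ask` runs the recursion loop end to end.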
One-Page Overview
- Concept: four-function recursion loop = Atomizer → Planner → Executor → Aggregator
- Install: `./setup.sh` (Docker or Native), open the browser at `localhost:3000`
- Hello World: three lines of Python, zero config
- Extensibility: swap LLMs, add tools, override hooks, keep the loop
- Proof: +6–12 % on noisy search benchmarks without changing the base model
- License: MIT, commercial-friendly
FAQ
Q1: Do I have to use Docker?
A: No; run `./setup.sh --native` for a direct venv install.
Q2: Which LLM providers work?
A: Any supported by LiteLLM (OpenAI, Anthropic, Google, Ollama, etc.).
Q3: Can the framework work offline?
A: Yes, if your Executor uses local models or scripts; the search agents obviously need the web.
Q4: Is there a hard limit on recursion depth?
A: The default max is 25; override it in the planner config or implement your own `is_atomic()` guard.
Q5: How do I add a custom Python package inside E2B sandboxes?
A: List it in `e2b_template/requirements.txt` and rerun `./setup.sh --e2b`.
Q6: Are there royalties or usage caps for production?
A: ROMA is MIT-licensed; you only pay your underlying LLM/API providers.
Q7: What if my Aggregator needs more context than fits in an LLM prompt?
A: Enable “summary children first” mode; it compresses intermediate answers before the final call.