Mercury: The AI Agent We All Wanted – Where Control, Permissions, and Autonomy Finally Get Real

Core question this article answers: Among the dozens of AI agent tools out there, is there one that can truly work for you, keep your system safe, and not leave you with a shocking API bill at the end of the month?

When AI Agents Become the Problem

You’ve probably tried those AI agents that promise to write code, manage files, and send emails for you. Exciting at first. Then the issues pile up.

Security is the first landmine. AI agents need to run shell commands, touch your files, and install third‑party skills – like handing a master key to someone you just met. Worse, not every skill deserves trust. In one ecosystem alone, researchers found over 800 malicious skills actively stealing credentials. Even scarier: a single‑click remote code execution flaw (CVSS 8.8) exposed tens of thousands of instances to complete system compromise.

Cost is the silent budget killer. Context windows bloat without you noticing. Every turn, the full conversation history is fed back into the model. You run the agent, cross your fingers, and only find out what it cost at the end of the month.

Identity is a mess. An agent’s personality is either scattered across a dozen directories or buried inside an opaque SQLite blob that you can’t read, version, or edit.

To tackle these three problems head‑on, we need something different.

Reflection: I’ve watched too many people start with high hopes for AI agents only to abandon them weeks later – not because the agents weren’t smart, but because they were uncontrollable. The biggest failure of a tech product isn’t missing features; it’s broken trust. The moment you fear your AI assistant might accidentally delete files or burn your budget with context bloat, you stop using it.

Mercury: Built for the Real World

Core question this section answers: What does Mercury actually do differently from everyone else?

Mercury was built precisely for that reality. It is a tool‑first, background‑native orchestrator with:

  • Paranoia‑level permissions
  • A token budget that respects your wallet
  • A four‑file soul system you own in plain text

It is not another chat wrapper pretending to be a brain. It is a reliable worker that asks before it acts.

Before Mercury, two impressive open‑source projects proved what developers wanted:

  • OpenClaw (a comparable tool) hit 100,000 GitHub stars in weeks, proving developers wanted a local agent that could actually execute shell commands instead of just generating text in a browser tab.
  • Hermes (another comparable tool) brought persistent SQLite memory and autonomous skill generation.

Both are brilliant feats of engineering. Both also left the same three gaps – security, cost, identity – wide open.

Mercury was built to close those gaps.

1. Permissions That Actually Gate Execution

Core question this section answers: Why is Mercury’s permission model safer than other agents?

Other agents often require uncomfortably broad access to function, relying on an unvetted ecosystem of third‑party extensions. The result is a security nightmare. Beyond the 800+ malicious skills found in the wild, some core architectures also suffered from critical flaws – a CVSS 8.8 RCE vulnerability that exposed over 40,000 instances to total system compromise via a single clicked link, completely bypassing localhost protections.

Mercury’s stance is simple: you should never blindly trust an LLM with root access.

Its architecture is permission‑hardened by default:

  • Read and write access is explicitly scoped to specific folders. The agent cannot rummage through your entire hard drive.
  • Destructive commands like sudo or rm -rf / are hard‑blocked at the execution layer. They do not even trigger a “prompt for approval” – they simply never execute.
  • Third‑party skills only receive elevated access through explicitly defined granular tools, not blanket permissions.

Real‑world scenario: You ask Mercury to clean up your Downloads folder. A naive agent might run rm -rf ~/Downloads/* and you’d regret it later. Mercury instead checks whether it has permission to access ~/Downloads, lists the files for your confirmation, and then performs safe deletions. If a malicious skill tries to execute sudo, Mercury doesn’t ask for approval – it rejects the command outright.
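To make the behavior concrete, here is a minimal sketch of such a gate. The names (gateCommand, BLOCKED, ALLOWED_SCOPES) are hypothetical illustrations, not Mercury's actual source:

```typescript
// Sketch of a hard-block + folder-scope gate. Hypothetical names;
// Mercury's real implementation may differ.
const BLOCKED = [/^sudo\b/, /\brm\s+-rf\s+\/(\s|$)/];
const ALLOWED_SCOPES = ["/home/user/Downloads"];

type Verdict = "blocked" | "needs-approval" | "allowed";

function gateCommand(cmd: string, targetPath?: string): Verdict {
  // Hard-blocked patterns never execute -- no approval prompt is offered.
  if (BLOCKED.some((re) => re.test(cmd))) return "blocked";
  // Paths outside the approved scopes require explicit user approval.
  if (targetPath && !ALLOWED_SCOPES.some((s) => targetPath.startsWith(s))) {
    return "needs-approval";
  }
  return "allowed";
}
```

The key design point is the ordering: the blocklist check runs before any approval flow, so a destructive command can never reach the "ask the user" path, let alone the shell.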

2. Token Discipline as a First Principle

Core question this section answers: How does Mercury control API costs? Can it really save me money?

Other agents are notorious for context‑window bloat. They feed massive JSONL conversation histories back into the model on every turn, leading to minutes of silent processing and brutal API bills.

Mercury bakes token efficiency into its core:

  • Only ~400 tokens of core persona are injected per request.
  • You set a daily token limit.
  • When usage exceeds 70% of your daily budget, Auto‑Concise mode automatically kicks in – tightening the context to keep your API bill flat without dropping the ball on active tasks.

Real‑world scenario: You set a daily budget of 100,000 tokens. You chat with Mercury for two hours in the morning, using 60,000 tokens. In the afternoon you ask it to handle a complex task. When token usage hits 70,000 (the 70% threshold), Mercury switches to concise mode automatically – replies become shorter and more direct, but the work still gets done. At the end of the month, your bill stays exactly where you wanted it.
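The threshold logic in that scenario can be sketched as follows. The TokenBudget class is a hypothetical illustration of the behavior, not Mercury's internals:

```typescript
// Sketch of daily token budgeting with a 70% Auto-Concise threshold.
// Hypothetical API, for illustration only.
class TokenBudget {
  private used = 0;
  constructor(private readonly dailyLimit: number) {}

  record(tokens: number): void {
    this.used += tokens;
  }

  get conciseMode(): boolean {
    // Past 70% of the daily budget, switch to tighter context and replies.
    return this.used >= this.dailyLimit * 0.7;
  }

  get remaining(): number {
    return Math.max(0, this.dailyLimit - this.used);
  }
}
```

With a 100,000-token limit, recording 60,000 tokens leaves conciseMode off; recording 10,000 more crosses the 70% line and flips it on, mirroring the scenario above.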

Reflection: A token budget is like a fitness budget – you never know how much you’ve spent until you start tracking it. Most AI products deliberately obscure pricing models so you overspend without noticing. Mercury makes this transparent and automated. That’s basic respect for the user’s wallet.

3. A Layered, Version‑Controlled “Soul”

Core question this section answers: How can I give an AI agent a personality that’s truly mine and manageable?

Other agents either scatter personality across disjointed skill files or go the opposite direction – relying entirely on auto‑generated, opaque SQLite memory that you cannot read or edit.

Mercury offers a highly opinionated, four‑file Markdown system:

  • soul.md
  • persona.md
  • taste.md
  • heartbeat.md

You define exactly how the agent thinks, responds, and writes code – all in plain text files. You can even enforce your preference for dark themes and clean UI components right inside taste.md.
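For illustration only, a taste.md along those lines might look like this (the contents are entirely yours to define; this is not a canonical template):

```markdown
<!-- taste.md: aesthetic and style preferences -->
## UI
- Dark theme by default; high-contrast accents only.
- Prefer clean, minimal components over dense dashboards.

## Code style
- Pure functions over class hierarchies.
- camelCase variable names.
```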

What this means:

  • You own it.
  • You write it in plain text.
  • You version‑control it in Git.

This is a clean identity system, not an unpredictable black box.

Real‑world scenario: You’re a developer who prefers functional programming, hates classes, and insists on camelCase variable names. You write into taste.md: “Prefer pure functions, avoid class syntax. Use camelCase for variable names. Write code comments in Chinese.” From then on, every piece of code Mercury generates follows your aesthetic. You commit the file to Git – your team can review it, propose changes, and treat the agent’s “personality” like any other code asset.

🧠 Second Brain: Mercury Remembers, You Don’t Have To

Core question this section answers: How does Mercury remember my preferences and history? Is my privacy protected?

Most AI agents forget everything when you close the chat. Mercury remembers – automatically, privately, and with surgical precision.

How it works

  1. Extract automatically – After each conversation, Mercury runs a dedicated extraction pass that pulls 0–3 facts about you: your preferences, goals, projects, habits, relationships, and decisions. Each fact gets a type, confidence score (0–1), importance, and durability rating. Below 0.55 confidence? Rejected.
  2. Store & merge – Facts land in SQLite with FTS5 full‑text search. If a similar fact already exists, Mercury merges them – incrementing evidence count, updating confidence, and resolving contradictions. No duplicates.
  3. Recall what matters – Before your next message, an FTS5 query retrieves the top 5 most relevant memories within a 900‑character budget. Only what matters enters context. Your token spend stays low.
  4. Consolidate & prune – Every 60 minutes: profile synthesis, reflection generation, and promotion of active memories to durable (3+ evidence). Auto‑pruning dismisses stale active memories after 21 days and decays low‑confidence durable memories after 120 days.
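The accept‑then‑merge step (points 1 and 2 above) can be sketched like this. The Fact shape and upsertFact helper are hypothetical; Mercury's actual SQLite schema may differ:

```typescript
// Sketch of fact acceptance and dedup-by-merge. Hypothetical shapes.
interface Fact {
  text: string;
  type: string;       // one of the 10 memory types
  confidence: number; // 0-1, from the extraction pass
  evidence: number;   // how many times this fact was observed
}

const MIN_CONFIDENCE = 0.55;

function upsertFact(store: Map<string, Fact>, incoming: Fact): boolean {
  // 1. Reject low-confidence extractions outright.
  if (incoming.confidence < MIN_CONFIDENCE) return false;
  const existing = store.get(incoming.text);
  if (existing) {
    // 2. Merge duplicates: bump evidence, keep the higher confidence.
    existing.evidence += 1;
    existing.confidence = Math.max(existing.confidence, incoming.confidence);
  } else {
    store.set(incoming.text, { ...incoming });
  }
  return true;
}
```

The same evidence counter is what later drives promotion to durable memory (the "3+ evidence" rule in step 4).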

10 memory types: identity, preference, goal, project, habit, decision, constraint, relationship, episode, and reflection – each scored by confidence, importance, and durability.

You stay in control: The /memory command gives you overview, search, pause, and clear. Second Brain can be disabled entirely via config. All data stays in ~/.mercury/ on your machine – nothing leaves it.

Real‑world scenario: On Monday you tell Mercury, “I prefer to do code reviews in the afternoon; I’m groggy in the morning.” On Wednesday you mention, “My sleep quality hasn’t been great lately.” Mercury extracts “preference: code reviews in PM” (confidence 0.9) and “status: poor sleep” (confidence 0.7). On Friday you ask it to plan your tasks for the next day. It automatically schedules code reviews for the afternoon and only puts light reading in the morning. It doesn’t even ask – it remembered.

Reflection: True intelligence isn’t about how fast you compute – it’s about what you choose to recall and when. Many “smart” assistants are just good‑looking search engines. The Second Brain concept in Mercury reminds me that the value of knowledge isn’t storage – it’s awakening at the right moment. A 900‑character recall budget isn’t a limitation; it’s an act of respect for attention.

31 Built‑In Tools + Skill System + Scheduler

Core question this section answers: What can Mercury actually do for me? Give me concrete examples.

Mercury is not just a chatbot. It’s an operating‑system‑level orchestrator with 31 built‑in tools.

📂 Filesystem

read_file, write_file, create_file, edit_file, list_dir, delete_file, send_file, approve_scope

💬 Messaging

send_message

🐚 Shell

run_command, cd, approve_command

📦 Git

git_status, git_diff, git_log, git_add, git_commit, git_push

🌐 Web

fetch_url

🐙 GitHub

create_pr, review_pr, list_issues, create_issue, github_api – with Co-authored-by on every commit

🧩 Skills

install_skill, list_skills, use_skill

⏰ Scheduler

schedule_task, list_scheduled_tasks, cancel_scheduled_task

📊 System

budget_status

🧠 Memory

Short‑term, long‑term (auto‑extracted with dedup), episodic (timestamped interaction log)

Real‑world session (from an actual Mercury run):

You: read the package.json and tell me the deps
[Using: read_file]

You: edit the version to 0.2.0
[Using: edit_file]
Successfully replaced "0.1.0" with "0.2.0" in package.json

You: commit that change
[Using: git_add, git_commit]
[git add package.json] [git commit -m "bump version to 0.2.0"]

You: send me the package.json file
[Using: send_file]
File sent: package.json (1.2KB)

Mercury: Done! I've read the package.json, updated the version, committed the change, and sent you the file. Anything else?

That’s just one example. You can ask it to fetch the latest GitHub repos every day at 9am, remotely manage a server via Telegram, or automatically create PRs and invite your team to review them.

Up and Running in 60 Seconds

Core question this section answers: How do I get started with Mercury? Is installation complicated?

Step 1 – Install

npm i -g @cosmicstack/mercury-agent

Or use npx – no install needed:

npx @cosmicstack/mercury-agent

Step 2 – Setup

mercury

The first run launches an onboarding wizard:

  • Choose one or more providers (DeepSeek, OpenAI, Anthropic, Grok, Ollama Cloud, Ollama Local, and more coming)
  • Validate each key by fetching models
  • Pick your default model
  • Optionally pair Telegram with a bot token plus pairing code

Step 3 – Run

mercury start

Mercury wakes up, loads your soul files, restores scheduled tasks, and runs as a background daemon. Start talking via Telegram, or use mercury start --foreground for attached (foreground) mode.

Platform support:

  • macOS – LaunchAgent (no sudo)
  • Linux – systemd user unit (no sudo)
  • Windows – Task Scheduler (no admin)
  • Crash recovery – Exponential backoff watchdog
  • Zero deps – No PM2, forever, or NSSM needed

GitHub Companion: Your Agent, Your Commits, Your Repos

Core question this section answers: How does Mercury help me with GitHub work?

  • Co‑authored commits – Every Mercury‑assisted commit gets a Co-authored-by: Mercury trailer. Mercury’s avatar appears alongside yours in GitHub’s commit history.
  • Pull requests – Create PRs, review diffs, and post review comments – all through Mercury’s tools. Just say “create a PR” or “review the open PRs”.
  • Issue management – List, filter, and create issues. Schedule daily issue checks. Mercury keeps your backlog moving without you touching the browser.
  • Self‑hosted auth – Uses a fine‑grained Personal Access Token stored on your machine. No accounts, no OAuth servers, no cloud dependencies. Your token, your repos, your control.

Configuration example:

$ mercury doctor
GitHub username: yourusername
GitHub email: mercury@yourdomain.com
GitHub PAT (repo scope): ••••••••
✓ GitHub tools registered (5)

Honest Comparison

| Feature | Mercury | OpenClaw (comparable) | Hermes (comparable) |
| --- | --- | --- | --- |
| Soul / Persona System | 4 markdown files | Custom instructions | CLAUDE.md |
| Token Budget Enforcement | Daily budget + override | | |
| Multi‑Channel (CLI + Telegram) | Both + more coming | Yes | Yes |
| Skill System (Agent Skills spec) | Install, invoke, schedule | | |
| Cron + Delayed Scheduling | Persisted, auto‑restore | | |
| Permission Hardening | Blocklist + scope + approval | Confirmation prompts | Permission prompts |
| GitHub Companion | PRs, issues, co‑authored commits | | |
| Proactive Notifications | Heartbeat + task alerts | | |
| Auto Fact Extraction | With dedup | | |
| Provider Fallback | Auto + last‑successful tracking | Manual config | Anthropic only |
| File Upload (Telegram) | Yes – auto type detection | | |
| Streaming Output (CLI) | Real‑time text stream | Real‑time text stream | Real‑time text stream |
| Headless / 24×7 Mode | Built‑in | | |
| Language / Runtime | TypeScript / Node.js | Python | TypeScript / Node.js |
| Open Source License | MIT | LGPL‑2.1 | Source‑available |

Under the Hood – Clean, Minimal Runtime

  • Core: TypeScript + Node.js 18+, ESM, tsup build, SQLite‑backed Second Brain
  • AI SDK: Vercel AI SDK v4, streamText + generateText, 10‑step agentic loop
  • Providers: DeepSeek (default, cost‑effective), OpenAI (GPT‑4o‑mini, GPT‑4o, o3), Anthropic (Claude Sonnet, Haiku, Opus), Grok (xAI), Ollama Cloud, Ollama Local (zero cost, fully private). More coming: Google Gemini, Mistral, and custom OpenAI‑compatible endpoints.
  • Telegram: grammY, long polling, pairing codes, CLI‑managed access requests, broadcasts, file uploads
  • Runtime Data: ~/.mercury/ – config, soul, memory, permissions, skills, schedules – all in your home directory

Reflection: Supporting Ollama Local is a smart move. Not everyone can or wants to pay for every API call. Running locally means you can use a consumer GPU (or even CPU) to run a smaller model and let Mercury handle daily tasks at zero cost. That opens up a huge range of use cases – from privacy‑sensitive enterprise environments to hobbyists who just want to tinker. It’s also a quiet challenge to the assumption that “AI must be expensive”.

Conclusion: The Three Problems We’ve Been Ignoring

Other AI agents proved that developers want local orchestration and persistent memory. Mercury is the next logical iteration: a streamlined, command‑line‑native engine built on a permission‑hardened foundation.

With built‑in tools covering file operations, deep GitHub management, and multi‑channel integration from CLI streaming to Telegram, the framework gets out of the way so the tools can do the work. This is an orchestrator built for actual daily use, not just a proof of concept.

We don’t need another over‑engineered application pretending to be a brain. We need a reliable, background‑native worker that respects the token budget and won’t blindly execute a destructive shell command.

Mercury is built for that reality.


Practical Summary / Action Checklist

If you’re considering adopting Mercury, here’s what you need to know:

  • [ ] Install – one command: npm i -g @cosmicstack/mercury-agent && mercury
  • [ ] Configure – the onboarding wizard helps you set up API keys and providers
  • [ ] Launch – mercury start runs it as a background daemon
  • [ ] Define your soul – edit ~/.mercury/soul.md, persona.md, taste.md, heartbeat.md
  • [ ] Set a budget – configure daily token limits to avoid surprise bills
  • [ ] Connect Telegram (optional) – get remote control from your phone
  • [ ] Install skills – install skill from <URL>
  • [ ] Schedule tasks – schedule_task <skillname> every day at 9am
  • [ ] Manage memory – use /memory to view, search, and control the Second Brain
  • [ ] GitHub integration – run mercury doctor to set up your fine‑grained PAT

One‑Page Summary

| Problem | Mercury’s Solution |
| --- | --- |
| Security risks | Folder‑level read/write scoping, command blocklist (sudo/rm -rf never execute), approval flow |
| Cost runaway | Daily token budget, auto‑concise mode at 70%, only ~400 tokens per request |
| Identity chaos | Four Markdown soul files, Git‑versionable, plain text editable |
| Forgetting everything | Auto fact extraction (10 types), SQLite + FTS5, 60‑min consolidate/prune |
| Complex deployment | Zero‑dependency daemon, cross‑platform, auto‑start on boot / auto‑restart on crash |
| Single channel | CLI + Telegram (Signal, Discord, Slack, WhatsApp coming) |
| GitHub workflow silo | Built‑in PRs, issues, co‑authored commits |

Frequently Asked Questions (FAQ)

Q: Does Mercury upload my private conversation data to the cloud?
No. All memory data stays on your local machine in ~/.mercury/ using SQLite. The Second Brain can be completely disabled. The config file gives you full control.

Q: Can I run Mercury with free / local models?
Yes. Mercury supports Ollama Local – completely local, zero API cost. It also supports Ollama Cloud, DeepSeek (the default, cost‑effective), OpenAI, Anthropic, Grok, and more.

Q: What’s the biggest difference between Mercury and OpenClaw?
Permission hardening, token budget enforcement, the four‑Markdown‑file soul system, persistent scheduling, and built‑in GitHub companion. Mercury is secure‑by‑default and always‑on.

Q: Can I share Mercury’s configuration with my team?
Yes. The soul files are plain text Markdown. Put them in a Git repository. Your team can review them, propose changes, and manage the agent’s “personality” like any other code.

Q: Does Mercury work on Windows?
Yes. It uses Windows Task Scheduler to install as a background service; no admin rights are required.

Q: What if my API provider goes down?
Mercury has built‑in provider fallback. It tracks the last successful provider and automatically switches to an available alternative. No manual intervention needed.
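The fallback behavior can be sketched as follows. The types and the callWithFallback helper are hypothetical (and synchronous for brevity), illustrating the pattern rather than Mercury's actual code:

```typescript
// Sketch of provider fallback with last-successful tracking.
// Hypothetical shapes, for illustration only.
type Provider = { name: string; call: (prompt: string) => string };

let lastSuccessful: string | undefined;

function callWithFallback(providers: Provider[], prompt: string): string {
  // Try the last provider that worked first, then the rest in order.
  const ordered = [...providers].sort(
    (a, b) => Number(b.name === lastSuccessful) - Number(a.name === lastSuccessful)
  );
  let lastError: unknown;
  for (const p of ordered) {
    try {
      const out = p.call(prompt);
      lastSuccessful = p.name; // remember who answered for next time
      return out;
    } catch (err) {
      lastError = err; // provider down -> move on to the next one
    }
  }
  throw lastError; // every provider failed
}
```

Tracking the last successful provider means that after one outage, subsequent requests go straight to the provider that is known to be up instead of re‑trying the failing one first.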

Q: Is the skill system safe?
Skills only receive explicitly defined granular tool permissions. Destructive commands are hard‑blocked at the execution layer. Mercury prompts you to confirm before installing any skill.

Q: Is Mercury open source?
Yes, under the MIT license. The code repository and documentation are available on GitHub.

Q: Can I use Mercury without Telegram?
Absolutely. The CLI (terminal) mode works perfectly. Telegram is an optional channel for remote access.

Q: How does Mercury handle conflicting memories?
When a new fact contradicts an existing one (e.g., you liked X, now you prefer Y), the memory with higher confidence wins. Stale or low‑confidence memories are pruned automatically.