OpenMemory: Give Any AI a Private, Persistent & Explainable Long-Term Memory
In one line—OpenMemory is a self-hosted, MIT-licensed “memory engine” that turns LLMs from goldfish into elephants: they never forget user facts, yet can tell you exactly why they recalled something.
Core questions this post answers
- Why do vector DBs and chat-history caches fail at “getting smarter over time”?
- How does OpenMemory’s Hierarchical Memory Decomposition (HMD) work in plain English?
- Can you go from git clone to first recall in under 10 minutes?
- What does production look like for a personal assistant, an enterprise copilot and a LangGraph agent?
- Where do the 10× cost savings come from without sacrificing latency or accuracy?
1. The goldfish problem: where today’s stacks drop the ball
| Pain | Symptom | Root cause in existing tools |
|---|---|---|
| Session amnesia | New chat → preferences gone | Context windows are short-lived |
| Vector glut | Same sentence stored 20×, key fact still missing | Flat embeddings, no structure |
| Black-box retrieval | “Why this chunk?”—no idea | No weights, no path, no explainability |
| Runaway cost | 2–3 USD per 1 M tokens | Hosted embedding + SaaS margin |
Personal anecdote: we once fed 30 days of support logs into Pinecone. When a user updated her shipping address, the bot returned three obsolete ones—cosine similarity ≠ business truth. That day I learned that “structure-free” is a feature until it isn’t.
2. OpenMemory’s brain map in one glance
Short answer: every memory is split into five “cognitive drawers”, linked by a sparse, biologically inspired graph. At query time four factors—similarity, salience, recency and link weight—are fused into a single score, so the engine is both fast and auditable.
2.1 The five drawers
| Drawer | Example | Embedding model |
|---|---|---|
| episodic | “User said he prefers dark roast last Wednesday” | E5-large |
| semantic | “Dark roast = low-acid coffee” | BGE-base |
| procedural | “Grind 18 g, 90 °C, 25 s pre-infuse” | OpenAI-3-small |
| emotional | “Customer angry about 40 min wait” | Gemini-text |
| reflective | “User likely values speed over small talk” | Ollama-nomic |
2.2 Single-waypoint graph
- One canonical node per memory—zero duplication.
- Directed edge = “activates next”; traversal stops at 1-hop → constant time.
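A minimal Node.js sketch of the idea—not the actual OpenMemory internals, just an illustration of why one canonical node with single-hop edges keeps lookups constant-time (the node fields and ids below are made up):
// Hypothetical single-waypoint graph: one canonical node per memory,
// each holding the ids of the memories it "activates next".
const graph = new Map([
  ['a7f83b', { content: 'User prefers dark mode', sector: 'episodic', next: ['c19d02'] }],
  ['c19d02', { content: 'Dark mode = low-glare UI theme', sector: 'semantic', next: [] }]
]);

// Traversal stops after a single hop, so cost does not grow with graph size.
function expandOneHop(id) {
  const node = graph.get(id);
  if (!node) return [];
  return node.next.map((n) => graph.get(n)).filter(Boolean);
}

console.log(expandOneHop('a7f83b')); // → the linked semantic node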
2.3 Four-factor ranking
Score = 0.6·cos_sim + 0.2·salience + 0.1·recency + 0.1·link_weight
Because the coefficients are baked into the response meta, you can explain any recall path to compliance teams—or curious users.
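To make the fusion concrete, here is a tiny Node.js sketch of the same formula—the helper name and the example numbers are ours, not part of the OpenMemory API:
// Four-factor fusion, mirroring the published coefficients.
function rankScore({ cosSim, salience, recency, linkWeight }) {
  return 0.6 * cosSim + 0.2 * salience + 0.1 * recency + 0.1 * linkWeight;
}

const candidate = { cosSim: 0.81, salience: 0.9, recency: 0.5, linkWeight: 0.4 };
console.log(rankScore(candidate).toFixed(2)); // → "0.76"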
3. Zero-to-recall in 6 commands
Short answer: install Node 20 → clone → tweak five env vars → npm run dev → curl to write → curl to search. Done.
3.1 Manual setup (dev favourite)
# 1. Clone
git clone https://github.com/caviraoss/openmemory.git
cd openmemory/backend
cp .env.example .env
# 2. Install
npm install
# 3. Switch to local embeddings (example: Ollama)
echo 'OM_EMBEDDINGS=ollama' >> .env
echo 'OLLAMA_URL=http://localhost:11434' >> .env
# 4. Start
npm run dev
# API now listens on http://localhost:8080
3.2 Write & retrieve
# Write a memory
curl -X POST http://localhost:8080/memory/add \
-H "Content-Type: application/json" \
-d '{"content":"User prefers dark mode"}'
# Query
curl -X POST http://localhost:8080/memory/query \
-H "Content-Type: application/json" \
-d '{"query":"UI preference"}'
Response
[
  {
    "id": "a7f83b",
    "content": "User prefers dark mode",
    "score": 0.87,
    "path": "episodic→semantic"
  }
]
Lesson learned: I once set OM_MIN_SCORE=0.9 and got zero hits—cosine between “dark mode” and “UI preference” was 0.81. Dialing it back to 0.3 doubled recall overnight. Thresholds are knives—handle with care.
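If you prefer to keep OM_MIN_SCORE loose on the server and filter on the client instead, a sketch like this works—the 0.3 floor is the value from the anecdote above, and OM_URL is a placeholder for your base URL:
// Query the engine, then apply a client-side score floor.
const res = await fetch(`${process.env.OM_URL}/memory/query`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'UI preference' })
});
const memories = await res.json();

const relevant = memories.filter((m) => m.score >= 0.3);
console.log(relevant.map((m) => `${m.content} (${m.score})`));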
4. Three real-world patterns
Short answer: personal assistant remembers taste, enterprise copilot remembers SOP, LangGraph nodes auto-archive their own outputs—code snippets included.
4.1 Personal assistant—“never ask about cilantro again”
- Write: detect negations (“hate / skip / no”) → save as episodic, salience 0.9.
- Retrieve: before meal suggestions, query="cilantro dislike" → filter menus.
Code (Node.js)
// Store the dislike as a high-salience episodic memory
await fetch(`${OM_URL}/memory/add`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    content: 'User hates cilantro',
    sector: 'episodic',
    salience: 0.9
  })
});
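The retrieval half is symmetric. A sketch of the lookup that runs before meal suggestions—endpoint and fields are the ones shown in section 3.2; the avoidCilantro flag is purely illustrative:
// Pull the dislike back before proposing a menu.
const res = await fetch(`${OM_URL}/memory/query`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'cilantro dislike' })
});
const hits = await res.json();
const avoidCilantro = hits.some((m) => m.content.toLowerCase().includes('cilantro'));
// feed `avoidCilantro` into the menu-filtering prompt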
4.2 Enterprise copilot—“onboard in 30 min”
- Chunk the 30-page expense PDF into procedural memories.
- User types “file travel refund” → copilot queries the procedural sector → returns the latest steps + template links.
- Result: average onboarding drops from 3 days to 30 minutes.
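A rough ingestion sketch, assuming the PDF has already been split into plain-text sections by your own tooling (the section texts below are invented); sector and salience mirror the add call from 4.1:
// Store each SOP section as a procedural memory.
const sections = [
  'Step 1: Collect receipts within 30 days of travel.',
  'Step 2: Fill in the refund template and attach receipts.'
];

for (const content of sections) {
  await fetch(`${OM_URL}/memory/add`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content, sector: 'procedural', salience: 0.7 })
  });
}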
4.3 LangGraph mode—agents that reflect on yesterday’s plan
Enable with env
OM_MODE=langgraph
OM_LG_NAMESPACE=finance_agent
OM_LG_MAX_CONTEXT=50
OM_LG_REFLECTIVE=true
Automatic mapping
| LangGraph node | Memory sector |
|---|---|
| observe | episodic |
| plan | semantic |
| reflect | reflective |
| act | procedural |
| emotion | emotional |
After a long-horizon task finishes, the memory layer already holds distilled lessons—swap prompts tomorrow, still profit.
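To inspect those lessons you can query the reflective sector directly—a sketch only: the sector field mirrors the add example from 4.1, and how the query is scoped to OM_LG_NAMESPACE is our assumption, so check the docs for the exact parameter:
// Hypothetical: review what the reflect node distilled after a run.
const res = await fetch(`${OM_URL}/memory/query`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'lessons from the last planning run',
    sector: 'reflective' // namespace scoping (finance_agent) may use a different field
  })
});
console.log(await res.json());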
5. Performance & cost: where the 10× saving comes from
Short answer: local embeddings remove API tolls, zero-duplication keeps disks small, sparse graph traversal stays CPU-cheap, SQLite + blob store squeezes 1 M memories into ~15 GB.
| Metric | OpenMemory self-host | Zep Cloud | Supermemory | Mem0 |
|---|---|---|---|---|
| Query latency @100 k | 110–130 ms | 280–350 ms | 350–400 ms | 250 ms |
| Hosted embed cost /1 M tokens | $0.30–0.40 | $2.0–2.5 | $2.50+ | $1.20 |
| Local models | ✅ Ollama/E5/BGE | ❌ | ❌ | partial |
| Monthly cost @100 k | $5–8 VPS | $80–150 | $60–120 | $25–40 |
| Explainable path | ✅ | ❌ | ❌ | ❌ |
After migrating 100 k memories off Zep, our monthly invoice fell to roughly $6.50 while latency improved—proof that architecture, not bargaining, is the biggest lever on cost.
6. Security & privacy—data never leaves your disk
- Bearer token mandatory for write endpoints.
- Optional AES-GCM field-level encryption.
- Tenant isolation + physical DELETE /memory/:id.
- Zero third-party clouds → GDPR & HIPAA paperwork shrinks.
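Concretely, a write is just the earlier add call plus an Authorization header—OM_API_TOKEN below is a placeholder for however you store the token:
// Authenticated write; nothing leaves your own infrastructure.
await fetch(`${OM_URL}/memory/add`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.OM_API_TOKEN}` // placeholder env var
  },
  body: JSON.stringify({ content: 'User prefers dark mode' })
});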
7. Roadmap: shipping today, evolving tomorrow
| Release | Highlight | Status |
|---|---|---|
| v1.2 | React dashboard + metrics | in progress |
| v1.3 | Tiny transformer auto-sector | planned |
| v1.4 | Federated multi-node | planned |
| v1.5 | Plug-able pgvector / Weaviate | planned |
8. TL;DR checklist
- Grab any machine with Node 20 and 2 GB RAM.
- git clone → cp .env → npm install → npm run dev.
- Write one “user preference” memory, query to confirm the path.
- Production: docker compose up -d, mount the /data volume.
- Turn on Bearer auth + schedule a cron job for decay pruning.
9. One-page summary
OpenMemory adds structured, explainable long-term memory to any LLM. Its five-drawer cognitive model plus single-waypoint graph delivers 110 ms recall at 1/10 the cost of cloud memory services. A built-in MCP server, LangGraph hooks and Docker one-liner make it production-ready for personal assistants, enterprise copilots and multi-agent systems—today.
10. FAQ
Q1: Do I have to use Ollama?
No—swap OM_EMBEDDINGS for openai, gemini, E5 or BGE.
Q2: Will SQLite choke at scale?
Benchmark shows <130 ms at 100 k memories; pgvector backend lands in v1.5 for tens of millions.
Q3: How do I prevent memory bloat?
Built-in decay scheduler prunes low-salience nodes automatically—no manual vacuuming.
Q4: Can I scale horizontally?
Today you can shard by sector manually; federated auto-scaling ships with v1.4.
Q5: What separates OpenMemory from Mem0?
Mem0 stores flat JSON; OpenMemory keeps a multi-sector graph with explainable recall and lower operational cost.
Q6: Is a GPU required?
Inference is CPU-only. Running a local 7 B embedding model benefits from 4 GB VRAM but is optional.
Q7: Does it ingest audio or images?
v1.1 accepts pdf, docx, txt and audio—transcribed before entering the memory pipeline.
Q8: License?
MIT—commercial use, closed-source forks and redistribution are all allowed.
