OpenMemory: Give Any AI a Private, Persistent & Explainable Long-Term Memory
In one line—OpenMemory is a self-hosted, MIT-licensed “memory engine” that turns LLMs from goldfish into elephants: they never forget user facts, yet can tell you exactly why they recalled something.
Core questions this post answers
- Why do vector DBs and chat-history caches fail at “getting smarter over time”?
- How does OpenMemory’s Hierarchical Memory Decomposition (HMD) work in plain English?
- Can you go from git clone to first recall in under 10 minutes?
- What does production look like for a personal assistant, an enterprise copilot and a LangGraph agent?
- Where do the 10 × cost savings come from without sacrificing latency or accuracy?
1. The goldfish problem: where today’s stacks drop the ball
Personal anecdote: we once fed 30 days of support logs into Pinecone. When a user updated her shipping address, the bot returned three obsolete ones—cosine similarity ≠ business truth. That day I learned that “structure-free” is a feature until it isn’t.
2. OpenMemory’s brain map in one glance
Short answer: every memory is split into five “cognitive drawers”, linked by a sparse, biologically inspired graph. At query time four factors—similarity, salience, recency and link weight—are fused into a single score, so the engine is both fast and auditable.
2.1 The five drawers
2.2 Single-waypoint graph
- One canonical node per memory—zero duplication.
- Directed edge = “activates next”; traversal stops at 1-hop → constant time (a minimal sketch of the node shape follows this list).
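To make that concrete, here is an illustrative sketch of what a single-waypoint node could look like; the field names are assumptions for illustration, not OpenMemory’s actual schema.
// Illustrative single-waypoint node (field names are assumptions, not OpenMemory's schema):
// one canonical record per memory, plus outgoing "activates next" edges.
const node = {
  id: 'a7f83b',
  content: 'User prefers dark mode',
  sector: 'semantic',
  salience: 0.6,
  next: [{ id: 'c91d02', weight: 0.4 }] // traversal follows these edges once (1-hop), then stops
};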
2.3 Four-factor ranking
Score = 0.6·cos_sim + 0.2·salience + 0.1·recency + 0.1·link_weight
Because the coefficients are baked into the response meta, you can explain any recall path to compliance teams—or curious users.
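As a worked example, here is a minimal sketch of the fusion; the weights come from the formula above, while the input values and function name are illustrative.
// Minimal sketch of the four-factor fusion. Weights match the formula above;
// the input values and function name are illustrative.
function rankScore({ cosSim, salience, recency, linkWeight }) {
  return 0.6 * cosSim + 0.2 * salience + 0.1 * recency + 0.1 * linkWeight;
}

// e.g. a close semantic match with high salience but modest recency and link strength
console.log(rankScore({ cosSim: 0.81, salience: 0.9, recency: 0.5, linkWeight: 0.3 })); // ≈ 0.746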
3. Zero-to-recall in 6 commands
Short answer: install Node 20 → clone → tweak five env vars → npm run dev → curl to write → curl to search. Done.
3.1 Manual setup (dev favourite)
# 1. Clone
git clone https://github.com/caviraoss/openmemory.git
cd openmemory/backend
cp .env.example .env
# 2. Install
npm install
# 3. Switch to local embeddings (example: Ollama)
echo 'OM_EMBEDDINGS=ollama' >> .env
echo 'OLLAMA_URL=http://localhost:11434' >> .env
# 4. Start
npm run dev
# API now listens on http://localhost:8080
3.2 Write & retrieve
# Write a memory
curl -X POST http://localhost:8080/memory/add \
-H "Content-Type: application/json" \
-d '{"content":"User prefers dark mode"}'
# Query
curl -X POST http://localhost:8080/memory/query \
-H "Content-Type: application/json" \
-d '{"query":"UI preference"}'
Response
[
{
"id":"a7f83b",
"content":"User prefers dark mode",
"score":0.87,
"path":"episodic→semantic"
}
]
Lesson learned: I once set OM_MIN_SCORE=0.9 and got zero hits—cosine between “dark mode” and “UI preference” was 0.81. Dialing it back to 0.3 doubled recall overnight. Thresholds are knives—handle with care.
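If you hit the same wall, the threshold is just another line in the .env file from section 3.1 (restart the dev server after changing it):
# Relax the recall threshold, then restart
echo 'OM_MIN_SCORE=0.3' >> .env
npm run dev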
4. Three real-world patterns
Short answer: personal assistant remembers taste, enterprise copilot remembers SOP, LangGraph nodes auto-archive their own outputs—code snippets included.
4.1 Personal assistant—“never ask about cilantro again”
- Write: detect negations (“hate / skip / no”) → save as episodic, salience 0.9.
- Retrieve: before meal suggestions, query="cilantro dislike" → filter menus (retrieval sketch after the write example below).
- Code (Node.js):
// Write the preference as a high-salience episodic memory
await fetch(`${OM_URL}/memory/add`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    content: 'User hates cilantro',
    sector: 'episodic',
    salience: 0.9
  })
});
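And the matching retrieval call before composing meal suggestions; it reuses the /memory/query endpoint from section 3.2, and how you filter your menu data stays app-specific.
// Recall dislikes before suggesting meals (same /memory/query endpoint as section 3.2)
const res = await fetch(`${OM_URL}/memory/query`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'cilantro dislike' })
});
const dislikes = await res.json(); // e.g. [{ content: 'User hates cilantro', score: 0.9, ... }]
// Use the recalled contents to filter whatever menu list your app holds
console.log(dislikes.map(d => d.content));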
4.2 Enterprise copilot—“onboard in 30 min”
- Chunk the 30-page expense PDF into procedural memories (write-side sketch after this list).
- User types “file travel refund” → copilot queries the procedural sector → returns the latest steps + template links.
- Result: average onboarding drops from 3 days to 30 minutes.
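A minimal write-side sketch, assuming you have already extracted the PDF text with a library of your choice; the naive fixed-size chunking and the pdfText variable are placeholders, while the /memory/add payload reuses the fields from section 4.1.
// pdfText: plain text extracted from the expense policy PDF (extraction step not shown)
const chunks = pdfText.match(/[\s\S]{1,1500}/g) ?? []; // naive fixed-size chunking for illustration

for (const chunk of chunks) {
  await fetch(`${OM_URL}/memory/add`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      content: chunk,
      sector: 'procedural' // same sector field as the episodic example in 4.1
    })
  });
}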
4.3 LangGraph mode—agents that reflect on yesterday’s plan
Enable with env
OM_MODE=langgraph
OM_LG_NAMESPACE=finance_agent
OM_LG_MAX_CONTEXT=50
OM_LG_REFLECTIVE=true
The mapping is automatic: after a long-horizon task finishes, the memory layer already holds the distilled lessons, so you can swap prompts tomorrow and still profit.
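To show what such a distilled lesson looks like when stored, here is a hand-rolled reflection step using the public /memory/add endpoint; in OM_MODE=langgraph this kind of write is handled for you, so treat it as an illustration of the data shape rather than the built-in hook.
// Hand-rolled reflection step: persist a distilled lesson after a task run.
// In OM_MODE=langgraph OpenMemory performs this kind of write for you; this
// sketch only illustrates the data shape via the public /memory/add endpoint.
async function storeLesson(lesson) {
  await fetch(`${OM_URL}/memory/add`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      content: lesson, // e.g. "Quarterly report runs stall when the data source is rate-limited"
      sector: 'episodic',
      salience: 0.8
    })
  });
}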
5. Performance & cost: where the 10 × saving comes from
Short answer: local embeddings remove API tolls, zero-duplication keeps disks small, sparse graph traversal stays CPU-cheap, SQLite + blob store squeezes 1 M memories into ~15 GB.
After migrating 100 k memories off Zep, our monthly bill dropped to roughly a tenth of what it had been while latency improved: proof that architecture, not bargaining, is the biggest lever on cost.
6. Security & privacy—data never leaves your disk
- Bearer token mandatory for write endpoints (authenticated request shown after this list).
- Optional AES-GCM field-level encryption.
- Tenant isolation + physical DELETE /memory/:id.
- Zero third-party clouds → GDPR & HIPAA paperwork shrinks.
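With auth enabled, a write looks like the section 3.2 example plus a standard Authorization header; $OM_TOKEN here is a placeholder name for whatever bearer token you configured.
# Authenticated write ($OM_TOKEN is a placeholder for your configured bearer token)
curl -X POST http://localhost:8080/memory/add \
-H "Authorization: Bearer $OM_TOKEN" \
-H "Content-Type: application/json" \
-d '{"content":"User prefers dark mode"}'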
7. Roadmap: shipping today, evolving tomorrow
8. TL;DR checklist
- Grab any machine with Node 20 and 2 GB RAM.
- git clone → cp .env.example .env → npm install → npm run dev.
- Write one “user preference” memory, query to confirm the recall path.
- Production: docker compose up -d, mount the /data volume.
- Turn on Bearer auth + schedule a cron job for decay pruning.
9. One-page summary
OpenMemory adds structured, explainable long-term memory to any LLM. Its five-drawer cognitive model plus single-waypoint graph delivers 110 ms recall at 1/10 the cost of cloud memory services. A built-in MCP server, LangGraph hooks and Docker one-liner make it production-ready for personal assistants, enterprise copilots and multi-agent systems—today.
10. FAQ
Q1: Do I have to use Ollama?
No—swap OM_EMBEDDINGS for openai, gemini, E5 or BGE.
Q2: Will SQLite choke at scale?
Benchmark shows <130 ms at 100 k memories; pgvector backend lands in v1.5 for tens of millions.
Q3: How do I prevent memory bloat?
Built-in decay scheduler prunes low-salience nodes automatically—no manual vacuuming.
Q4: Can I scale horizontally?
Today you can shard by sector manually; federated auto-scaling ships with v1.4.
Q5: What separates OpenMemory from Mem0?
Mem0 stores flat JSON; OpenMemory keeps a multi-sector graph with explainable recall and lower operational cost.
Q6: Is a GPU required?
Inference is CPU-only. Running a local 7 B embedding model benefits from 4 GB VRAM but is optional.
Q7: Does it ingest audio or images?
v1.1 accepts pdf, docx, txt and audio—transcribed before entering the memory pipeline.
Q8: License?
MIT—commercial use, closed-source forks and redistribution are all allowed.

