# Nemori: Teaching AI to Remember Like a Human – A Practical Guide to Episodic Memory for LLMs

> “I swear we talked about Kyoto last week … what did Alice say about the cherry blossoms?”
If your chatbot can’t answer that, keep reading.
Table of Contents

- 👉 [The 30-Second Pitch](#the-30-second-pitch)
- 👉 [Why Traditional Memory Fails](#why-traditional-memory-fails)
- 👉 [How Nemori Works (No PhD Required)](#how-nemori-works)
- 👉 [Quick-Start: Run the LoCoMo Benchmark in 30 Minutes](#quick-start)
- 👉 [Architecture at a Glance](#architecture-at-a-glance)
- 👉 [Deep Dive: From Raw Chat to Searchable Episode](#deep-dive)
- 👉 [Performance on LoCoMo](#performance)
- 👉 [Integration Cookbook](#integration-cookbook)
- 👉 [FAQ: Engineers Ask These First](#faq)
- 👉 [Roadmap](#roadmap)
## 1. The 30-Second Pitch {#the-30-second-pitch}
Nemori is a small, open-source library that turns any stream of user data—chats, GPS pings, voice notes—into human-scale episodes your LLM can recall later.
Think of it as a memory palace written in Python.
- Two prompts detect boundaries and write the episode.
- BM25 indexes the episodes—no extra LLM calls.
- User-level isolation keeps Alice’s memories away from Bob’s.
Drop it into your bot today; tomorrow your users will ask, “How do you remember that?”
## 2. Why Traditional Memory Fails {#why-traditional-memory-fails}
| Human Memory | Typical AI Memory | Nemori’s Fix |
|---|---|---|
| “Last Wednesday Alice and I decided on Kyoto in April.” | `Message #312: user: kyoto sounds nice` | Episode title: *Alice and I Plan a Kyoto Cherry-Blossom Trip*. Summary: 2024-01-15 10:30 — Alice proposed April dates; flight JAL123 booked |
| Keywords: Alice, Kyoto, April | Keywords: message ID, timestamp, user ID | Keywords: Kyoto, trip, Alice, cherry-blossom, flight |
Traditional systems search messages. Nemori searches events.
## 3. How Nemori Works (No PhD Required) {#how-nemori-works}

### 3.1 The Four-Layer Cake
```mermaid
graph TD
    A[Raw Data<br>chat / location / media] -->|Layer 1<br>Data Ingestion| B[Typed Event]
    B -->|Layer 2<br>Episode Builder| C[Episode Object]
    C -->|Layer 3<br>Indexing| D[BM25 Index]
    D -->|Layer 4<br>Retrieval| E[Answer]
```
- Layer 1 normalizes any input into `RawEventData`.
- Layer 2 chooses the right builder (conversation, location, etc.) and produces an `Episode`.
- Layer 3 tokenizes, stems, and stores the episode in a per-user BM25 index.
- Layer 4 answers questions by retrieving the top-k episodes and handing them to the LLM.
### 3.2 Two Prompts, Zero Embeddings
| Step | Prompt | Output |
|---|---|---|
| Boundary Detection | “Detect episode boundaries along natural topic shifts.” | List of start/end message IDs |
| Episode Generation | “Summarize each segment into an episodic memory.” | Title, summary, keywords |
That’s it. No vector databases, no re-rankers—unless you want them later.
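In code, the flow is just two model calls and a loop. The sketch below stubs the LLM so it runs offline; `call_llm`, `detect_boundaries`, and `generate_episode` are illustrative names for the pattern, not Nemori's actual API:

```python
import json

def call_llm(prompt: str, messages: list[dict]) -> str:
    """Stub: replace with a real provider call (OpenAI, Anthropic, ...)."""
    if "boundaries" in prompt:
        # Pretend the whole window is one segment.
        return json.dumps([[0, len(messages) - 1]])
    return json.dumps({"title": "Kyoto Trip Planning",
                       "summary": "Alice proposed Kyoto in April.",
                       "keywords": ["Kyoto", "Alice", "April"]})

def detect_boundaries(messages: list[dict]) -> list[list[int]]:
    raw = call_llm("Detect episode boundaries along natural topic shifts. "
                   "Return JSON list of [start,end] pairs.", messages)
    return json.loads(raw)

def generate_episode(segment: list[dict]) -> dict:
    raw = call_llm("Summarize each segment into an episodic memory.", segment)
    return json.loads(raw)

msgs = [{"user": "alice", "text": "Kyoto in April?"},
        {"user": "bot", "text": "Let's book flights."}]
episodes = [generate_episode(msgs[s:e + 1]) for s, e in detect_boundaries(msgs)]
print(episodes[0]["title"])  # → Kyoto Trip Planning
```

Swapping the stub for a real chat-completion call is the only change needed to make this live.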
## 4. Quick-Start: Run the LoCoMo Benchmark in 30 Minutes {#quick-start}

### 4.1 Prerequisites

- Python 3.12+
- `uv` package manager
- One API key: OpenAI, Anthropic, or Google

```bash
git clone https://github.com/nemori-ai/nemori.git
cd nemori
uv sync
export OPENAI_API_KEY="sk-xxx"
```
### 4.2 One Script, End-to-End

```python
from datetime import datetime, timezone

from nemori.core.data_types import RawEventData, DataType, TemporalInfo
from nemori.core.retrieval import RetrievalService, RetrievalStrategy, RetrievalQuery
from nemori.llm.providers import OpenAIProvider
from nemori.builders.conversation_builder import ConversationEpisodeBuilder

# 1. Feed a conversation
messages = [
    {"user": "alice", "text": "I’m thinking Kyoto in April for the sakura."},
    {"user": "bot", "text": "Great idea! Shall we book flights?"},
]
raw = RawEventData(
    data_type=DataType.CONVERSATION,
    content=messages,
    source="telegram",
    temporal_info=TemporalInfo(timestamp=datetime.now(timezone.utc)),
)

# 2. Build the episode
builder = ConversationEpisodeBuilder(llm_provider=OpenAIProvider.from_env())
episode = builder.build_episode(raw, for_owner="alice")

# 3. Index it (episode_repo is an episode-repository instance; see the
#    Integration Cookbook for in-memory implementations)
service = RetrievalService(episode_repo)
service.register_provider(RetrievalStrategy.BM25, config={})
service.add_episode_to_all_providers(episode)

# 4. Query
query = RetrievalQuery(text="when sakura kyoto", owner_id="alice", limit=3)
results = service.search(query)
print(results.episodes[0].summary)
# → Alice proposed April dates for Kyoto cherry-blossom trip.
```
### 4.3 Verify on LoCoMo

```bash
cd evaluation
uv run pytest -m locomo
```
Expect state-of-the-art F1 without extra tuning.
## 5. Architecture at a Glance {#architecture-at-a-glance}
| Layer | Core Classes | Responsibility |
|---|---|---|
| Data Ingestion | `RawEventData`, `DataType` | Accept anything—chat, GPS, sensor logs—into one schema. |
| Memory Processing | `EpisodeBuilder` (abstract) + `ConversationEpisodeBuilder` | Turn raw data into human-like episodes. |
| Episodic Memory | `Episode`, `EpisodeLevel` (1–4) | Store episodes with titles, summaries, keywords, timestamps, importance scores. |
| Storage & Retrieval | `RetrievalService` + `BM25RetrievalProvider` | Per-user BM25 index, automatic updates on create/update/delete. |
All layers speak through clean interfaces; swap DuckDB for Postgres later without touching the rest.
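As a sketch of what those clean interfaces buy you: if retrieval code depends only on a small repository protocol, the backend can change freely. The class and method names below are illustrative, not Nemori's exact API:

```python
from typing import Protocol

class EpisodeRepository(Protocol):
    """Minimal storage interface; DuckDB, Postgres, or in-memory can implement it."""
    def save(self, owner_id: str, episode: dict) -> None: ...
    def all_for(self, owner_id: str) -> list[dict]: ...

class InMemoryEpisodeRepository:
    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def save(self, owner_id: str, episode: dict) -> None:
        self._store.setdefault(owner_id, []).append(episode)

    def all_for(self, owner_id: str) -> list[dict]:
        return self._store.get(owner_id, [])

def top_summaries(repo: EpisodeRepository, owner_id: str, k: int = 3) -> list[str]:
    # Retrieval code depends only on the protocol, never on the backend.
    return [ep["summary"] for ep in repo.all_for(owner_id)][:k]

repo = InMemoryEpisodeRepository()
repo.save("alice", {"summary": "Kyoto trip planned for April."})
print(top_summaries(repo, "alice"))  # → ['Kyoto trip planned for April.']
```

Replacing `InMemoryEpisodeRepository` with a Postgres-backed class changes nothing above it, which is the swap the paragraph describes.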
## 6. Deep Dive: From Raw Chat to Searchable Episode {#deep-dive}

### 6.1 Step 1 – Store Raw Data (Immutable)

```python
await raw_repo.store_raw_data(raw)  # never modified again
```
### 6.2 Step 2 – Detect Boundaries

The LLM receives the last N messages and returns `[start_id, end_id]` pairs.

Internal prompt (exact):

“Detect episode boundaries along natural topic shifts. Return JSON list of [start,end] pairs.”
### 6.3 Step 3 – Generate Narrative

Each pair is fed to:

“Summarize this segment into an episodic memory: who, what, when, why. 1–2 sentences.”

Example output:

> Title: Alice and I Plan a Kyoto Cherry-Blossom Trip
> Summary: On 2024-01-15 at 10:30, Alice suggested visiting Kyoto in early April for sakura; we agreed on JAL123 flights.
### 6.4 Step 4 – Index with BM25

- Tokenize with NLTK (English)
- Stem with Porter Stemmer
- Fields & weights:
  - title × 3
  - summary × 2
  - entities × 2
  - topics × 2
  - content × 1
- Each user gets a separate index; no cross-talk.
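To make the field weighting concrete, here is a stdlib-only sketch: each field's tokens are repeated by its weight before a bare-bones BM25 scores them. Nemori itself uses NLTK tokenization and Porter stemming; a plain `.split()` stands in here, and the episode dicts are illustrative:

```python
import math
from collections import Counter

def field_tokens(ep: dict) -> list[str]:
    # Apply field weights by token repetition (title counts 3x, etc.).
    weights = {"title": 3, "summary": 2, "entities": 2, "topics": 2, "content": 1}
    toks: list[str] = []
    for field, w in weights.items():
        toks += ep.get(field, "").lower().split() * w
    return toks

def bm25_scores(query: str, docs: list[list[str]], k1=1.5, b=0.75) -> list[float]:
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

episodes = [
    {"title": "Kyoto Cherry-Blossom Trip", "summary": "Alice proposed April dates"},
    {"title": "Grocery Run", "summary": "Bought milk and eggs"},
]
docs = [field_tokens(ep) for ep in episodes]
scores = bm25_scores("kyoto trip", docs)
best = episodes[scores.index(max(scores))]
print(best["title"])  # → Kyoto Cherry-Blossom Trip
```

Because title tokens are tripled, a query term that matches a title dominates the score, which is exactly the behavior the weights above are meant to encode.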
## 7. Performance {#performance}

| Benchmark | Nemori Score | Notes |
|---|---|---|
| LoCoMo F1 | SOTA* | Two prompts + BM25, no embeddings |
| Latency | < 200 ms | Per-user index fits in RAM |

*Exact numbers in `figures/locomo-scores.png` inside the repo.
## 8. Integration Cookbook {#integration-cookbook}

### 8.1 Embed in an Existing Chatbot
```python
from nemori.core.managers import EpisodeManager

# In-memory repositories keep the example self-contained; swap in
# persistent ones in production.
episode_repo = MemoryEpisodicMemoryRepository()
manager = EpisodeManager(
    raw_data_repo=MemoryRawDataRepository(),
    episode_repo=episode_repo,
    builder_registry=ConversationEpisodeBuilder(llm_provider=OpenAIProvider.from_env()),
    retrieval_service=RetrievalService(episode_repo),
)

# On every user message
await manager.process_raw_data(new_msg, owner_id=user_id)

# Before generating a reply
episodes = await manager.search_episodes("kyoto sakura", owner_id=user_id)
context = "\n".join(ep.summary for ep in episodes.episodes[:5])
prompt = f"Given these memories:\n{context}\nAnswer: {user_question}"
```
### 8.2 Batch Process Historical Logs

```python
conversations = load_old_logs()
for conv in conversations:
    await manager.process_raw_data(conv, owner_id="alice")
```
### 8.3 Graceful Fallback if LLM Fails
If the OpenAI key is missing, Nemori falls back to a rule-based boundary detector and TF-IDF summary—accuracy drops ~5 % but keeps the bot alive.
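A fallback boundary detector of this kind can be quite simple. The heuristics below (a long time gap combined with zero word overlap) are illustrative; Nemori's actual rules may differ:

```python
from datetime import datetime, timedelta

def rule_based_boundaries(messages, gap=timedelta(minutes=30)):
    """messages: [{"text": str, "ts": datetime}] -> list of [start, end] index pairs."""
    pairs, start = [], 0
    for i in range(1, len(messages)):
        prev, cur = messages[i - 1], messages[i]
        time_break = cur["ts"] - prev["ts"] > gap
        # Topic break: no shared words with the previous message.
        topic_break = not (set(prev["text"].lower().split())
                           & set(cur["text"].lower().split()))
        if time_break and topic_break:
            pairs.append([start, i - 1])
            start = i
    pairs.append([start, len(messages) - 1])
    return pairs

t0 = datetime(2024, 1, 15, 10, 30)
msgs = [
    {"text": "Kyoto in April for sakura?", "ts": t0},
    {"text": "April sakura sounds great", "ts": t0 + timedelta(minutes=2)},
    {"text": "Unrelated: fix the server", "ts": t0 + timedelta(hours=3)},
]
print(rule_based_boundaries(msgs))  # → [[0, 1], [2, 2]]
```

No LLM call anywhere, so this path stays up even when the API key is missing.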
## 9. FAQ: Engineers Ask These First {#faq}
Q1. Will long episodes bloat token usage?
Top-20 episodes ≈ 1.5 k tokens, similar to raw top-20 messages. Dropping to top-10 halves tokens with negligible loss.
Q2. What about details only in raw text?
We keep raw text untouched. A future “semantic memory” module will selectively fuse entities back into episodes; it is planned to be open-sourced.
Q3. Images or audio?

The architecture supports `DataType.MEDIA`. Pipeline: vision/audio → text summary → same episode flow.
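A minimal sketch of that pipeline, with the vision/ASR step stubbed out (`describe_media` and the dict shapes here are illustrative, not Nemori's API):

```python
def describe_media(item: dict) -> str:
    """Stand-in for a vision or speech-to-text model call."""
    if item["kind"] == "image":
        return f"Photo described as: {item['alt']}"
    return f"Voice note transcribed as: {item['transcript']}"

def media_to_message(item: dict, user: str) -> dict:
    # The text summary joins the conversation stream like any other message,
    # so the normal boundary-detection and episode flow applies unchanged.
    return {"user": user, "text": describe_media(item)}

msg = media_to_message({"kind": "image", "alt": "cherry blossoms in Kyoto"}, "alice")
print(msg["text"])  # → Photo described as: cherry blossoms in Kyoto
```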
Q4. Million-user scale?

- Swap DuckDB → Postgres or sharded KV.
- BM25 indexes are user-sharded; horizontal scaling is trivial.
- Episode builders can be queued.
Q5. Why BM25 instead of embeddings?
LoCoMo is conversation-heavy; BM25 + clean narratives outperforms embeddings alone. Hybrid mode is one line away.
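“One line away” in spirit: hybrid retrieval is just score fusion. A minimal sketch, assuming you already have BM25 and embedding scores for the same candidate list (the weight `alpha` and the min-max normalization are illustrative choices, not Nemori's hybrid API):

```python
def normalize(scores: list[float]) -> list[float]:
    # Min-max normalize so BM25 and cosine scores share a [0, 1] scale.
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def hybrid_rank(bm25_scores: list[float], embed_scores: list[float],
                alpha: float = 0.5) -> list[int]:
    """Return candidate indices sorted best-first by the blended score."""
    b, e = normalize(bm25_scores), normalize(embed_scores)
    fused = [alpha * x + (1 - alpha) * y for x, y in zip(b, e)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)

# Episode 0 wins on BM25, episode 2 on embeddings; fusion balances both.
print(hybrid_rank([8.1, 2.0, 3.5], [0.42, 0.40, 0.71]))  # → [2, 0, 1]
```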
## 10. Roadmap {#roadmap}

| Milestone | ETA | What You’ll See |
|---|---|---|
| Semantic Memory Plug-in | Q3 2024 | Inject names, dates, locations lost during summarization |
| Episode Clustering | Q4 2024 | Auto-merge similar episodes into long-term themes |
| Multi-modal | 2025 | Images, audio, sensor streams as first-class episodes |
| Federated Storage | TBD | Client-side encrypted indexes, server only holds pointers |
## Closing Thought
Nemori doesn’t try to remember everything.
It remembers what matters, the way humans do—events, not messages.
Clone it, plug it in, and give your AI the gift of a past.