Nemori: Teaching AI to Remember Like a Human – A Practical Guide to Episodic Memory for LLMs

“I swear we talked about Kyoto last week … what did Alice say about the cherry blossoms?”
If your chatbot can’t answer that, keep reading.


Table of Contents

  1. [The 30-Second Pitch](#the-30-second-pitch)
  2. [Why Traditional Memory Fails](#why-traditional-memory-fails)
  3. [How Nemori Works (No PhD Required)](#how-nemori-works)
  4. [Quick-Start: Run the LoCoMo Benchmark in 30 Minutes](#quick-start)
  5. [Architecture at a Glance](#architecture-at-a-glance)
  6. [Deep Dive: From Raw Chat to Searchable Episode](#deep-dive)
  7. [Performance on LoCoMo](#performance)
  8. [Integration Cookbook](#integration-cookbook)
  9. [FAQ: Engineers Ask These First](#faq)
  10. [Roadmap](#roadmap)

1. The 30-Second Pitch {#the-30-second-pitch}

Nemori is a small, open-source library that turns any stream of user data—chats, GPS pings, voice notes—into human-scale episodes your LLM can recall later.
Think of it as a memory palace written in Python.

  • Two prompts detect boundaries and write the episode.
  • BM25 indexes the episodes—no extra LLM calls.
  • User-level isolation keeps Alice’s memories away from Bob’s.

Drop it into your bot today; tomorrow your users will ask, “How do you remember that?”


2. Why Traditional Memory Fails {#why-traditional-memory-fails}

| Human Memory | Typical AI Memory | Nemori's Fix |
|---|---|---|
| "Last Wednesday Alice and I decided on Kyoto in April." | `Message #312: user: kyoto sounds nice` | **Episode title:** Alice and I Plan a Kyoto Cherry-Blossom Trip<br>**Summary:** 2024-01-15 10:30: Alice proposed April dates; flight JAL123 booked |
| Keywords: Alice, Kyoto, April | Keywords: message ID, timestamp, user ID | Keywords: Kyoto, trip, Alice, cherry-blossom, flight |

Traditional systems search messages. Nemori searches events.


3. How Nemori Works (No PhD Required) {#how-nemori-works}

3.1 The Four-Layer Cake

```mermaid
graph TD
    A[Raw Data<br>chat / location / media] -->|Layer 1<br>Data Ingestion| B[Typed Event]
    B -->|Layer 2<br>Episode Builder| C[Episode Object]
    C -->|Layer 3<br>Indexing| D[BM25 Index]
    D -->|Layer 4<br>Retrieval| E[Answer]
```

  • Layer 1 normalizes any input into RawEventData.
  • Layer 2 chooses the right builder (conversation, location, etc.) and produces an Episode.
  • Layer 3 tokenizes, stems, and stores the episode in a per-user BM25 index.
  • Layer 4 answers questions by retrieving the top-k episodes and handing them to the LLM.
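The four layers can be sketched end-to-end in a few lines of plain Python. Everything below is illustrative scaffolding, not Nemori's actual API: `RawEvent`, `build_episode`, and the dict-based index merely stand in for `RawEventData`, the episode builders, and the BM25 index.

```python
from dataclasses import dataclass

# Layer 1: normalize any input into one typed record
# (illustrative schema standing in for Nemori's RawEventData)
@dataclass
class RawEvent:
    data_type: str
    content: list

# Layer 2: a builder turns the raw event into a titled, summarized episode
@dataclass
class Episode:
    title: str
    summary: str
    keywords: list

def build_episode(raw: RawEvent) -> Episode:
    # A real builder calls the LLM here; this stub just concatenates messages.
    text = " ".join(m["text"] for m in raw.content)
    return Episode(title=text[:40], summary=text, keywords=text.lower().split())

# Layer 3: a keyword inverted index stands in for the BM25 index
index: dict[str, list[Episode]] = {}

def add_to_index(ep: Episode) -> None:
    for kw in ep.keywords:
        index.setdefault(kw, []).append(ep)

# Layer 4: retrieval returns matching episodes as context for the LLM
def retrieve(query: str) -> list[Episode]:
    return [ep for token in query.lower().split() for ep in index.get(token, [])]
```

The real pipeline adds LLM-driven boundary detection, field weighting, and per-user isolation, but the data flow is exactly this shape.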

3.2 Two Prompts, Zero Embeddings

| Step | Prompt | Output |
|---|---|---|
| Boundary Detection | "Detect episode boundaries along natural topic shifts." | List of start/end message IDs |
| Episode Generation | "Summarize each segment into an episodic memory." | Title, summary, keywords |

That’s it. No vector databases, no re-rankers—unless you want them later.


4. Quick-Start: Run the LoCoMo Benchmark in 30 Minutes {#quick-start}

4.1 Prerequisites

  • Python 3.12+
  • uv package manager
  • One API key: OpenAI, Anthropic, or Google

```bash
git clone https://github.com/nemori-ai/nemori.git
cd nemori
uv sync
export OPENAI_API_KEY="sk-xxx"
```

4.2 One Script, End-to-End

```python
from datetime import datetime

from nemori.core.data_types import RawEventData, DataType, TemporalInfo
from nemori.llm.providers import OpenAIProvider
from nemori.builders.conversation_builder import ConversationEpisodeBuilder
from nemori.core.retrieval import RetrievalService, RetrievalStrategy, RetrievalQuery

# 1. Feed a conversation
messages = [
    {"user": "alice", "text": "I’m thinking Kyoto in April for the sakura."},
    {"user": "bot",  "text": "Great idea! Shall we book flights?"}
]
raw = RawEventData(
    data_type=DataType.CONVERSATION,
    content=messages,
    source="telegram",
    temporal_info=TemporalInfo(timestamp=datetime.utcnow())
)

# 2. Build episode
builder = ConversationEpisodeBuilder(llm_provider=OpenAIProvider.from_env())
episode = builder.build_episode(raw, for_owner="alice")

# 3. Index (episode_repo is your episode repository instance,
#    e.g. the in-memory repository used in the Integration Cookbook)
service = RetrievalService(episode_repo)
service.register_provider(RetrievalStrategy.BM25, config={})
service.add_episode_to_all_providers(episode)

# 4. Query
query = RetrievalQuery(text="when sakura kyoto", owner_id="alice", limit=3)
results = service.search(query)
print(results.episodes[0].summary)
# → Alice proposed April dates for Kyoto cherry-blossom trip.
```

4.3 Verify on LoCoMo

```bash
cd evaluation
uv run pytest -m locomo
```

Expect state-of-the-art F1 without extra tuning.


5. Architecture at a Glance {#architecture-at-a-glance}

| Layer | Core Classes | Responsibility |
|---|---|---|
| Data Ingestion | `RawEventData`, `DataType` | Accept anything (chat, GPS, sensor logs) into one schema. |
| Memory Processing | `EpisodeBuilder` (abstract) + `ConversationEpisodeBuilder` | Turn raw data into human-like episodes. |
| Episodic Memory | `Episode`, `EpisodeLevel` (1-4) | Store episodes with titles, summaries, keywords, timestamps, importance scores. |
| Storage & Retrieval | `RetrievalService` + `BM25RetrievalProvider` | Per-user BM25 index, automatic updates on create/update/delete. |

All layers speak through clean interfaces; swap DuckDB for Postgres later without touching the rest.
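What "clean interfaces" means in practice: storage is defined as a contract, not a concrete backend, so swapping databases only touches one class. The sketch below uses a `typing.Protocol` to show the idea; the class and method names are illustrative, not Nemori's actual interfaces.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass
class Episode:
    owner_id: str
    title: str
    summary: str

@runtime_checkable
class EpisodeRepository(Protocol):
    """Minimal storage contract; a DuckDB, Postgres, or in-memory backend all fit."""
    def store(self, episode: Episode) -> None: ...
    def get_by_owner(self, owner_id: str) -> list[Episode]: ...

class InMemoryEpisodeRepository:
    """Trivial backend satisfying the contract; a Postgres version would be a drop-in."""
    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def store(self, episode: Episode) -> None:
        self._episodes.append(episode)

    def get_by_owner(self, owner_id: str) -> list[Episode]:
        # Owner filtering is also where user-level isolation is enforced.
        return [e for e in self._episodes if e.owner_id == owner_id]
```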


6. Deep Dive: From Raw Chat to Searchable Episode {#deep-dive}

6.1 Step 1 – Store Raw Data (Immutable)

```python
await raw_repo.store_raw_data(raw)   # never modified again
```

6.2 Step 2 – Detect Boundaries

The LLM receives the last N messages and returns [start_id, end_id] pairs.
Internal prompt (exact):

“Detect episode boundaries along natural topic shifts. Return JSON list of [start,end] pairs.”
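Because the model returns a JSON list of pairs, a thin validation step keeps malformed responses out of the episode builder. This parser is a sketch of that step, not Nemori's internal code:

```python
import json

def parse_boundaries(llm_response: str, max_id: int) -> list[tuple[int, int]]:
    """Parse and validate the [start, end] pairs returned by the boundary prompt."""
    pairs = json.loads(llm_response)
    validated = []
    for pair in pairs:
        start, end = int(pair[0]), int(pair[1])
        # Drop inverted or out-of-range pairs rather than crash the pipeline.
        if 0 <= start <= end <= max_id:
            validated.append((start, end))
    return validated
```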

6.3 Step 3 – Generate Narrative

Each pair is fed to:

“Summarize this segment into an episodic memory: who, what, when, why. 1–2 sentences.”

Example output:
Title: Alice and I Plan a Kyoto Cherry-Blossom Trip
Summary: On 2024-01-15 at 10:30, Alice suggested visiting Kyoto in early April for sakura; we agreed on JAL123 flights.

6.4 Step 4 – Index with BM25

  • Tokenize with NLTK (English)
  • Stem with Porter Stemmer
  • Fields & weights

    • title × 3
    • summary × 2
    • entities × 2
    • topics × 2
    • content × 1

Each user gets a separate index; no cross-talk.
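Field weighting works by repeating a field's tokens weight-many times before scoring, which leaves the standard BM25 formula untouched. The sketch below uses stock Okapi BM25 (k1 = 1.5, b = 0.75) and plain whitespace tokenization in place of NLTK + Porter stemming; the episode dict shape is illustrative.

```python
import math
from collections import Counter

FIELD_WEIGHTS = {"title": 3, "summary": 2, "entities": 2, "topics": 2, "content": 1}

def episode_tokens(ep: dict) -> list[str]:
    """Flatten an episode into one token bag, repeating each field by its weight."""
    bag: list[str] = []
    for field, weight in FIELD_WEIGHTS.items():
        bag += ep.get(field, "").lower().split() * weight
    return bag

def bm25_scores(query: str, docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document bag against the query with standard Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if tf[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```

A title hit contributes three copies of the token to term frequency, so episodes matched on their title rank above episodes matched only in body content.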


7. Performance {#performance}

| Benchmark | Nemori Score | Notes |
|---|---|---|
| LoCoMo F1 | SOTA* | Two prompts + BM25, no embeddings |
| Latency | < 200 ms | Per-user index fits in RAM |

*Exact numbers in figures/locomo-scores.png inside repo.


8. Integration Cookbook {#integration-cookbook}

8.1 Embed in an Existing Chatbot

```python
from nemori.core.managers import EpisodeManager

manager = EpisodeManager(
    raw_data_repo=MemoryRawDataRepository(),
    episode_repo=MemoryEpisodicMemoryRepository(),
    builder_registry=ConversationEpisodeBuilder(llm_provider=OpenAIProvider.from_env()),
    retrieval_service=RetrievalService(episode_repo)
)

# On every user message
await manager.process_raw_data(new_msg, owner_id=user_id)

# Before generating a reply
episodes = await manager.search_episodes("kyoto sakura", owner_id=user_id)
context = "\n".join(ep.summary for ep in episodes.episodes[:5])
prompt = f"Given these memories:\n{context}\nAnswer: {user_question}"
```

8.2 Batch Process Historical Logs

```python
conversations = load_old_logs()
for conv in conversations:
    await manager.process_raw_data(conv, owner_id="alice")

8.3 Graceful Fallback if LLM Fails

If the OpenAI key is missing, Nemori falls back to a rule-based boundary detector and TF-IDF summary—accuracy drops ~5 % but keeps the bot alive.
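A rule-based boundary detector can be approximated with a time-gap heuristic: start a new episode whenever consecutive messages are separated by a long silence. The 30-minute threshold and message shape below are illustrative assumptions, not Nemori's actual fallback rules:

```python
from datetime import datetime, timedelta

def rule_based_boundaries(messages: list[dict],
                          gap: timedelta = timedelta(minutes=30)) -> list[tuple[int, int]]:
    """Split messages into (start, end) index pairs wherever a long pause occurs."""
    if not messages:
        return []
    boundaries, start = [], 0
    for i in range(1, len(messages)):
        # A long pause between messages signals a new episode.
        if messages[i]["ts"] - messages[i - 1]["ts"] > gap:
            boundaries.append((start, i - 1))
            start = i
    boundaries.append((start, len(messages) - 1))
    return boundaries
```

It misses pure topic shifts inside a continuous chat, which is where the ~5 % accuracy drop comes from, but it never fails and needs no API key.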


9. FAQ: Engineers Ask These First {#faq}

Q1. Will long episodes bloat token usage?
Top-20 episodes ≈ 1.5 k tokens, similar to raw top-20 messages. Dropping to top-10 halves tokens with negligible loss.

Q2. What about details only in raw text?
We keep raw text untouched. Future “semantic memory” module will selectively fuse entities back into episodes—planned open-source.

Q3. Images or audio?
Architecture supports DataType.MEDIA. Pipeline: vision/audio → text summary → same episode flow.

Q4. Million-user scale?

  • Swap DuckDB → Postgres or sharded KV.
  • BM25 indexes are user-sharded; horizontal scale trivial.
  • Episode builders can be queued.

Q5. Why BM25 instead of embeddings?
LoCoMo is conversation-heavy; BM25 + clean narratives outperforms embeddings alone. Hybrid mode is one line away.


10. Roadmap {#roadmap}

| Milestone | ETA | What You'll See |
|---|---|---|
| Semantic Memory Plug-in | Q3 2024 | Inject names, dates, locations lost during summarization |
| Episode Clustering | Q4 2024 | Auto-merge similar episodes into long-term themes |
| Multi-modal | 2025 | Images, audio, sensor streams as first-class episodes |
| Federated Storage | TBD | Client-side encrypted indexes, server only holds pointers |

Closing Thought

Nemori doesn’t try to remember everything.
It remembers what matters, the way humans do—events, not messages.

Clone it, plug it in, and give your AI the gift of a past.