Nemori: Teaching AI to Remember Like a Human – A Practical Guide to Episodic Memory for LLMs

“I swear we talked about Kyoto last week … what did Alice say about the cherry blossoms?”
If your chatbot can’t answer that, keep reading.


Table of Contents

  1. [The 30-Second Pitch](#the-30-second-pitch)
  2. [Why Traditional Memory Fails](#why-traditional-memory-fails)
  3. [How Nemori Works (No PhD Required)](#how-nemori-works)
  4. [Quick-Start: Run the LoCoMo Benchmark in 30 Minutes](#quick-start)
  5. [Architecture at a Glance](#architecture-at-a-glance)
  6. [Deep Dive: From Raw Chat to Searchable Episode](#deep-dive)
  7. [Performance on LoCoMo](#performance)
  8. [Integration Cookbook](#integration-cookbook)
  9. [FAQ: Engineers Ask These First](#faq)
  10. [Roadmap](#roadmap)

1. The 30-Second Pitch {#the-30-second-pitch}

Nemori is a small, open-source library that turns any stream of user data—chats, GPS pings, voice notes—into human-scale episodes your LLM can recall later.
Think of it as a memory palace written in Python.

  • Two prompts detect boundaries and write the episode.
  • BM25 indexes the episodes—no extra LLM calls.
  • User-level isolation keeps Alice’s memories away from Bob’s.

Drop it into your bot today; tomorrow your users will ask, “How do you remember that?”


2. Why Traditional Memory Fails {#why-traditional-memory-fails}

| Human Memory | Typical AI Memory | Nemori's Fix |
|---|---|---|
| "Last Wednesday Alice and I decided on Kyoto in April." | `Message #312: user: kyoto sounds nice` | **Episode title:** Alice and I Plan a Kyoto Cherry-Blossom Trip<br>**Summary:** 2024-01-15 10:30: Alice proposed April dates; flight JAL123 booked |
| Keywords: Alice, Kyoto, April | Keywords: message ID, timestamp, user ID | Keywords: Kyoto, trip, Alice, cherry-blossom, flight |

Traditional systems search messages. Nemori searches events.


3. How Nemori Works (No PhD Required) {#how-nemori-works}

3.1 The Four-Layer Cake

```mermaid
graph TD
    A[Raw Data<br>chat / location / media] -->|Layer 1<br>Data Ingestion| B[Typed Event]
    B -->|Layer 2<br>Episode Builder| C[Episode Object]
    C -->|Layer 3<br>Indexing| D[BM25 Index]
    D -->|Layer 4<br>Retrieval| E[Answer]
```

  • Layer 1 normalizes any input into RawEventData.
  • Layer 2 chooses the right builder (conversation, location, etc.) and produces an Episode.
  • Layer 3 tokenizes, stems, and stores the episode in a per-user BM25 index.
  • Layer 4 answers questions by retrieving the top-k episodes and handing them to the LLM.
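The four layers can be sketched end-to-end in a few lines of plain Python. Everything below is illustrative scaffolding, not Nemori's actual API: `RawEvent`, `build_episode`, and the dict-based index merely stand in for `RawEventData`, the episode builders, and the BM25 index.

```python
from dataclasses import dataclass

# Layer 1: normalize any input into one typed record
# (illustrative schema standing in for Nemori's RawEventData)
@dataclass
class RawEvent:
    data_type: str
    content: list

# Layer 2: a builder turns the raw event into a titled, summarized episode
@dataclass
class Episode:
    title: str
    summary: str
    keywords: list

def build_episode(raw: RawEvent) -> Episode:
    # A real builder calls the LLM here; this stub just concatenates messages.
    text = " ".join(m["text"] for m in raw.content)
    return Episode(title=text[:40], summary=text, keywords=text.lower().split())

# Layer 3: a keyword inverted index stands in for the BM25 index
index: dict[str, list[Episode]] = {}

def add_to_index(ep: Episode) -> None:
    for kw in ep.keywords:
        index.setdefault(kw, []).append(ep)

# Layer 4: retrieval returns matching episodes as context for the LLM
def retrieve(query: str) -> list[Episode]:
    return [ep for token in query.lower().split() for ep in index.get(token, [])]
```

The real pipeline adds LLM-driven boundary detection, field weighting, and per-user isolation, but the data flow is exactly this shape.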

3.2 Two Prompts, Zero Embeddings

| Step | Prompt | Output |
|---|---|---|
| Boundary Detection | "Detect episode boundaries along natural topic shifts." | List of start/end message IDs |
| Episode Generation | "Summarize each segment into an episodic memory." | Title, summary, keywords |

That’s it. No vector databases, no re-rankers—unless you want them later.


4. Quick-Start: Run the LoCoMo Benchmark in 30 Minutes {#quick-start}

4.1 Prerequisites

  • Python 3.12+
  • uv package manager
  • One API key: OpenAI, Anthropic, or Google

```bash
git clone https://github.com/nemori-ai/nemori.git
cd nemori
uv sync
export OPENAI_API_KEY="sk-xxx"
```

4.2 One Script, End-to-End

```python
from datetime import datetime

from nemori.core.data_types import RawEventData, DataType, TemporalInfo
from nemori.llm.providers import OpenAIProvider
from nemori.builders.conversation_builder import ConversationEpisodeBuilder
from nemori.core.retrieval import RetrievalService, RetrievalStrategy, RetrievalQuery

# 1. Feed a conversation
messages = [
    {"user": "alice", "text": "I’m thinking Kyoto in April for the sakura."},
    {"user": "bot",  "text": "Great idea! Shall we book flights?"}
]
raw = RawEventData(
    data_type=DataType.CONVERSATION,
    content=messages,
    source="telegram",
    temporal_info=TemporalInfo(timestamp=datetime.utcnow())
)

# 2. Build episode
builder = ConversationEpisodeBuilder(llm_provider=OpenAIProvider.from_env())
episode = builder.build_episode(raw, for_owner="alice")

# 3. Index (episode_repo is your episode repository instance,
#    e.g. the in-memory repository used in the Integration Cookbook)
service = RetrievalService(episode_repo)
service.register_provider(RetrievalStrategy.BM25, config={})
service.add_episode_to_all_providers(episode)

# 4. Query
query = RetrievalQuery(text="when sakura kyoto", owner_id="alice", limit=3)
results = service.search(query)
print(results.episodes[0].summary)
# → Alice proposed April dates for Kyoto cherry-blossom trip.
```

4.3 Verify on LoCoMo

```bash
cd evaluation
uv run pytest -m locomo
```

Expect state-of-the-art F1 without extra tuning.


5. Architecture at a Glance {#architecture-at-a-glance}

| Layer | Core Classes | Responsibility |
|---|---|---|
| Data Ingestion | `RawEventData`, `DataType` | Accept anything (chat, GPS, sensor logs) into one schema. |
| Memory Processing | `EpisodeBuilder` (abstract) + `ConversationEpisodeBuilder` | Turn raw data into human-like episodes. |
| Episodic Memory | `Episode`, `EpisodeLevel` (1-4) | Store episodes with titles, summaries, keywords, timestamps, importance scores. |
| Storage & Retrieval | `RetrievalService` + `BM25RetrievalProvider` | Per-user BM25 index, automatic updates on create/update/delete. |

All layers speak through clean interfaces; swap DuckDB for Postgres later without touching the rest.
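What "clean interfaces" means in practice: storage is defined as a contract, not a concrete backend, so swapping databases only touches one class. The sketch below uses a `typing.Protocol` to show the idea; the class and method names are illustrative, not Nemori's actual interfaces.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass
class Episode:
    owner_id: str
    title: str
    summary: str

@runtime_checkable
class EpisodeRepository(Protocol):
    """Minimal storage contract; a DuckDB, Postgres, or in-memory backend all fit."""
    def store(self, episode: Episode) -> None: ...
    def get_by_owner(self, owner_id: str) -> list[Episode]: ...

class InMemoryEpisodeRepository:
    """Trivial backend satisfying the contract; a Postgres version would be a drop-in."""
    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def store(self, episode: Episode) -> None:
        self._episodes.append(episode)

    def get_by_owner(self, owner_id: str) -> list[Episode]:
        # Owner filtering is also where user-level isolation is enforced.
        return [e for e in self._episodes if e.owner_id == owner_id]
```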


6. Deep Dive: From Raw Chat to Searchable Episode {#deep-dive}

6.1 Step 1 – Store Raw Data (Immutable)

```python
await raw_repo.store_raw_data(raw)   # never modified again
```

6.2 Step 2 – Detect Boundaries

The LLM receives the last N messages and returns [start_id, end_id] pairs.
Internal prompt (exact):

“Detect episode boundaries along natural topic shifts. Return JSON list of [start,end] pairs.”
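Because the model returns a JSON list of pairs, a thin validation step keeps malformed responses out of the episode builder. This parser is a sketch of that step, not Nemori's internal code:

```python
import json

def parse_boundaries(llm_response: str, max_id: int) -> list[tuple[int, int]]:
    """Parse and validate the [start, end] pairs returned by the boundary prompt."""
    pairs = json.loads(llm_response)
    validated = []
    for pair in pairs:
        start, end = int(pair[0]), int(pair[1])
        # Drop inverted or out-of-range pairs rather than crash the pipeline.
        if 0 <= start <= end <= max_id:
            validated.append((start, end))
    return validated
```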

6.3 Step 3 – Generate Narrative

Each pair is fed to:

“Summarize this segment into an episodic memory: who, what, when, why. 1–2 sentences.”

Example output:
Title: Alice and I Plan a Kyoto Cherry-Blossom Trip
Summary: On 2024-01-15 at 10:30, Alice suggested visiting Kyoto in early April for sakura; we agreed on JAL123 flights.

6.4 Step 4 – Index with BM25

  • Tokenize with NLTK (English)
  • Stem with Porter Stemmer
  • Fields & weights

    • title × 3
    • summary × 2
    • entities × 2
    • topics × 2
    • content × 1

Each user gets a separate index; no cross-talk.
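Field weighting works by repeating a field's tokens weight-many times before scoring, which leaves the standard BM25 formula untouched. The sketch below uses stock Okapi BM25 (k1 = 1.5, b = 0.75) and plain whitespace tokenization in place of NLTK + Porter stemming; the episode dict shape is illustrative.

```python
import math
from collections import Counter

FIELD_WEIGHTS = {"title": 3, "summary": 2, "entities": 2, "topics": 2, "content": 1}

def episode_tokens(ep: dict) -> list[str]:
    """Flatten an episode into one token bag, repeating each field by its weight."""
    bag: list[str] = []
    for field, weight in FIELD_WEIGHTS.items():
        bag += ep.get(field, "").lower().split() * weight
    return bag

def bm25_scores(query: str, docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document bag against the query with standard Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if tf[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```

A title hit contributes three copies of the token to term frequency, so episodes matched on their title rank above episodes matched only in body content.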


7. Performance {#performance}

| Benchmark | Nemori Score | Notes |
|---|---|---|
| LoCoMo F1 | SOTA* | Two prompts + BM25, no embeddings |
| Latency | < 200 ms | Per-user index fits in RAM |

*Exact numbers in figures/locomo-scores.png inside repo.


8. Integration Cookbook {#integration-cookbook}

8.1 Embed in an Existing Chatbot

```python
from nemori.core.managers import EpisodeManager

manager = EpisodeManager(
    raw_data_repo=MemoryRawDataRepository(),
    episode_repo=MemoryEpisodicMemoryRepository(),
    builder_registry=ConversationEpisodeBuilder(llm_provider=OpenAIProvider.from_env()),
    retrieval_service=RetrievalService(episode_repo)
)

# On every user message
await manager.process_raw_data(new_msg, owner_id=user_id)

# Before generating a reply
episodes = await manager.search_episodes("kyoto sakura", owner_id=user_id)
context = "\n".join(ep.summary for ep in episodes.episodes[:5])
prompt = f"Given these memories:\n{context}\nAnswer: {user_question}"
```

8.2 Batch Process Historical Logs

```python
conversations = load_old_logs()
for conv in conversations:
    await manager.process_raw_data(conv, owner_id="alice")

8.3 Graceful Fallback if LLM Fails

If the OpenAI key is missing, Nemori falls back to a rule-based boundary detector and TF-IDF summary—accuracy drops ~5 % but keeps the bot alive.
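A rule-based boundary detector can be approximated with a time-gap heuristic: start a new episode whenever consecutive messages are separated by a long silence. The 30-minute threshold and message shape below are illustrative assumptions, not Nemori's actual fallback rules:

```python
from datetime import datetime, timedelta

def rule_based_boundaries(messages: list[dict],
                          gap: timedelta = timedelta(minutes=30)) -> list[tuple[int, int]]:
    """Split messages into (start, end) index pairs wherever a long pause occurs."""
    if not messages:
        return []
    boundaries, start = [], 0
    for i in range(1, len(messages)):
        # A long pause between messages signals a new episode.
        if messages[i]["ts"] - messages[i - 1]["ts"] > gap:
            boundaries.append((start, i - 1))
            start = i
    boundaries.append((start, len(messages) - 1))
    return boundaries
```

It misses pure topic shifts inside a continuous chat, which is where the ~5 % accuracy drop comes from, but it never fails and needs no API key.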


9. FAQ: Engineers Ask These First {#faq}

Q1. Will long episodes bloat token usage?
Top-20 episodes ≈ 1.5 k tokens, similar to raw top-20 messages. Dropping to top-10 halves tokens with negligible loss.

Q2. What about details only in raw text?
We keep raw text untouched. Future “semantic memory” module will selectively fuse entities back into episodes—planned open-source.

Q3. Images or audio?
Architecture supports DataType.MEDIA. Pipeline: vision/audio → text summary → same episode flow.

Q4. Million-user scale?

  • Swap DuckDB → Postgres or sharded KV.
  • BM25 indexes are user-sharded; horizontal scale trivial.
  • Episode builders can be queued.

Q5. Why BM25 instead of embeddings?
LoCoMo is conversation-heavy; BM25 + clean narratives outperforms embeddings alone. Hybrid mode is one line away.


10. Roadmap {#roadmap}

| Milestone | ETA | What You'll See |
|---|---|---|
| Semantic Memory Plug-in | Q3 2024 | Inject names, dates, locations lost during summarization |
| Episode Clustering | Q4 2024 | Auto-merge similar episodes into long-term themes |
| Multi-modal | 2025 | Images, audio, sensor streams as first-class episodes |
| Federated Storage | TBD | Client-side encrypted indexes, server only holds pointers |

Closing Thought

Nemori doesn’t try to remember everything.
It remembers what matters, the way humans do—events, not messages.

Clone it, plug it in, and give your AI the gift of a past.