Making Sense of Long Stories: How ComoRAG Lets AI “Read a Novel Like a Human”
Imagine finishing a 200,000-word novel and being asked, “Why did Snape kill Dumbledore?”
You would flip back several chapters, connect scattered clues, and build a coherent picture.
ComoRAG does exactly that—turning one-shot retrieval into iterative reasoning and turning scattered facts into a working memory.
Table of Contents

- What Is ComoRAG?
- Why Classic RAG Struggles with Long Narratives
- The Three Pillars of ComoRAG
- End-to-End Walk-Through: Eight Steps from Query to Answer
- Hard Numbers: Four Benchmarks, Clear Wins
- Hands-On Guide: 30-Minute Local Demo
- Frequently Asked Questions
- One-Line Takeaway
1. What Is ComoRAG?
In plain English:
ComoRAG is a retrieval-augmented system inspired by the human prefrontal cortex. It answers complex questions over ultra-long documents (200k+ tokens) while keeping a living memory that updates as new evidence arrives.
Term | Everyday Analogy |
---|---|
RAG | Look-up, then answer |
Stateful | Keeps “mental notes” and revises them |
Memory-Organized | Stores every clue in a shared “notebook” for later use |
2. Why Classic RAG Struggles with Long Narratives
Traditional Limitation | Symptom | Concrete Example |
---|---|---|
Single-step retrieval | Reads once, then stops | Sees “Snape kills Dumbledore” but misses the seven-book-long back-story |
Stateless | Each search forgets the last | Cannot reconcile “Snape protects Harry” with “Snape bullies Harry” |
Fixed-size chunks | Slices text into 512-token bits | A critical clue spanning three pages is split across chunks and lost |
ComoRAG replaces the single pass with a loop of self-questioning, retrieval, note-taking, and re-evaluation—just like a human re-reading chapters until the story clicks.
3. The Three Pillars of ComoRAG
3.1 Hierarchical Knowledge Source: Three Ways to Read the Same Book
Layer | Purpose | Human Parallel |
---|---|---|
Veridical | Exact sentences | Highlighting the original line |
Semantic | Chapter-level summaries | Teacher’s one-paragraph recap |
Episodic | Timeline of events | Sticky-note timeline on the wall |
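To make the three layers concrete, here is a minimal Python sketch of how such a source could be assembled. The class and helper names (`KnowledgeSource`, `build_source`, `summarize`, `extract_events`) are illustrative assumptions, not the repository's API, and the LLM calls are stubbed out:

```python
from dataclasses import dataclass

@dataclass
class KnowledgeSource:
    veridical: list[str]  # exact text chunks: quote the original line
    semantic: list[str]   # cluster-level summaries: the teacher's recap
    episodic: list[str]   # ordered event descriptions: the timeline

def summarize(text: str) -> str:
    # Stand-in for an LLM summarization call.
    return text[:80]

def extract_events(text: str) -> list[str]:
    # Stand-in for LLM event extraction over the full narrative.
    return [p[:80] for p in text.split("\n\n") if p.strip()]

def build_source(book: str, chunk_size: int = 512) -> KnowledgeSource:
    chunks = [book[i:i + chunk_size] for i in range(0, len(book), chunk_size)]
    return KnowledgeSource(
        veridical=chunks,
        semantic=[summarize(c) for c in chunks],
        episodic=extract_events(book),
    )
```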
3.2 Dynamic Memory Workspace
- Every retrieval creates a memory unit: probing question + evidence + a one-sentence cue (sketched below).
- All units are stored in a global pool that later iterations can consult.
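A memory unit is easiest to picture as a small record plus a shared pool. The dataclasses below are an illustrative sketch under those definitions, not the repository's actual classes:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryUnit:
    probe: str           # the probing question that triggered retrieval
    evidence: list[str]  # passages returned for that probe
    cue: str             # one-sentence distillation kept for later fusion

@dataclass
class MemoryPool:
    units: list[MemoryUnit] = field(default_factory=list)

    def add(self, unit: MemoryUnit) -> None:
        self.units.append(unit)

    def cues(self) -> list[str]:
        # Later iterations consult the pool through these short cues.
        return [u.cue for u in self.units]
```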
3.3 Metacognitive Control Loop
Five actions repeat until the answer is solid (max five cycles); a code sketch follows the list:

1. Self-Probe – “What am I still missing?”
2. Tri-Retrieve – Search all three layers at once.
3. Mem-Encode – Write a new memory unit.
4. Mem-Fuse – Merge fresh clues with earlier notes.
5. Try-Answer – Attempt a final response; if unsure, loop back to step 1.
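Putting the five actions together, one run of the loop could look like the sketch below. Everything here is illustrative: `retrieve`, `distill`, `fuse`, `try_answer`, and `self_probe` are stand-ins for the system's real retrieval and LLM prompts, and `KnowledgeSource`, `MemoryUnit`, and `MemoryPool` are the classes sketched in 3.1 and 3.2:

```python
# Placeholder helpers; the real system uses embedding search and LLM prompts.
def retrieve(layer: list[str], probe: str) -> list[str]:
    return [t for t in layer if any(w.lower() in t.lower() for w in probe.split())][:3]

def distill(probe: str, evidence: list[str]) -> str:
    return f"{probe} -> {len(evidence)} passages"

def fuse(cues: list[str]) -> str:
    return " | ".join(cues)

def try_answer(question: str, fused: str) -> tuple[str, bool]:
    return "UNRESOLVED", False  # a real LLM would judge its own confidence here

def self_probe(question: str, fused: str) -> str:
    return f"What is still missing to answer: {question}?"

def metacognitive_loop(question: str, source: KnowledgeSource,
                       pool: MemoryPool, max_iters: int = 5) -> str:
    probe, answer = question, "UNRESOLVED"
    for _ in range(max_iters):
        # Tri-Retrieve: query all three layers with the current probe.
        evidence = (retrieve(source.veridical, probe)
                    + retrieve(source.semantic, probe)
                    + retrieve(source.episodic, probe))
        # Mem-Encode: write a new memory unit into the global pool.
        pool.add(MemoryUnit(probe, evidence, distill(probe, evidence)))
        # Mem-Fuse: merge fresh clues with every earlier note.
        fused = fuse(pool.cues())
        # Try-Answer: stop as soon as the model is confident.
        answer, confident = try_answer(question, fused)
        if confident:
            break
        # Self-Probe: ask what is still missing, then loop again.
        probe = self_probe(question, fused)
    return answer
```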
4. End-to-End Walk-Through: Eight Steps from Query to Answer
Real example taken from a detective story:
Question:
Mrs. MacIntyre never writes letters. Why does she suddenly buy ink?
Choices:
A) Reply to a government letter
B) Send a birthday card to her niece
C) Write to the Sunday Comet newspaper
D) Blur some photos
Step | What Happens | Internal Log Snippet |
---|---|---|
1 | Initial retrieval with the raw question | Finds “a newspaper page with a section cut out” |
2 | First answer attempt | Fails—clue is too vague |
3 | Self-probe triggered | Generates new probe: “Who wrote to the Sunday Comet?” |
4 | Second retrieval | Retrieves “Miss Hosford recalls receiving a vague letter about a photograph” |
5 | Encode new memory | Cue: “Mrs. MacIntyre plans to sell a story to the paper” |
6 | Fuse with prior note | Combines “cut-out newspaper” + “letter to paper” into a coherent motive |
7 | Second answer attempt | Chooses C—correct |
8 | Terminate & save | Logs final answer and all memory units for later inspection |
5. Hard Numbers: Four Benchmarks, Clear Wins
Dataset | Task | Strongest Baseline | ComoRAG | Relative Gain |
---|---|---|---|---|
NarrativeQA | Free-form F1 | 31.35 | 31.43 | +0.3% |
EN.QA (200k+) | Free-form F1 | 32.09 | 34.52 | +7.6% |
EN.MC (200k+) | Multiple-choice accuracy | 64.27 | 72.93 | +13.5% |
DetectiveQA | Multiple-choice accuracy | 64.77 | 70.56 | +8.9% |
The edge widens on documents above 150k tokens, peaking at a +24.6% gain in accuracy.
5.1 Ablation Study: What Hurts Most When Removed?
Ablation | EN.MC Accuracy Drop | Human Analogy |
---|---|---|
Remove Veridical layer | –30% | Reading only summaries, never the original text |
Remove Metacognition (memory fusion) | –15% | Taking no notes; each search starts from scratch |
Remove Regulation (self-probing) | –24% | Never re-asking; sticking with the first shallow query |
6. Hands-On Guide: 30-Minute Local Demo
6.1 Environment & Installation
```bash
# 1. Clone
git clone https://github.com/EternityJune25/ComoRAG.git
cd ComoRAG

# 2. Install
pip install -r requirements.txt

# 3. Quick-check: run the built-in Cinderella sample with either command from 6.2
```
6.2 Two Ways to Run
Mode | Who It’s For | Command |
---|---|---|
OpenAI API | No GPU, pay-as-you-go | python main_openai.py |
Local vLLM | Has GPU, full privacy | python main_vllm.py (after starting server) |
6.3 Core Configuration in One Snippet
```python
config = BaseConfig(
    llm_name='gpt-4o-mini',           # or your vLLM model
    dataset='cinderella',             # sample included
    need_cluster=True,                # enables all three knowledge layers
    max_meta_loop_max_iterations=5    # safety stop: at most five reasoning cycles
)
```
6.4 Starting the vLLM Server (if local)
```bash
# Example for a single GPU
vllm serve /path/to/your/model \
    --tensor-parallel-size 1 \
    --max-model-len 4096 \
    --gpu-memory-utilization 0.95
```
Results land in `result/cinderella/`:

- `details.jsonl` – every probe and memory unit
- `results.json` – final answer and score
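To inspect a run programmatically, the JSONL log can be read with standard Python. The exact record schema is not documented here, so the snippet only prints the keys of the first record:

```python
import json

# Peek at the per-iteration log; field names inside each record may vary,
# so list the keys before relying on any of them.
with open("result/cinderella/details.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        print(sorted(record.keys()))
        break  # one record is enough to see the schema
```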
7. Frequently Asked Questions
7.1 Is ComoRAG Limited to Fiction?
No. Any ultra-long document—legal contracts, technical manuals, research papers—works as long as global context is required.
7.2 How Much VRAM Do I Need?
- 7B model + BGE-M3 embedding: a 24 GB GPU is comfortable.
- Using the OpenAI API removes the local GPU requirement.
7.3 Which Embedding Model Is Used?
- Default: BGE-M3 (0.3B parameters).
- The paper shows it outperforms 8B alternatives when paired with ComoRAG’s loop.
7.4 Isn’t Five Iterations Slow?
- Median: 2–3 iterations.
- P90: still within five.

Most questions resolve quickly; the loop is a safety net.
7.5 Can I Plug ComoRAG Into Other RAG Pipelines?
Yes. The authors grafted the same loop onto RAPTOR and HippoRAGv2, boosting their accuracy by 8–12% with zero architecture changes.
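As a rough illustration of such grafting, the wrapper below drives any `search`-style retriever through the probe, retrieve, note, fuse, answer cycle. The `Retriever` protocol, the prompts, and the UNSURE convention are all assumptions for the sketch, not the authors' code:

```python
from typing import Callable, Protocol

class Retriever(Protocol):
    def search(self, query: str, k: int) -> list[str]: ...

def with_como_loop(retriever: Retriever, ask_llm: Callable[[str], str],
                   question: str, max_iters: int = 5) -> str:
    # Wrap any retriever (e.g. a RAPTOR or HippoRAGv2 index) in the
    # probe -> retrieve -> note -> fuse -> answer cycle.
    notes: list[str] = []
    probe, answer = question, "UNSURE"
    for _ in range(max_iters):
        evidence = retriever.search(probe, k=5)
        notes.append(ask_llm(f"One-sentence cue for: {evidence}"))
        answer = ask_llm(f"Question: {question}\nNotes: {notes}\n"
                         "Answer, or reply UNSURE if the evidence is thin.")
        if "UNSURE" not in answer:
            break
        probe = ask_llm(f"Given notes {notes}, what is still missing for: {question}?")
    return answer
```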
8. One-Line Takeaway
ComoRAG turns “read once, answer once” into read, reflect, refine, and resolve—giving AI the same iterative, note-taking habit that humans use to untangle long stories.
Appendix: Quick Links
- GitHub: https://github.com/EternityJune25/ComoRAG
- Paper: https://arxiv.org/abs/2508.10419