Building a Vectorless RAG System: Hierarchical Page Indexing Without Embeddings
Core question this article answers: Can we build an effective retrieval-augmented generation system without vector embeddings, similarity search, or vector databases?
Yes. By structuring documents as navigable trees and using LLM reasoning to traverse them, we can retrieve relevant context through hierarchical decision-making rather than mathematical similarity. This approach mirrors how humans actually search through documents—using tables of contents and section headings rather than comparing every paragraph’s semantic meaning.
Why Consider a Vectorless Approach?
Core question: What problems does traditional vector-based RAG create that motivate alternative architectures?
Traditional RAG systems rely on embedding models and vector databases. This introduces infrastructure complexity: you must maintain a separate vector store, manage embedding model versions, and handle the computational cost of encoding queries and documents into high-dimensional vectors. More fundamentally, similarity search operates as a black box—retrieval decisions become opaque numerical operations that resist intuitive explanation.
Human information-seeking works differently. When you need to find something in a textbook, you don’t mentally compare your question against every page’s semantic fingerprint. You open the table of contents, identify the relevant chapter, scan section headings, and navigate directly to the appropriate content. This structured, reasoning-based navigation is precisely what hierarchical page indexing replicates.
Author’s reflection: The elegance of this approach lies in its transparency. Every retrieval decision is inspectable—you can see exactly which summaries the LLM considered and why it chose a particular branch. In production systems where debugging retrieval failures matters, this interpretability proves invaluable compared to tuning similarity thresholds on opaque vector distances.
System Architecture Overview
Core question: How does data flow through a PageIndex system from document ingestion to answer generation?
The architecture separates cleanly into two phases: index construction (performed once per document) and query retrieval (performed per question).
Index Construction Phase
During indexing, the system transforms a linear document into a hierarchical tree structure, then populates each node with descriptive summaries.
Query Retrieval Phase
During querying, the system navigates this tree using LLM reasoning to locate the most relevant leaf node, then generates an answer from that retrieved context.
Application scenario: Consider a legal research platform managing thousands of court opinions and statutes. Traditional vector RAG might retrieve semantically similar but legally irrelevant passages—discussing “contract termination” when the query concerns “employment dismissal” because the embeddings capture overlapping vocabulary. The hierarchical approach forces the LLM to reason about document structure: first selecting “Labor Law” over “Commercial Law” at the root level, then “Termination Procedures” over “Benefits Administration,” finally retrieving the specific statutory text governing employee dismissal. Each decision is explicit and auditable.
Project Structure and Setup
Core question: What codebase organization supports building and running a PageIndex system?
A modular Python package structure keeps concerns separated and testable:
```
pageindex-rag/
├── pageindex/
│   ├── __init__.py
│   ├── node.py       # Core data structure
│   ├── parser.py     # Document segmentation
│   ├── indexer.py    # Summary generation
│   ├── retriever.py  # Tree navigation
│   └── storage.py    # Persistence layer
├── main.py           # CLI interface
└── document.md       # Source document
```
Initialize the project with standard shell commands:
```bash
mkdir pageindex-rag
cd pageindex-rag
mkdir pageindex
touch pageindex/__init__.py
```
Operational example: This structure supports both research prototyping and production deployment. The pageindex package can be installed via pip in requirements files, while main.py provides immediate command-line utility. The separation between parser.py and indexer.py allows teams to experiment with different segmentation strategies without modifying summary generation logic—useful when adapting the system for domain-specific document formats like legal briefs or medical case reports.
Core Data Structure: The PageNode
Core question: How do we represent document sections to support both hierarchical navigation and content retrieval?
The PageNode dataclass captures five essential attributes: a descriptive title, the raw text content (populated only at leaf nodes), an LLM-generated summary, a depth marker, and references to parent and child nodes establishing the tree topology.
```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PageNode:
    title: str
    content: str   # Raw text, populated at leaves
    summary: str   # LLM-generated description
    depth: int     # 0 = root, 1 = chapter, 2 = section
    children: list = field(default_factory=list)
    parent: Optional["PageNode"] = None

    def is_leaf(self) -> bool:
        return len(self.children) == 0
```
Design rationale: Separating content from summary enables efficient navigation. Internal nodes carry summaries that describe their subtree without loading full text, keeping navigation prompts concise. Leaf nodes store the actual content retrieved for answer generation. The depth field supports visualization and debugging, while bidirectional parent references enable upward traversal if needed for context expansion.
Application scenario: In a technical documentation system, a root node might represent an entire API reference manual. Its children could include “Authentication,” “Endpoints,” and “Error Handling” chapters. The “Endpoints” node, being lengthy, becomes an internal node with children like “Users API,” “Orders API,” and “Inventory API.” Each leaf node contains the actual endpoint specifications—HTTP methods, parameters, response schemas—while every node carries a summary enabling rapid navigation without loading full specifications.
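The depth field's visualization role can be shown concretely. This sketch redefines a minimal PageNode so it runs standalone, and builds a small fragment of the API-reference tree described above:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PageNode:
    title: str
    content: str
    summary: str
    depth: int
    children: list = field(default_factory=list)
    parent: Optional["PageNode"] = None

def print_tree(node: PageNode, lines=None) -> str:
    """Render the tree as an indented outline, indenting by each node's depth."""
    if lines is None:
        lines = []
    marker = "leaf" if not node.children else f"{len(node.children)} children"
    lines.append(f"{'  ' * node.depth}- {node.title} ({marker})")
    for child in node.children:
        print_tree(child, lines)
    return "\n".join(lines)

# A fragment of the API-reference example from the scenario above
root = PageNode("API Reference", "", "", 0)
auth = PageNode("Authentication", "OAuth details...", "", 1, parent=root)
endpoints = PageNode("Endpoints", "", "", 1, parent=root)
users = PageNode("Users API", "GET /users...", "", 2, parent=endpoints)
endpoints.children.append(users)
root.children.extend([auth, endpoints])

print(print_tree(root))
```

A renderer like this doubles as a debugging aid when validating that segmentation produced the structure you expected.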
Document Parsing: From Linear Text to Hierarchical Tree
Core question: How does the system convert a flat document into a navigable tree structure without predetermined section boundaries?
The parsing strategy employs two-pass LLM-driven segmentation. First, the complete document is split into top-level chapters. Then, any chapter exceeding a length threshold undergoes recursive segmentation into subsections.
Segmentation Implementation
```python
import json
import openai

from .node import PageNode

client = openai.OpenAI()
SUBSECTION_THRESHOLD = 300  # words

def _segment(text: str) -> list:
    prompt = f"""Split the following text into logical sections.
Return a JSON object with a "sections" key. Each item has:
- "title": short title (5 words or less)
- "content": the text belonging to this section

Text:
{text[:8000]}"""
    response = client.chat.completions.create(
        model="gpt-4o",  # JSON mode requires a gpt-4o / gpt-4-turbo class model
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=3000,
        response_format={"type": "json_object"},
    )
    parsed = json.loads(response.choices[0].message.content)
    return parsed.get("sections", [])
```
Tree Construction
```python
def parse_document(text: str) -> PageNode:
    root = PageNode(title="root", content="", summary="", depth=0)
    for item in _segment(text):
        title = item.get("title", "Section")
        content = item.get("content", "")
        node = PageNode(title=title, content="", summary="", depth=1)
        node.parent = root
        word_count = len(content.split())
        if word_count > SUBSECTION_THRESHOLD:
            subsections = _segment(content)
            if len(subsections) > 1:
                for sub in subsections:
                    child = PageNode(
                        title=sub.get("title", "Subsection"),
                        content=sub.get("content", ""),
                        summary="",
                        depth=2,
                    )
                    child.parent = node
                    node.children.append(child)
            else:
                node.content = content  # Splitting yielded nothing useful
        else:
            node.content = content  # Short enough to remain a leaf
        root.children.append(node)
    return root
```
Operational example: Processing a 40-page employee handbook, the initial segmentation might identify “Company Overview,” “Benefits,” “Remote Work Policy,” and “Performance Reviews.” The “Benefits” chapter, at 12 pages, exceeds the 300-word threshold and gets submitted for secondary segmentation, producing “Health Insurance,” “Retirement Plans,” “Paid Time Off,” and “Professional Development.” Each of these becomes a depth-2 leaf node with specific policy details. “Company Overview,” being only 2 pages, remains a depth-1 leaf. This adaptive approach ensures granular navigation where content density demands it, while avoiding unnecessary fragmentation of brief sections.
Author’s reflection: The 300-word threshold represents a pragmatic compromise. Too low, and you fragment coherent discussions into arbitrary pieces; too high, and leaf nodes become unwieldy for context windows. In practice, this parameter requires domain-specific tuning—legal documents often benefit from lower thresholds to preserve precise clause boundaries, while narrative content might tolerate higher thresholds. The ability to adjust this single constant without rearchitecting the system demonstrates the flexibility of the approach.
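Tuning the threshold is easier with visibility into actual leaf sizes. A small audit helper can surface outliers; this is a sketch that assumes only title, content, and children attributes, using SimpleNamespace stand-ins for real nodes:

```python
from types import SimpleNamespace as Node

def leaf_word_counts(node, counts=None):
    """Collect (title, word_count) for every leaf in the tree.

    Leaves far above SUBSECTION_THRESHOLD are candidates for another
    segmentation pass; a cloud of tiny leaves suggests over-fragmentation."""
    if counts is None:
        counts = []
    if not node.children:
        counts.append((node.title, len(node.content.split())))
    for child in node.children:
        leaf_word_counts(child, counts)
    return counts

# Toy tree echoing the handbook example above
root = Node(title="root", content="", children=[
    Node(title="Overview", content="short intro text here", children=[]),
    Node(title="Benefits", content="word " * 900, children=[]),
])

for title, wc in leaf_word_counts(root):
    print(title, wc)
```

Running this after each indexing pass turns threshold tuning into an inspection loop rather than guesswork.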
Summary Generation: Bottom-Up Content Distillation
Core question: How do we create informative descriptions for every node to support navigation decisions?
Summary generation proceeds post-order (children before parents), ensuring parent summaries synthesize child descriptions rather than regenerating content from scratch.
```python
import openai

from .node import PageNode

client = openai.OpenAI()

def _summarize(text: str, section_name: str = "") -> str:
    hint = f"This is the section titled: {section_name}.\n" if section_name else ""
    prompt = f"""{hint}Summarize the following in 2-3 sentences. Be specific and factual. Do not add anything not in the text.

{text[:3000]}"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=150,
    )
    return response.choices[0].message.content.strip()

def build_summaries(node: PageNode) -> None:
    # Post-order: process children first
    for child in node.children:
        build_summaries(child)
    if node.is_leaf():
        if node.content.strip():
            node.summary = _summarize(node.content, node.title)
        else:
            node.summary = "(empty section)"
    else:
        # Synthesize parent summary from children
        children_text = "\n\n".join(
            f"[{c.title}]: {c.summary}" for c in node.children
        )
        node.summary = _summarize(children_text, node.title)
```
Operational example: In a software documentation tree, a leaf node describing the authentication endpoint might generate the summary: “Describes OAuth 2.0 flow implementation, token expiration policies, and refresh mechanisms for API access.” Its parent node “Authentication” synthesizes from this and sibling nodes: “Covers OAuth 2.0 implementation, API key alternatives, and security best practices for credential management.” The root node further abstracts: “Complete API documentation covering authentication mechanisms, 40+ endpoints organized by resource type, error handling, and rate limiting policies.” This hierarchical distillation enables rapid relevance assessment at any granularity.
Author’s reflection: The post-order traversal elegantly solves the dependency problem—parents cannot summarize what they haven’t yet characterized. This mirrors how good technical writing itself works: you understand the components before describing the system. I’ve observed that the quality of parent summaries heavily depends on child summary precision; a vague child summary propagates upward, potentially misleading navigation decisions. The constraint to “2-3 sentences” forces conciseness that, while sometimes frustrating for complex topics, proves essential for keeping navigation prompts within context limits.
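One practical consequence of this design: re-indexing an unchanged document repeats identical summarization calls. A hedged mitigation is content-hash caching, sketched below; `stub_summarize` is a stand-in for the real LLM-backed summarizer:

```python
import hashlib

def make_cached_summarizer(summarize_fn):
    """Wrap a summarizer so unchanged section text reuses its prior summary."""
    cache = {}
    def cached(text: str) -> str:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = summarize_fn(text)
        return cache[key]
    cached.cache = cache  # exposed so it can be persisted alongside the index
    return cached

# Demo with a counting stub in place of the real LLM call
calls = []
def stub_summarize(text):
    calls.append(text)
    return f"summary of {len(text)} chars"

summarize = make_cached_summarizer(stub_summarize)
summarize("section body")
summarize("section body")  # served from cache; no second call
print(len(calls))
```

Persisting the cache dict next to the index JSON would let nightly re-indexing runs pay only for sections whose text actually changed.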
Index Persistence: Serialize Once, Query Often
Core question: How do we avoid rebuilding the document tree for every query session?
JSON serialization captures the complete tree structure, enabling one-time construction and persistent reuse.
```python
import json

from .node import PageNode

def save(node: PageNode, path: str):
    def to_dict(n: PageNode) -> dict:
        return {
            "title": n.title,
            "content": n.content,
            "summary": n.summary,
            "depth": n.depth,
            "children": [to_dict(c) for c in n.children],
        }
    with open(path, "w") as f:
        json.dump(to_dict(node), f, indent=2)

def load(path: str) -> PageNode:
    def from_dict(d: dict) -> PageNode:
        node = PageNode(
            title=d["title"],
            content=d["content"],
            summary=d["summary"],
            depth=d["depth"],
        )
        for child_dict in d["children"]:
            child = from_dict(child_dict)
            child.parent = node
            node.children.append(child)
        return node
    with open(path) as f:
        return from_dict(json.load(f))
```
Application scenario: A customer support knowledge base containing 500 help articles can be processed overnight into indexed trees, stored as JSON files in object storage (S3, GCS, Azure Blob). Support agents query against these pre-built indexes throughout the day, with query latency dominated by the LLM API calls for navigation decisions. When articles update, only the affected trees require reconstruction; incremental updates can be engineered by subtree replacement.
Retrieval: LLM-Guided Tree Navigation
Core question: How does the system locate relevant content without computing similarity scores?
Retrieval transforms into a sequence of classification decisions. At each internal node, the LLM examines child summaries and selects the most promising branch, repeating until reaching a leaf.
```python
import openai

from .node import PageNode

client = openai.OpenAI()

def _pick_child(query: str, node: PageNode) -> PageNode:
    options = "\n".join(
        f"{i + 1}. [{c.title}]: {c.summary}"
        for i, c in enumerate(node.children)
    )
    prompt = f"""You are navigating a document tree to find the answer to a question.

Current section: "{node.title}"
Question: {query}

Children of this section:
{options}

Which child section most likely contains the answer? Reply with only the number."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=5,
    )
    try:
        index = int(response.choices[0].message.content.strip()) - 1
        return node.children[index]
    except (ValueError, IndexError):
        return node.children[0]  # Fallback to first child

def retrieve(query: str, root: PageNode) -> str:
    node = root
    while not node.is_leaf():
        node = _pick_child(query, node)
    return node.content
```
Operational example: A user asks: “How do I reset my password if I forgot my email access?”
At the root node, the LLM sees options: “Account Management” (summary: Creating, deleting, and recovering user accounts), “Billing” (summary: Payment methods, invoicing, and subscription changes), “Technical Issues” (summary: Troubleshooting login problems, browser compatibility, and error messages). It selects option 1.
At the “Account Management” node, children include: “Creating Accounts,” “Password Reset,” and “Account Deletion.” The LLM selects option 2.
If “Password Reset” is a leaf, its content—detailing the email-less recovery process using phone verification or security questions—is returned. If it has children (“Email Recovery,” “Phone Recovery,” “Security Questions”), the LLM continues navigating until reaching the specific procedure matching the query’s constraints.
Author’s reflection: This navigation paradigm shifts retrieval from geometric proximity to logical relevance. Traditional similarity search might return a “Billing” passage mentioning “password” in a different context (e.g., “protect your payment password”) due to token overlap, while the hierarchical approach forces structural coherence—the LLM must commit to high-level categories before accessing details. The fallback to first child on parse errors provides graceful degradation, though monitoring these fallbacks reveals opportunities for prompt engineering or summary improvement.
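Monitoring these decisions is straightforward if the child-picker is injectable. The hypothetical `retrieve_with_trace` variant below records every branch choice for audit logs; `stub_pick` stands in for the LLM-backed `_pick_child` call:

```python
from types import SimpleNamespace as Node

def retrieve_with_trace(query, root, pick_child):
    """Navigate to a leaf, recording each chosen branch for later auditing."""
    node, path = root, [root.title]
    while node.children:
        node = pick_child(query, node)
        path.append(node.title)
    return node.content, path

# Toy tree mirroring the password-reset walkthrough above
reset = Node(title="Password Reset",
             content="Use phone verification or security questions.",
             children=[])
accounts = Node(title="Account Management", content="", children=[reset])
root = Node(title="root", content="", children=[accounts])

def stub_pick(query, node):
    return node.children[0]  # stand-in for the LLM navigation call

content, path = retrieve_with_trace("How do I reset my password?", root, stub_pick)
print(" > ".join(path))  # root > Account Management > Password Reset
```

Logging the returned path per query makes wrong-branch navigation diagnosable: a cluster of bad paths through one node usually points at a weak summary.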
Integration: Complete Application Flow
Core question: How do we orchestrate indexing and querying into a cohesive application?
The main module coordinates the pipeline, separating one-time construction from repeated querying:
```python
import os

import openai

from pageindex import storage
from pageindex.indexer import build_summaries
from pageindex.parser import parse_document
from pageindex.retriever import retrieve

client = openai.OpenAI()
INDEX_PATH = "index.json"

def build_index(doc_path: str):
    print("Parsing document...")
    with open(doc_path, encoding="utf-8") as f:
        text = f.read()
    tree = parse_document(text)
    print("Building summaries (this makes LLM calls)...")
    build_summaries(tree)
    print(f"Saving index to {INDEX_PATH}")
    storage.save(tree, INDEX_PATH)
    return tree

def ask(query: str) -> str:
    if not os.path.exists(INDEX_PATH):
        raise FileNotFoundError("Index not found. Run build_index() first.")
    tree = storage.load(INDEX_PATH)
    context = retrieve(query, tree)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}",
        }],
        max_completion_tokens=500,
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    # First run: build the index
    build_index("document.md")
    # Subsequent queries
    print(ask("Your Question"))
```
Application scenario: A documentation team maintains a growing repository of Markdown files. They configure a CI/CD pipeline that triggers build_index() whenever documents merge to the main branch, storing the resulting JSON in a versioned artifact store. The production application loads the appropriate index version based on environment configuration, ensuring queries always reference current documentation while maintaining the ability to rollback if issues emerge.
Example Index Structure
After processing a sample policy document, the generated index.json reveals the hierarchical organization:
```json
{
  "title": "root",
  "summary": "Document covers returns, shipping options, and account setup.",
  "content": "",
  "depth": 0,
  "children": [
    {
      "title": "Returns and Refunds",
      "summary": "Refunds are processed within 14 days of receiving the returned item.",
      "content": "We accept returns within 30 days...",
      "depth": 1,
      "children": []
    },
    {
      "title": "Shipping Options",
      "summary": "Covers domestic (3-5 days) and international shipping (7-14 days).",
      "content": "",
      "depth": 1,
      "children": [
        {
          "title": "Domestic Shipping",
          "summary": "Standard delivery takes 3-5 business days via USPS.",
          "content": "We ship domestically via USPS...",
          "depth": 2,
          "children": []
        },
        {
          "title": "International Shipping",
          "summary": "International orders ship via DHL and arrive in 7-14 days.",
          "content": "International shipping is available to 50+ countries...",
          "depth": 2,
          "children": []
        }
      ]
    },
    {
      "title": "Account Setup",
      "summary": "Instructions for creating and verifying a new account.",
      "content": "To create an account, visit...",
      "depth": 1,
      "children": []
    }
  ]
}
```
Structural observations: “Returns and Refunds” and “Account Setup” remain at depth 1 as leaves—their content is concise enough to serve as retrieval units directly. “Shipping Options” becomes an internal node at depth 1 with two depth-2 children, reflecting its greater complexity and the value of subcategorization by shipping type. The root summary captures the document’s full scope without storing any direct content, serving purely as a navigation entry point.
Troubleshooting Common Issues
Core question: What operational challenges emerge when deploying PageIndex, and how do we address them?
| Symptom | Root Cause | Resolution |
|---|---|---|
| LLM consistently selects wrong branches | Summaries are too vague or generic | Use stronger models for summarization, or enhance prompts to demand specificity |
| Segmentation cuts through important context | Document complexity exceeds single-call capacity | Increase max_completion_tokens in segmentation calls, or pre-chunk documents into ~3000-word segments |
| Leaf nodes exceed context window limits | Subsection threshold too permissive | Reduce SUBSECTION_THRESHOLD to force deeper segmentation |
| Slow index construction | Large document volume or API latency | Implement parallel processing of independent chapters, or add caching for repeated segments |
| High query latency | Deep trees requiring many navigation steps | Optimize tree depth (target 2-3 levels), or use faster models for navigation decisions |
| Inconsistent answer quality | Retrieved leaf content insufficient | Add parent context inclusion, or implement multi-leaf retrieval for complex queries |
Author’s reflection: The most persistent challenge I’ve encountered is summary quality drift. As documents grow larger and more heterogeneous, the compression into 2-3 sentences inevitably loses nuance. One mitigation is hierarchical query expansion: if navigation confidence is low at a node, retrieve multiple children rather than committing to one. This trades latency for recall, a worthwhile exchange in high-stakes applications like medical or legal research where missing relevant content carries significant consequences.
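The multi-leaf retrieval idea can be sketched as breadth-limited expansion. Here `pick_top_k` is a hypothetical LLM call returning a ranked shortlist of children rather than a single choice; the stub in the demo simply takes the first k:

```python
from types import SimpleNamespace as Node

def retrieve_multi(query, root, pick_top_k, k=2, max_leaves=4):
    """Expand up to k children per node instead of committing to one branch,
    collecting several candidate leaves for the answer prompt."""
    frontier, leaves = [root], []
    while frontier and len(leaves) < max_leaves:
        node = frontier.pop(0)
        if not node.children:
            leaves.append(node)
        else:
            frontier.extend(pick_top_k(query, node, k))
    return "\n\n".join(leaf.content for leaf in leaves)

# Demo tree with a stub ranker in place of the LLM call
a = Node(title="A", content="alpha", children=[])
b = Node(title="B", content="beta", children=[])
c = Node(title="C", content="gamma", children=[])
root = Node(title="root", content="", children=[a, b, c])

def stub_top_k(query, node, k):
    return node.children[:k]  # stand-in for an LLM ranking call

print(retrieve_multi("q", root, stub_top_k))
```

The `max_leaves` cap bounds the latency cost of the expansion, which is exactly the latency-for-recall trade described above.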
Action Checklist / Implementation Steps
Ready to deploy PageIndex for your document collection?
- [ ] Prepare source documents: ensure content is in an accessible text format (Markdown, plain text, or extracted PDF text)
- [ ] Install dependencies: `pip install openai` and configure API key environment variables
- [ ] Initialize project structure: create module directories and copy implementation files
- [ ] Tune segmentation threshold: adjust `SUBSECTION_THRESHOLD` based on your document characteristics (start with 300, iterate based on leaf node sizes)
- [ ] Select appropriate models: use capable models (GPT-4 class) for initial parsing; lighter models (GPT-4o mini class) for summaries and navigation to optimize costs
- [ ] Build initial index: run `build_index()` on representative documents and inspect the output structure
- [ ] Validate retrieval paths: test sample queries, tracing navigation decisions to identify summary weaknesses
- [ ] Implement monitoring: log navigation choices and fallback frequencies to detect systematic retrieval failures
- [ ] Deploy persistence layer: move from local JSON to production storage (S3, a database, or a vector store for metadata if hybridizing)
- [ ] Establish update workflows: automate index reconstruction when source documents change
One-Page Overview
PageIndex is a vectorless RAG architecture that replaces similarity search with hierarchical tree navigation. Documents are parsed into trees using LLM-driven segmentation, with each node receiving a generated summary. Queries are answered by having an LLM navigate the tree—reading child summaries and selecting branches until reaching a leaf node whose content is used for answer generation.
Key characteristics:
- No embeddings required: eliminates vector database infrastructure and embedding model dependencies
- Interpretable retrieval: every navigation decision is inspectable and auditable
- Structure-aware: respects document organization rather than treating content as unordered bags of text
- Cost profile: higher upfront indexing cost (multiple LLM calls per document); queries trade vector-search infrastructure for a small number of LLM navigation calls
- Best suited for: structured documents (manuals, policies, documentation), compliance-sensitive applications requiring retrieval explainability, environments lacking vector infrastructure
Trade-offs: Retrieval latency depends on tree depth and LLM API round-trips; requires well-structured source documents to form meaningful trees; initial indexing is computationally intensive compared to simple chunking strategies.
Frequently Asked Questions
Q1: Does PageIndex replace vector RAG entirely, or complement it?
PageIndex serves as a full alternative for structured documents, but can also hybridize with vector approaches—using hierarchical navigation for coarse retrieval then similarity search within selected subtrees for fine-grained matching.
Q2: What document types work best with this approach?
Documents with inherent hierarchical organization: technical manuals, legal codes, policy documents, textbooks, structured reports. Stream-of-consciousness content (unstructured notes, transcripts, narrative prose) may not segment cleanly.
Q3: How does this handle very large document collections (thousands of documents)?
For collections exceeding single-tree capacity, implement a two-tier system: a top-level index treating each document as a leaf, with individual document trees loaded on demand after initial document selection.
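A minimal sketch of that two-tier flow, with stubs standing in for the LLM document selector, the tree loader, and `retrieve()`:

```python
def two_tier_retrieve(query, doc_summaries, pick_doc, load_tree, retrieve_fn):
    """Tier 1: choose one document by its summary; Tier 2: navigate its tree."""
    doc_id = pick_doc(query, doc_summaries)  # LLM classification over doc summaries
    tree = load_tree(doc_id)                 # load that document's index on demand
    return retrieve_fn(query, tree)

# Stubs illustrating the flow; real systems would call the LLM and object storage
docs = {"handbook": "HR policies and benefits", "api": "REST API reference"}
trees = {"handbook": "PTO accrues monthly...", "api": "GET /users returns..."}

answer = two_tier_retrieve(
    "How much PTO do I get?",
    docs,
    pick_doc=lambda q, d: "handbook",        # stand-in for the LLM selection call
    load_tree=lambda doc_id: trees[doc_id],  # e.g. fetch index.json from S3
    retrieve_fn=lambda q, tree: tree,        # stand-in for retrieve()
)
print(answer)
```

Only the top-level summaries stay resident in memory; full document trees are deserialized lazily after tier-1 selection.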
Q4: Can the system recover if the LLM navigates to the wrong branch?
Currently, navigation commits to single paths. For robustness, implement beam search—maintaining multiple candidate branches with confidence scores—or allow backtracking if leaf content relevance scores (judged by a separate LLM call) fall below threshold.
Q5: How do we handle document updates without full reindexing?
Design subtree replacement: identify which sections changed, re-parse only affected branches, and graft updated subtrees into the existing tree structure. This requires maintaining stable node identifiers across versions.
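A sketch of the grafting step, using titles as stand-in identifiers (as noted, real systems should carry stable node IDs across versions):

```python
from types import SimpleNamespace as Node

def replace_subtree(root, title, new_subtree):
    """Graft an updated subtree over the first descendant whose title matches.

    Returns True if a replacement happened. Titles serve as identifiers only
    in this sketch; duplicate titles would make this ambiguous."""
    for i, child in enumerate(root.children):
        if child.title == title:
            new_subtree.parent = root
            root.children[i] = new_subtree
            return True
        if replace_subtree(child, title, new_subtree):
            return True
    return False

old = Node(title="Shipping Options", content="old text", children=[], parent=None)
root = Node(title="root", content="", children=[old], parent=None)
updated = Node(title="Shipping Options", content="new carrier details",
               children=[], parent=None)

replace_subtree(root, "Shipping Options", updated)
print(root.children[0].content)
```

After grafting, summaries for the new subtree and its ancestors must be regenerated, since parent summaries synthesize child descriptions.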
Q6: Why use JSON rather than a database for storage?
JSON provides human readability for debugging, version control compatibility, and zero-configuration deployment. Production systems may migrate to document stores (MongoDB) or graph databases (Neo4j) for concurrent access and scaling.
Q7: What is the optimal tree depth for most applications?
Two to three levels (root → chapters → sections) handles most documents efficiently. Deeper trees increase navigation latency without proportional retrieval quality gains for typical content densities.
Q8: How do we validate that the system retrieves the correct content?
Implement evaluation frameworks that trace queries to expected leaf nodes, measuring navigation accuracy. Log actual paths taken for production queries and periodically audit cases where user feedback indicates poor retrieval for summary improvement.

