
Revolutionizing Semantic RAG: The Power of Knowledge Graph Traversal Algorithms

Novel Knowledge Graph Traversal Algorithms: Enhancing Accuracy in Semantic Retrieval-Augmented Generation (RAG) Systems

In the fast-paced evolution of artificial intelligence, large language models (LLMs) have become indispensable tools for information processing. However, relying solely on an LLM’s internal knowledge often limits its ability to answer complex or domain-specific questions accurately. This is where Retrieval-Augmented Generation (RAG) systems shine—they supplement LLMs with context from databases or knowledge graphs, enabling more precise and well-grounded responses.

Yet traditional RAG systems have a critical limitation: they mostly rely on text matching in vector stores, which struggles to capture deep semantic connections between pieces of information. To address this gap, researchers have developed a suite of novel knowledge graph traversal algorithms. These algorithms “navigate” through knowledge graphs in meaningful ways, extracting information that is highly relevant to user queries. This article breaks down their principles, implementation steps, and real-world performance, helping you understand how knowledge graphs can elevate RAG system capabilities.

What Is a Semantic Retrieval-Augmented Generation (RAG) System?

At its core, a semantic RAG system combines information retrieval with generative AI. When a user asks a question, the system does not immediately let the LLM generate a response. Instead, it first retrieves relevant information from a pre-built knowledge base, then feeds this information as context to the LLM—ensuring the final answer is both accurate and supported by evidence.

The workflow of a traditional RAG system typically follows three steps:

  1. Convert the user’s query into a vector using an embedding model.
  2. Search a vector database to find text chunks most similar to the query vector.
  3. Input these retrieved text chunks into the LLM to generate a response.

This approach resembles “fishing in the ocean”—it may miss information that has low surface-level similarity to the query but holds critical semantic relevance. In contrast, a knowledge graph-based RAG system builds a network of connections between information points. Its retrieval process mimics human thinking: starting from one knowledge point, it explores related content step by step, ultimately uncovering the most valuable information.
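
To make the contrast concrete, here is a minimal Python sketch of that three-step workflow. The embed() and generate_answer() helpers are hypothetical placeholders for an embedding model and an LLM call; they are not part of the research codebase, and any vector store with top-k search would do the same job.

# Minimal sketch of the traditional RAG loop described above.
# embed() and generate_answer() are hypothetical stand-ins for an embedding
# model and an LLM call.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def traditional_rag(query, chunks, embed, generate_answer, k=5):
    query_vec = embed(query)                                # step 1: embed the query
    scored = [(cosine(query_vec, embed(c)), c) for c in chunks]
    scored.sort(key=lambda x: x[0], reverse=True)
    context = "\n".join(c for _, c in scored[:k])           # step 2: top-k similar chunks
    return generate_answer(query, context)                  # step 3: grounded generation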

Knowledge Graph Architecture: Connecting Information for Seamless Navigation

To understand these new traversal algorithms, you first need to grasp how the underlying knowledge graph is structured. This graph uses a layered design that balances information granularity with efficient connectivity.

Step 1: From Text to Knowledge Graph—Chunking with Overlap

The first step in building the knowledge graph is processing raw documents. The research uses a three-sentence sliding window strategy for chunking: each basic unit contains 3 consecutive sentences, and the window shifts by 1 sentence at a time. This ensures overlapping content between adjacent chunks, preventing abrupt breaks in semantic connections.

Why is overlap important? Knowledge often spans sentence boundaries. For example, a passage about “machine learning history” might use one sentence to discuss “early 1950s explorations” and the next to describe “1960s algorithm breakthroughs.” Overlapping chunks preserve this logical link, ensuring the knowledge graph captures these relationships.
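
Here is a minimal sketch of such a sliding window, assuming a naive regex-based sentence splitter; the actual pipeline may use a proper sentence tokenizer.

# Sketch of the three-sentence sliding window (stride 1) described above.
import re

def sliding_window_chunks(text, window=3, stride=1):
    # Naive sentence split on ., !, ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    chunks = []
    for start in range(0, max(len(sentences) - window + 1, 1), stride):
        # Adjacent chunks overlap by window - stride sentences.
        chunks.append(" ".join(sentences[start:start + window]))
    return chunks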

Step 2: Similarity Matrix—Mapping “Relationships” Between Chunks

After chunking, the system calculates the similarity between all pairs of chunks to create a similarity matrix. Think of this matrix as a “relationship map”: red indicates high semantic similarity between two chunks, while blue indicates low similarity.


Figure A: Similarity matrix for a Wikipedia “Neuroscience” document, showing cosine similarity between all three-sentence windows. Red diagonal entries represent a chunk’s perfect similarity to itself (value = 1), while blue areas indicate low relevance.

To save memory, the system does not store all similarity data. Instead, it only retains:

  • The top 10 most similar chunks for each chunk within the same document (intra-document similarity).
  • A set number of highly similar chunks for each chunk across different documents (inter-document similarity).

These high-similarity pairs form the “edges” of the knowledge graph, linking different information chunks together.
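
The sketch below shows one way such edges could be built with NumPy: compute the full cosine similarity matrix over an assumed (n_chunks × dim) embedding array, then keep only each chunk's top 10 neighbors. For brevity it ignores the intra-document versus inter-document split; the function name and array layout are illustrative, not the project's actual code.

# Sketch: cosine similarity matrix over chunk embeddings, keeping only the
# top_k strongest neighbours per chunk as graph edges.
import numpy as np

def top_k_edges(emb, top_k=10):
    norms = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = norms @ norms.T                  # full similarity matrix (as in Figure A)
    np.fill_diagonal(sim, -1.0)            # ignore each chunk's similarity to itself
    edges = {}
    for i in range(sim.shape[0]):
        neighbours = np.argsort(sim[i])[::-1][:top_k]
        edges[i] = [(int(j), float(sim[i, j])) for j in neighbours]
    return edges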

Step 3: Hierarchical Attributes—Adding “Structure” to Knowledge

The knowledge graph is organized into three layers, like an “information pyramid,” each serving a distinct purpose:

  • Document-level: Each document has 5 core themes extracted (e.g., a “machine learning” document might include “supervised learning” and “unsupervised learning” as themes). These themes are passed down to all chunks and sentences within the document.
  • Chunk-level: This layer stores high-similarity connections (both intra-document and inter-document) based on the similarity matrix.
  • Sentence-level: Each sentence acts as a child node of its parent chunk. Sentences only connect to their parent chunk, avoiding unnecessary complexity in the graph.

This structure ensures fine-grained information access (at the sentence level) while enabling broad knowledge integration through themes and similarity links.


Figure D: Multi-layer knowledge graph architecture. The document layer extracts themes, the chunk layer stores similarity connections, and the sentence layer acts as child nodes—creating a lightweight, traversable structure.
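
One way to picture these three layers in code is as plain node records. The field names below are purely illustrative and do not reflect the project's actual schema.

# Illustrative node records for the three layers (hypothetical schema).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DocumentNode:
    doc_id: str
    themes: List[str]                      # 5 core themes, inherited by children

@dataclass
class ChunkNode:
    chunk_id: str
    doc_id: str
    text: str
    intra_doc_edges: List[Tuple[str, float]] = field(default_factory=list)
    inter_doc_edges: List[Tuple[str, float]] = field(default_factory=list)

@dataclass
class SentenceNode:
    sentence_id: str
    parent_chunk_id: str                   # sentences connect only to their parent chunk
    text: str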

Knowledge Graph Traversal Algorithms: 7 Ways to Explore the Information Network

With a well-structured knowledge graph in place, the key challenge becomes “navigating” this network to find query-relevant information. The research introduces seven algorithms, each with a unique “exploration strategy.”

All algorithms start with the same step: identifying an anchor chunk—the chunk most similar to the user’s query. This anchor acts as the “starting point” for navigation. From here, each algorithm follows distinct rules to select the next chunk to explore.
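
Selecting the anchor chunk is a simple nearest-neighbor lookup. The sketch below assumes the query and chunk embeddings are NumPy arrays; the names are illustrative.

# Sketch: the anchor chunk is the chunk whose embedding is most similar to
# the query embedding. `chunk_embs` is an (n_chunks, dim) array.
import numpy as np

def find_anchor(query_vec, chunk_embs):
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    scores = c @ q
    return int(np.argmax(scores)), float(np.max(scores))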

1. basic_retrieval: The “Baseline” for Traditional RAG

This is the simplest algorithm, with no traversal logic—it serves as a benchmark to compare other algorithms against. Its workflow mirrors traditional RAG:

  1. Convert the user’s query into a vector.
  2. Directly search the knowledge graph for chunks most similar to the query vector.
  3. Continue selecting the most similar chunks until 10–15 relevant sentences are collected, or until newly found chunks are less relevant than the sentences already extracted.

While straightforward, this method fails to explore connections in the knowledge graph. It may miss critical information that requires “multiple hops” from the anchor chunk.
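
A hedged sketch of this baseline, under the same NumPy assumptions as above (chunk_sentences maps each chunk index to its list of sentences, and the stopping rule is simplified to a sentence budget):

# Sketch of basic_retrieval: repeatedly take the globally most query-similar
# chunk and harvest its sentences until the sentence budget is reached.
import numpy as np

def basic_retrieval(query_vec, chunk_embs, chunk_sentences, max_sentences=15):
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    order = np.argsort(c @ q)[::-1]        # chunks ranked by query similarity
    collected = []
    for idx in order:
        collected.extend(chunk_sentences[int(idx)])
        if len(collected) >= max_sentences:
            break
    return collected[:max_sentences]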

2. query_traversal: Navigating with the Query as a “Compass”

This algorithm stays focused on the user’s query throughout navigation, like using a compass to stay on course:

  1. Start from the anchor chunk.
  2. At each step, select the adjacent chunk (either within the same document or across documents) that is most similar to the user’s query.
  3. Extract sentences from each visited chunk and avoid revisiting the same chunk.
  4. Stop traversal when 10–15 relevant sentences are collected, or when all unvisited chunks are less relevant than the sentences already extracted.

The advantage of this approach is its unwavering focus on the query, making it ideal for scenarios where precise alignment with the user’s intent is critical.

Query Traversal Diagram
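
The sketch below approximates this strategy. It assumes edges maps each chunk index to its neighbor indices from the similarity-matrix step, and it simplifies the stopping rule to a sentence budget.

# Sketch of query_traversal: hop from the anchor to whichever unvisited
# neighbour is most similar to the *query*, collecting sentences as we go.
import numpy as np

def query_traversal(query_vec, chunk_embs, edges, chunk_sentences,
                    anchor, max_sentences=15):
    q = query_vec / np.linalg.norm(query_vec)
    sims = (chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)) @ q
    visited, current = {anchor}, anchor
    collected = list(chunk_sentences[anchor])
    while len(collected) < max_sentences:
        candidates = [n for n in edges.get(current, []) if n not in visited]
        if not candidates:
            break
        current = max(candidates, key=lambda n: sims[n])   # the query as compass
        visited.add(current)
        collected.extend(chunk_sentences[current])
    return collected[:max_sentences]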

3. kg_traversal: Exploring “Local Relationships”

Unlike query_traversal, this algorithm acts like a “casual explorer”—it ignores the query and focuses only on the “neighbors” of the current chunk:

  1. Start from the anchor chunk.
  2. At each step, select the adjacent chunk that is most similar to the current chunk (relying solely on internal knowledge graph connections, not query similarity).
  3. Skip chunks that contain sentences already extracted to avoid redundancy.
  4. Stop traversal when the similarity of the next chunk is lower than the previous one (indicating a potential shift away from the topic) or when 10–15 relevant sentences are collected.

This method excels at uncovering “hidden connections” in the knowledge graph but risks straying from the original query and collecting irrelevant information.
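
A sketch under the same assumptions as the previous one, with the drop-in-similarity stopping rule made explicit:

# Sketch of kg_traversal: ignore the query after the anchor and follow the
# neighbour most similar to the *current* chunk, stopping when similarity
# along the path starts to fall.
import numpy as np

def kg_traversal(chunk_embs, edges, chunk_sentences, anchor, max_sentences=15):
    norm = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    visited, current = {anchor}, anchor
    collected = list(chunk_sentences[anchor])
    last_sim = None
    while len(collected) < max_sentences:
        candidates = [n for n in edges.get(current, []) if n not in visited]
        if not candidates:
            break
        nxt = max(candidates, key=lambda n: float(norm[current] @ norm[n]))
        sim = float(norm[current] @ norm[nxt])
        if last_sim is not None and sim < last_sim:
            break                          # similarity dropped: likely drifting off-topic
        last_sim, current = sim, nxt
        visited.add(current)
        collected.extend(chunk_sentences[current])
    return collected[:max_sentences]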

4. triangulation_average: Balancing “Three Perspectives”

This algorithm acts like a “balanced decision-maker,” weighing three factors to select the next chunk: the query, the current chunk, and candidate chunks. Here’s how it works:

  1. Start from the anchor chunk.
  2. For each candidate chunk, calculate the average of three similarity scores:
    • Similarity between the query and the current chunk.
    • Similarity between the query and the candidate chunk.
    • Similarity between the current chunk and the candidate chunk.
  3. Select the candidate chunk with the highest average score as the next step.
  4. Stop traversal when the average similarity of extracted sentences exceeds that of all candidate chunks.

By balancing these three perspectives, the algorithm maintains focus on the query while leveraging the knowledge graph’s internal connections. It is well-suited for scenarios requiring both precision and exploration.
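
A compact sketch of the scoring step. Note that the query-to-current term is constant across candidates at any given hop; it is included here only to mirror the three-way average described above.

# Sketch of triangulation_average: score each candidate by the mean of the
# three pairwise similarities (query-current, query-candidate, current-candidate).
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def next_by_triangulation_average(query_vec, current_vec, candidate_vecs):
    scores = []
    for i, cand in enumerate(candidate_vecs):
        avg = (cos(query_vec, current_vec) +
               cos(query_vec, cand) +
               cos(current_vec, cand)) / 3.0
        scores.append((avg, i))
    best_avg, best_idx = max(scores)
    return best_idx, best_avg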

5. triangulation_geometric_3d: Using “Spatial Position” to Locate Relevant Information

This algorithm maps high-dimensional vector embeddings to 3D space using PCA (Principal Component Analysis), turning complex information into a visual “3D map”:

  1. Reduce the dimensionality of the query vector, current chunk vector, and candidate chunk vectors to 3D space.
  2. Calculate the “centroid” (geometric center) of the triangle formed by these three 3D points.
  3. Select the candidate chunk whose centroid is closest to the query point as the next step.

By using intuitive spatial distance to judge relevance, this method captures potential semantic connections that might be missed in high-dimensional space.
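
A sketch using scikit-learn's PCA. Fitting the projection on just the query, current chunk, and candidate vectors is an assumption made for brevity; the original pipeline may fit PCA over all embeddings instead.

# Sketch of triangulation_geometric_3d: project the query, current chunk and
# candidates to 3D with PCA, then pick the candidate whose (query, current,
# candidate) triangle centroid lies closest to the projected query point.
import numpy as np
from sklearn.decomposition import PCA

def next_by_geometric_3d(query_vec, current_vec, candidate_vecs):
    points = np.vstack([query_vec, current_vec] + list(candidate_vecs))
    coords = PCA(n_components=3).fit_transform(points)
    q3, c3, cand3 = coords[0], coords[1], coords[2:]
    best_idx, best_dist = None, np.inf
    for i, x3 in enumerate(cand3):
        centroid = (q3 + c3 + x3) / 3.0
        dist = np.linalg.norm(centroid - q3)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist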

6. triangulation_fulldim: “Precise Calculation” with Full Information

Similar to the 3D geometric triangulation algorithm, this method skips dimensionality reduction and works directly in the original high-dimensional vector space (typically 1024 dimensions):

  1. Calculate the centroid of the triangle formed by the query, current chunk, and candidate chunk using their original high-dimensional vectors.
  2. Select the candidate chunk whose centroid is closest to the query vector.

This approach preserves all vector information, making it mathematically rigorous. It is ideal for scenarios requiring high precision, though it comes with higher computational costs.
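
The full-dimensional variant simply drops the PCA step; a minimal sketch:

# Sketch of triangulation_fulldim: the same centroid idea as above, but in
# the original embedding space, measuring distance to the query vector.
import numpy as np

def next_by_fulldim(query_vec, current_vec, candidate_vecs):
    dists = [np.linalg.norm((query_vec + current_vec + cand) / 3.0 - query_vec)
             for cand in candidate_vecs]
    best_idx = int(np.argmin(dists))
    return best_idx, float(dists[best_idx])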

Triangulation Average Traversal Diagram

7. llm_guided_traversal: Letting AI “Guide” the Journey

This algorithm uses a large language model as a “navigation expert” to plan the traversal path:

  1. Start from the anchor chunk and identify the top 5 candidate chunks most similar to the query.
  2. Provide the LLM with key information: the user’s query, a summary of already extracted sentences, previews of the 5 candidate chunks, and their similarity scores.
  3. The LLM selects the next chunk to explore or decides to stop traversal based on this information.
  4. If the LLM’s response is invalid (e.g., unclear or irrelevant), the system automatically falls back to the most similar candidate chunk.

This method offers the highest flexibility for complex semantic judgments but is slower and incurs additional API costs due to LLM calls.

LLM-Guided Traversal Diagram
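
A sketch of the decision step, where ask_llm is a hypothetical callable wrapping whichever model you use; the exact prompt format in the research code may differ.

# Sketch of llm_guided_traversal's selection step: ask an LLM to pick the next
# chunk from the top-5 candidates, falling back to the most similar candidate
# when the reply cannot be parsed.
def choose_next_chunk(ask_llm, query, collected_summary, candidates):
    # candidates: list of dicts like {"id": ..., "preview": ..., "similarity": ...}
    options = "\n".join(
        f"{i}: sim={c['similarity']:.2f} preview={c['preview'][:120]}"
        for i, c in enumerate(candidates)
    )
    prompt = (
        f"Query: {query}\n"
        f"Collected so far: {collected_summary}\n"
        f"Candidate chunks:\n{options}\n"
        "Reply with the number of the chunk to explore next, or STOP."
    )
    reply = ask_llm(prompt).strip().upper()
    if reply == "STOP":
        return None
    try:
        return candidates[int(reply)]["id"]
    except (ValueError, IndexError):
        # Invalid response: fall back to the highest-similarity candidate.
        return max(candidates, key=lambda c: c["similarity"])["id"]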

How to Use These Algorithms: Step-by-Step Implementation

If you want to test these algorithms yourself, follow these steps to set up the environment and run the code.

1. Prepare the Environment

First, ensure your computer has Python 3.12 and Ollama (for running local LLMs) installed. Then execute the following commands in your terminal:

# Clone the code repository
git clone https://github.com/glacier-creative-git/knowledge-graph-traversal-semantic-rag-research
cd knowledge-graph-traversal-semantic-rag-research

# Create a virtual environment
python3.12 -m venv .venv 
source .venv/bin/activate  # Use .venv\Scripts\activate for Windows systems

# Install required dependencies
pip install -r requirements.txt

2. Configure API Keys

Create a .env file in the project root directory and add the necessary API keys (fill in only the keys for services you plan to use):

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key

# Ollama Configuration (no API key needed for local runs)
OLLAMA_BASE_URL=http://localhost:11434

# Anthropic Configuration
ANTHROPIC_API_KEY=your-anthropic-api-key

# Evaluation Tool Configuration
CONFIDENT_AI_API_KEY=your-deepeval-api-key

Note: You need to register for a DeepEval account to obtain the CONFIDENT_AI_API_KEY, which is used to evaluate algorithm performance.

3. Run the Demonstration Notebook

Launch the interactive Jupyter Notebook to test the algorithms and visualize the knowledge graph:

jupyter notebook research_demonstration.ipynb

Within the notebook, you can visually track the traversal path of different algorithms. For example, you might see a path like this:


Figure B: Global traversal path of an LLM between “Machine Learning” and “Artificial Intelligence” documents. Starting from “Machine Learning,” the algorithm briefly jumps to “Artificial Intelligence” at step 5 before returning.

4. Run Benchmark Tests

To compare the performance of different algorithms, use the benchmark.py script:

# View help documentation
python benchmark.py --help

# Run tests (tests all 7 algorithms by default)
python benchmark.py

You can adjust test parameters (e.g., number of test questions, maximum sentences per algorithm) in the config.yaml configuration file.

Algorithm Performance: What the Test Results Show

To validate the effectiveness of these algorithms, researchers conducted tests using two synthetic datasets—one with 20 questions and another with 50 questions. The questions covered scenarios like reasoning, multi-context integration, and comparative queries. Four key metrics were used for evaluation:

  • Precision: How relevant the retrieved information is to the query.
  • Recall: Whether all necessary information is retrieved.
  • Answer Relevancy: How well the generated answer matches the query.
  • Faithfulness: How consistent the answer is with the retrieved information.

Test Results for the 20-Question Dataset (Theme-Focused)

| Algorithm | Precision | Recall | Answer Relevancy | Faithfulness | Passed Tests |
| --- | --- | --- | --- | --- | --- |
| basic_retrieval | 0.87 | 0.74 | 0.91 | 0.93 | 16/20 |
| query_traversal | 0.83 | 0.83 | 0.91 | 1.00 | 16/20 |
| kg_traversal | 0.73 | 0.72 | 0.98 | 0.92 | 15/20 |
| triangulation_average | 0.84 | 0.77 | 0.96 | 1.00 | 16/20 |
| triangulation_geometric_3d | 0.86 | 0.77 | 0.96 | 1.00 | 16/20 |
| triangulation_fulldim | 0.90 | 0.78 | 0.95 | 0.99 | 17/20 |
| llm_guided_traversal | 0.88 | 0.82 | 0.95 | 1.00 | 17/20 |

In this dataset (which emphasized cross-document theme connections), triangulation_fulldim and llm_guided_traversal performed best, with the most passed tests. triangulation_fulldim achieved the highest precision (0.90), while query_traversal balanced precision and recall equally (both 0.83).

Test Results for the 50-Question Dataset (Multi-Hop Sequences)

| Algorithm | Precision | Recall | Answer Relevancy | Faithfulness | Passed Tests |
| --- | --- | --- | --- | --- | --- |
| basic_retrieval | 0.93 | 0.88 | 0.99 | 0.99 | 48/50 |
| query_traversal | 0.91 | 0.91 | 0.98 | 1.00 | 50/50 |
| kg_traversal | 0.93 | 0.87 | 0.99 | 0.99 | 49/50 |
| triangulation_average | 0.92 | 0.87 | 0.98 | 0.99 | 49/50 |
| triangulation_geometric_3d | 0.93 | 0.85 | 0.98 | 1.00 | 48/50 |
| triangulation_fulldim | 0.93 | 0.87 | 1.00 | 0.97 | 47/50 |
| llm_guided_traversal | 0.91 | 0.94 | 0.99 | 0.99 | 49/50 |

In this more complex multi-hop scenario, query_traversal stood out by passing all 50 tests. It also balanced precision and recall (both 0.91) effectively. llm_guided_traversal achieved the highest recall (0.94), showing its strength in capturing comprehensive information.

Key Takeaways from the Tests

  1. query_traversal’s Versatility: It performed well across both datasets and achieved a 100% pass rate in the multi-hop scenario. This makes it a reliable choice for most use cases.
  2. The Value of LLM Guidance: llm_guided_traversal had the highest recall, making it ideal for scenarios where comprehensive information coverage is critical—though it requires more resources.
  3. Precision of Triangulation Algorithms: The three triangulation algorithms (average, 3D geometric, full-dim) excelled in precision, making them suitable for applications where relevance is a top priority.
  4. Limitations of kg_traversal: Its lower performance in both datasets highlights the risk of straying from the query when relying solely on internal knowledge graph connections.

Frequently Asked Questions (FAQ)

What Is the Fundamental Difference Between These Algorithms and Traditional RAG?

Traditional RAG uses “one-time retrieval”—it directly finds the most similar text chunks in a vector database. These algorithms, by contrast, use “exploratory retrieval”: starting from an anchor chunk, they navigate the knowledge graph’s connections to uncover deep, multi-hop information. This allows them to capture semantic relationships that traditional RAG misses.

How Do I Choose the Right Algorithm for My Needs?

  • Choose basic_retrieval if you want simplicity and efficiency.
  • Choose query_traversal if you need a balance of precision and exploration.
  • Choose llm_guided_traversal if comprehensive information coverage is critical (and you can accept higher costs).
  • Choose triangulation_fulldim if you prioritize maximum precision.

Does the Size of the Knowledge Graph Affect Algorithm Performance?

Yes. The tests used a small knowledge graph with 10 documents. For larger graphs (with hundreds or thousands of documents):

  • kg_traversal may be more likely to stray from the query.
  • query_traversal may maintain more stability due to its focus on the query.
  • Additional optimizations (e.g., limiting hop count) may be needed to ensure efficiency.

Which Industries Can Benefit from These Algorithms?

These algorithms are particularly useful in fields that require deep semantic understanding:

  • Academic Research: They can quickly connect related concepts across multiple papers.
  • Intelligent Q&A Systems: They enable more accurate and comprehensive responses to user questions.
  • Information Analysis: They help extract key connections from diverse document sources.

Do These Algorithms Require Powerful Computing Resources?

It depends on the algorithm:

  • Basic algorithms like query_traversal and triangulation_average work well on standard computers.
  • triangulation_fulldim requires more CPU power due to high-dimensional vector calculations.
  • llm_guided_traversal speed depends on the LLM—running a local model (e.g., Llama 3.1:8b via Ollama) is a cost-effective option.

Conclusion: The Value and Future of Knowledge Graph Traversal Algorithms

These novel knowledge graph traversal algorithms offer a more flexible and intelligent approach to information retrieval for semantic RAG systems. By navigating the knowledge graph’s connections, they overcome the limitations of traditional RAG and capture deep semantic relationships—providing LLMs with more accurate and comprehensive context.

Among the seven algorithms, query_traversal stands out as the most versatile, while llm_guided_traversal and the triangulation algorithms excel in specific areas (recall and precision, respectively). As AI continues to evolve, these algorithms can be further optimized:

  • Adding “path prediction” to plan multi-step traversal in advance.
  • Adapting to larger knowledge graphs with millions of documents.
  • Fine-tuning for domain-specific needs (e.g., healthcare, finance).

If you are building a RAG system or working with knowledge graphs, these algorithms provide a powerful way to enhance performance. By balancing precision, coverage, and efficiency, they help AI systems think more like humans—focusing not just on surface-level similarity, but on the deep connections that make information meaningful.
