RAGentA: A Multi-Agent Retrieval-Augmented Generation Framework
In an age when information overload can overwhelm users and systems alike, delivering accurate, comprehensive, and traceable answers is a critical challenge. RAGentA (Retrieval-Augmented Generation Agent) rises to this challenge with a unique multi-agent design, hybrid retrieval methods, and rigorous citation tracking, ensuring that each answer is both relevant and grounded in real sources.
Table of Contents

- Introduction
- Key Features
- Prerequisites and Installation
  - Environment Setup
  - Repository Clone & Dependencies
  - AWS Credentials & Environment Variables
- Quick Start
  - Single-Question Mode
  - Batch-Processing Mode
- System Architecture
  - Multi-Agent Workflow
    - Agent 1: Predictor
    - Agent 2: Judge
    - Agent 3: Final-Predictor
    - Agent 4: Claim Judge
  - Hybrid Retrieval Strategy
  - Adaptive Threshold Mechanism
  - Follow-Up & Completeness Check
- Configuration & Parameters
- Usage Examples
- FAQ
- How-To: Integrate RAGentA
- Evaluation Metrics
- License & Acknowledgments
Introduction
RAGentA is designed for trustworthy, attributable question answering. By combining multiple specialized agents, semantic and keyword retrieval methods, and built-in citation tracking, RAGentA ensures:
- Comprehensive coverage: Answers address every component of a query
- High relevance: Responses are strictly grounded in retrieved documents
- Full traceability: Citations link back to each source used
Whether you need factual definitions, technical explanations, or nuanced insights, RAGentA’s architecture delivers clear, reliable answers.
Key Features
- Multi-Agent Architecture: Separate agents handle retrieval, relevance scoring, generation, and claim analysis.
- Hybrid Retrieval: Seamlessly blends dense (semantic) search with sparse (keyword) search.
- Citation Tracking: Automatically labels every fact with [X] citations.
- Claim-Level Analysis: Breaks answers into individual claims to detect gaps.
- Follow-Up Processing: Generates and answers follow-up questions for any missing parts.
- Standard Evaluation: Built-in support for MRR, Recall, Precision, and F1 metrics.
Prerequisites and Installation
Environment Setup
- OS: Linux, macOS, or Windows
- Python: 3.8 or higher
- Deep Learning: PyTorch ≥ 2.0.0
- GPU: CUDA-compatible (recommended)
- Cloud Services: AWS OpenSearch and Pinecone accounts
Repository Clone & Dependencies
```bash
git clone git@github.com:tobiasschreieder/LiveRAG.git
cd LiveRAG
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
pip install -r requirements.txt
```
AWS Credentials & Environment Variables
- Create the credentials directory:

```bash
mkdir -p ~/.aws
```

- Add `~/.aws/credentials`:

```ini
[sigir-participant]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
```

- Add `~/.aws/config`:

```ini
[profile sigir-participant]
region = us-east-1
output = json
```

- Set environment variables:

```bash
export AWS_PROFILE=sigir-participant
export AWS_REGION=us-east-1
export HUGGING_FACE_HUB_TOKEN=your_hf_token
```
Quick Start
Single-Question Mode
```bash
python run_RAGentA.py \
  --model tiiuae/Falcon3-10B-Instruct \
  --n 0.5 \
  --alpha 0.65 \
  --top_k 20 \
  --single_question "What is a knowledge graph?"
```
Batch-Processing Mode
```bash
python run_RAGentA.py \
  --model tiiuae/Falcon3-10B-Instruct \
  --n 0.5 \
  --alpha 0.65 \
  --top_k 20 \
  --data_file questions.jsonl \
  --output_format jsonl
```
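For batch mode, `--data_file` expects one question per line. A minimal example of what `questions.jsonl` might contain (the field names here are an assumption, inferred from the sample output shown under Usage Examples):

```json
{"id": "q123", "question": "What is a knowledge graph?"}
{"id": "q124", "question": "How does hybrid retrieval combine semantic and keyword search?"}
```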
Parameter reference:

| Parameter | Description | Default |
|---|---|---|
| `--model` | Model name or path | tiiuae/Falcon3-10B-Instruct |
| `--n` | Adaptive threshold factor | 0.5 |
| `--alpha` | Weight for semantic vs. keyword scoring | 0.65 |
| `--top_k` | Number of documents to retrieve | 20 |
| `--data_file` | Input JSON/JSONL file of questions | — |
| `--single_question` | Single question text | — |
| `--output_format` | Output format: json, jsonl, or debug | jsonl |
| `--output_dir` | Directory for saving results | results |
System Architecture
Multi-Agent Workflow
1. Retrieval Phase
   - Query is sent to the Hybrid Retriever
   - Top-K documents are fetched via Pinecone (semantic) and OpenSearch (keyword)
   - Combined score: final_score = α × semantic_score + (1 − α) × keyword_score
2. Agent 1: Predictor
   - Generates a candidate answer per document
   - Input: Query + single document
   - Output: Document-specific answer snippet
3. Agent 2: Judge (see the sketch after this list)
   - Scores candidate answers by relevance: score = log P(Yes) − log P(No)
   - Computes the adaptive threshold τq = μ − n·σ
   - Retains documents with score ≥ τq
4. Agent 3: Final-Predictor
   - Integrates the filtered documents into a cohesive answer
   - Inserts [X] citations for each fact
5. Agent 4: Claim Judge
   - Splits the answer into individual claims
   - Maps claims to question components
   - Detects missing parts and formulates follow-up questions
   - Retrieves new documents if needed and updates the answer
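To make the Judge step concrete, here is a minimal sketch of how a relevance score of the form log P(Yes) − log P(No) can be read off a causal language model's next-token logits with Hugging Face Transformers. This is an illustration under assumptions, not the repository's actual implementation; prompt wording and token handling may differ.

```python
import torch

def judge_score(model, tokenizer, query: str, candidate: str) -> float:
    """Sketch: relevance score = log P('Yes') - log P('No') for the next token."""
    prompt = (
        f"Question: {query}\n"
        f"Candidate answer: {candidate}\n"
        "Is this answer relevant? Answer Yes or No: "
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    log_probs = torch.log_softmax(logits, dim=-1)
    yes_id = tokenizer.convert_tokens_to_ids("Yes")
    no_id = tokenizer.convert_tokens_to_ids("No")
    return (log_probs[yes_id] - log_probs[no_id]).item()

# Example wiring (downloading the model requires HUGGING_FACE_HUB_TOKEN):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Instruct")
# lm = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Instruct")
# print(judge_score(lm, tok, "What is a knowledge graph?", "A graph of entities…"))
```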
Hybrid Retrieval Strategy
- Semantic Retrieval: Pinecone + intfloat/e5-base-v2 embeddings
- Keyword Retrieval: AWS OpenSearch with precise matching
- Fusion: Weighted by the α parameter (see the sketch below)
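A minimal sketch of the fusion step, assuming both retrievers return scores normalized to [0, 1] and keyed by document ID (the names here are illustrative):

```python
def fuse_scores(semantic: dict, keyword: dict, alpha: float = 0.65) -> dict:
    """final_score = alpha * semantic_score + (1 - alpha) * keyword_score.

    Documents missing from one retriever contribute 0.0 on that side.
    """
    doc_ids = set(semantic) | set(keyword)
    return {
        d: alpha * semantic.get(d, 0.0) + (1 - alpha) * keyword.get(d, 0.0)
        for d in doc_ids
    }

# Usage: rank documents by their fused score.
fused = fuse_scores({"doc1": 0.9, "doc2": 0.4}, {"doc1": 0.2, "doc3": 0.8})
ranking = sorted(fused, key=fused.get, reverse=True)
```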
Adaptive Threshold Mechanism
- Compute the mean μ and standard deviation σ of the relevance scores
- Set the threshold τq = μ − n × σ
- Retain documents with scores ≥ τq (see the sketch below)
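In code, the mechanism amounts to a few lines; this is a sketch of the formula above, not the repository's implementation:

```python
from statistics import mean, stdev

def adaptive_filter(scores: dict, n: float = 0.5) -> dict:
    """Keep documents whose relevance score >= tau_q = mu - n * sigma."""
    values = list(scores.values())
    sigma = stdev(values) if len(values) > 1 else 0.0  # stdev needs >= 2 points
    tau_q = mean(values) - n * sigma
    return {doc: s for doc, s in scores.items() if s >= tau_q}
```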
Follow-Up & Completeness Check
- Question Analysis: Detects multi-part queries
- Claim Mapping: Links each claim to query parts
- Coverage Assessment: Labels claims as "fully," "partially," or "not answered"
- Follow-Up Generation: Creates and answers sub-questions for any gaps (see the sketch below)
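One way to picture the bookkeeping this stage performs (the class and field names are hypothetical, chosen only for illustration):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ClaimCoverage:
    claim: str            # an atomic claim extracted from the answer
    question_part: str    # the query component the claim addresses
    status: str           # "fully", "partially", or "not answered"

def open_follow_ups(coverage: List[ClaimCoverage]) -> List[str]:
    """Question parts that still need a follow-up sub-question."""
    return sorted({c.question_part for c in coverage if c.status != "fully"})
```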
Configuration & Parameters
| Config Key | Description |
|---|---|
| `AWS_PROFILE` | AWS credential profile name |
| `AWS_REGION` | AWS region |
| `HUGGING_FACE_HUB_TOKEN` | Token for Hugging Face model access |
| `--n` | Adaptive threshold multiplier |
| `--alpha` | Hybrid retrieval weight |
| `--top_k` | Number of documents to retrieve |
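A hedged sketch of how these values can be picked up at runtime (the fallback defaults are illustrative, matching the setup section above):

```python
import os

aws_profile = os.environ.get("AWS_PROFILE", "sigir-participant")
aws_region = os.environ.get("AWS_REGION", "us-east-1")
hf_token = os.environ.get("HUGGING_FACE_HUB_TOKEN")

if hf_token is None:
    # Model downloads from Hugging Face will fail without a token.
    raise RuntimeError("Set HUGGING_FACE_HUB_TOKEN before running RAGentA.")
```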
Usage Examples
```bash
# Set environment variables
export AWS_PROFILE=sigir-participant
export AWS_REGION=us-east-1

# Run single question
python run_RAGentA.py \
  --model tiiuae/Falcon3-10B-Instruct \
  --n 0.5 \
  --alpha 0.7 \
  --top_k 15 \
  --single_question "What is retrieval-augmented generation?"
```
Sample JSONL output:
```json
{
  "id": "q123",
  "question": "What is retrieval-augmented generation?",
  "passages": [
    { "passage": "Document content…", "doc_IDs": ["doc1", "doc5"] }
  ],
  "final_prompt": "The prompt used for generation…",
  "answer": "Generated answer with citations…"
}
```
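To consume such a results file, something like the following works (the file path is hypothetical; the field names match the sample above):

```python
import json

with open("results/answers.jsonl") as f:  # hypothetical file under --output_dir
    for line in f:
        record = json.loads(line)
        doc_ids = [d for p in record["passages"] for d in p["doc_IDs"]]
        print(f"{record['id']}: {record['answer'][:80]}… sources={doc_ids}")
```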
FAQ
Why combine semantic and keyword retrieval?
Blending both ensures deep contextual matching while preserving precise keyword coverage, reducing both missed and irrelevant documents.
How should I adjust the α parameter?
If semantic context is paramount (long-form texts, rich semantics), increase α above 0.65; if exact keyword hits matter more, lower α.
What does the adaptive threshold do?
It dynamically sets a relevance cutoff based on the score distribution (mean and standard deviation), guarding against over- or under-filtering.
How do I verify citations in the answer?
Each `[X]` tag corresponds to the Xth document in the retrieved list, enabling quick manual source checks.
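A small sketch of that manual check, assuming 1-based [X] indices into the retrieved list (the function name is illustrative):

```python
import re
from typing import Dict, List

def resolve_citations(answer: str, passages: List[str]) -> Dict[int, str]:
    """Map each [X] citation in the answer to the X-th retrieved passage."""
    cited = {int(x) for x in re.findall(r"\[(\d+)\]", answer)}
    return {x: passages[x - 1] for x in sorted(cited) if 0 < x <= len(passages)}
```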
How-To: Integrate RAGentA
1. **Prepare Environment**: Python 3.8+, PyTorch, AWS & Pinecone accounts
2. **Clone & Install**:

```bash
git clone git@github.com:tobiasschreieder/LiveRAG.git
cd LiveRAG
python -m venv env
source env/bin/activate
pip install -r requirements.txt
```
3. **Configure**: Set α, top_k, and AWS profile in `config.yaml` or via environment variables
4. **Sample Code**:
```python
from RAGentA import RAGentA

agent = RAGentA(
    model="tiiuae/Falcon3-10B-Instruct",
    n=0.5,
    alpha=0.65,
    top_k=20
)
print(agent.answer("What is a knowledge graph?"))
```
5. **Evaluate**:

```python
from RAG_evaluation import evaluate_corpus_rag_mrr, evaluate_corpus_rag_recall

mrr = evaluate_corpus_rag_mrr(retrieved, golden, k=5)
recall = evaluate_corpus_rag_recall(retrieved, golden, k=20)
print(f"MRR: {mrr:.4f}, Recall: {recall:.4f}")
```
Evaluation Metrics
- MRR (Mean Reciprocal Rank): Average inverse rank of the first relevant document
- Recall: Proportion of relevant documents retrieved
- Precision: Proportion of retrieved documents that are relevant
- F1 Score: Harmonic mean of Precision and Recall
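For reference, minimal implementations consistent with these definitions (the repository ships its own in RAG_evaluation; these sketches assume one ranked list of retrieved IDs and one set of golden IDs per query):

```python
from typing import List, Set

def mrr_at_k(retrieved: List[List[str]], golden: List[Set[str]], k: int = 5) -> float:
    """Average inverse rank of the first relevant document within the top k."""
    total = 0.0
    for docs, gold in zip(retrieved, golden):
        for rank, doc in enumerate(docs[:k], start=1):
            if doc in gold:
                total += 1.0 / rank
                break
    return total / len(retrieved)

def recall_at_k(retrieved: List[List[str]], golden: List[Set[str]], k: int = 20) -> float:
    """Proportion of golden documents found in the top k, averaged over queries."""
    return sum(len(set(docs[:k]) & gold) / len(gold)
               for docs, gold in zip(retrieved, golden)) / len(retrieved)
```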
License & Acknowledgments
This project is released under the BSD 2-Clause License. See the LICENSE file for details.
Inspired by:
- Chang et al., "MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation," arXiv:2501.00332
- Besrour et al., "RAGentA: Multi-Agent Retrieval-Augmented Generation for Attributed Question Answering," arXiv:2506.16988 (SIGIR 2025)
```bibtex
@misc{Chang2024MAIN-RAG,
  title={MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation},
  author={Chia-Yuan Chang and Zhimeng Jiang and Vineeth Rakesh and Menghai Pan and Chin-Chia Michael Yeh and Guanchu Wang and Mingzhi Hu and Zhichao Xu and Yan Zheng and Mahashweta Das and Na Zou},
  year={2024},
  eprint={2501.00332},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.00332}
}

@misc{Besrour2025RAGentA,
  author={Ines Besrour and Jingbo He and Tobias Schreieder and Michael F{\"a}rber},
  title={{RAGentA: Multi-Agent Retrieval-Augmented Generation for Attributed Question Answering}},
  year={2025},
  eprint={2506.16988},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2506.16988}
}
```