RAGentA: A Multi-Agent Retrieval-Augmented Generation Framework

In an age when information overload can overwhelm users and systems alike, delivering accurate, comprehensive, and traceable answers is a critical challenge. RAGentA (Retrieval-Augmented Generation Agent) rises to this challenge with a unique multi-agent design, hybrid retrieval methods, and rigorous citation tracking, ensuring that each answer is both relevant and grounded in real sources.


Table of Contents

  1. Introduction

  2. Key Features

  3. Prerequisites and Installation

    • Environment Setup
    • Repository Clone & Dependencies
    • AWS Credentials & Environment Variables
  4. Quick Start

    • Single-Question Mode
    • Batch-Processing Mode
  5. System Architecture

    1. Multi-Agent Workflow

      • Agent 1: Predictor
      • Agent 2: Judge
      • Agent 3: Final-Predictor
      • Agent 4: Claim Judge
    2. Hybrid Retrieval Strategy

    3. Adaptive Threshold Mechanism

    4. Follow‑Up & Completeness Check

  6. Configuration & Parameters

  7. Usage Examples

  8. FAQ

  9. How-To: Integrate RAGentA

  10. Evaluation Metrics

  11. License & Acknowledgments


Introduction

RAGentA is designed for trustworthy, attributable question answering. By combining multiple specialized agents, semantic and keyword retrieval methods, and built-in citation tracking, RAGentA ensures:

  • Comprehensive coverage: Answers address every component of a query
  • High relevance: Responses are strictly grounded in retrieved documents
  • Full traceability: Citations link back to each source used

Whether you need factual definitions, technical explanations, or nuanced insights, RAGentA’s architecture delivers clear, reliable answers.


Key Features

  • Multi-Agent Architecture: Separate agents handle retrieval, relevance scoring, generation, and claim analysis.
  • Hybrid Retrieval: Seamlessly blends dense (semantic) search with sparse (keyword) search.
  • Citation Tracking: Automatically labels every fact with [X] citations.
  • Claim-Level Analysis: Breaks answers into individual claims to detect gaps.
  • Follow-Up Processing: Generates and answers follow-up questions for any missing parts.
  • Standard Evaluation: Built-in support for MRR, Recall, Precision, and F1 metrics.

Prerequisites and Installation

Environment Setup

  • OS: Linux, macOS, or Windows
  • Python: 3.8 or higher
  • Deep Learning: PyTorch ≥ 2.0.0
  • GPU: CUDA-compatible (recommended)
  • Cloud Services: AWS OpenSearch and Pinecone accounts

Repository Clone & Dependencies

git clone git@github.com:tobiasschreieder/LiveRAG.git
cd LiveRAG
python -m venv env
source env/bin/activate    # On Windows: env\Scripts\activate
pip install -r requirements.txt

AWS Credentials & Environment Variables

  1. Create the credentials directory:

    mkdir -p ~/.aws
    
  2. Add your access keys to ~/.aws/credentials:

    [sigir-participant]
    aws_access_key_id = YOUR_ACCESS_KEY
    aws_secret_access_key = YOUR_SECRET_KEY
    
  3. Set the profile's region and output format in ~/.aws/config:

    [profile sigir-participant]
    region = us-east-1
    output = json
    
  4. Add environment variables:

    export AWS_PROFILE=sigir-participant
    export AWS_REGION=us-east-1
    export HUGGING_FACE_HUB_TOKEN=your_hf_token
    

Quick Start

Single-Question Mode

python run_RAGentA.py \
  --model tiiuae/Falcon3-10B-Instruct \
  --n 0.5 \
  --alpha 0.65 \
  --top_k 20 \
  --single_question "What is a knowledge graph?"

Batch-Processing Mode

python run_RAGentA.py \
  --model tiiuae/Falcon3-10B-Instruct \
  --n 0.5 \
  --alpha 0.65 \
  --top_k 20 \
  --data_file questions.jsonl \
  --output_format jsonl
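
The exact input schema is defined by run_RAGentA.py and is not spelled out here; as a working assumption (mirroring the id and question fields of the sample output further below), each line of questions.jsonl would look like:

{"id": "q1", "question": "What is a knowledge graph?"}
{"id": "q2", "question": "How does retrieval-augmented generation reduce hallucinations?"}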

Parameter reference:

Parameter            Description                                Default
--model              Model name or path                         tiiuae/Falcon3-10B-Instruct
--n                  Adaptive threshold factor                  0.5
--alpha              Weight for semantic vs. keyword scoring    0.65
--top_k              Number of documents to retrieve            20
--data_file          Input JSON/JSONL file of questions         (none)
--single_question    Single question text                       (none)
--output_format      Output format: json, jsonl, or debug       jsonl
--output_dir         Directory for saving results               results

System Architecture

Multi-Agent Workflow

  1. Retrieval Phase

    • Query sent to Hybrid Retriever

    • Top‑K documents fetched via Pinecone (semantic) and OpenSearch (keyword)

    • Combined score:

      final_score = α × semantic_score + (1−α) × keyword_score
      
  2. Agent 1: Predictor

    • Generates a candidate answer per document
    • Input: Query + single document
    • Output: Document-specific answer snippet
  3. Agent 2: Judge

    • Scores candidate answers by relevance
    • Score = log P(Yes) − log P(No) (see the scoring sketch after this list)
    • Computes adaptive threshold τq = μ − n·σ
    • Filters documents with score ≥ τq
  4. Agent 3: Final-Predictor

    • Integrates filtered documents into a cohesive answer
    • Inserts [X] citations for each fact
  5. Agent 4: Claim Judge

    • Splits answer into individual claims
    • Maps claims to question components
    • Detects missing parts and formulates follow-up questions
    • Retrieves new documents if needed and updates answer
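
To make the Judge's scoring concrete, here is a minimal sketch of the log P(Yes) − log P(No) computation using Hugging Face transformers. The prompt wording, token handling, and the judge_score helper are illustrative assumptions rather than RAGentA's exact implementation, and loading the 10B model requires a suitable GPU.

# Illustrative sketch of the Judge's relevance score: log P(Yes) - log P(No).
# Prompt wording and token handling are assumptions, not RAGentA's exact code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Instruct")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Instruct")

def judge_score(query, candidate_answer):
    prompt = (
        f"Question: {query}\n"
        f"Candidate answer: {candidate_answer}\n"
        "Is this answer relevant to the question? Answer Yes or No: "
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]   # logits for the next token
    log_probs = torch.log_softmax(next_token_logits, dim=-1)
    yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("No", add_special_tokens=False)[0]
    return (log_probs[yes_id] - log_probs[no_id]).item()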

Hybrid Retrieval Strategy

  • Semantic Retrieval: Pinecone + intfloat/e5-base-v2 embeddings
  • Keyword Retrieval: AWS OpenSearch with precise matching
  • Fusion: Weighted by α parameter
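
As a concrete illustration of the fusion step, the sketch below combines per-document scores from the two retrievers using the α weight. The document IDs and the assumption that both score sets are normalized to comparable ranges are illustrative, not part of RAGentA's code.

# Sketch of alpha-weighted score fusion; assumes semantic and keyword scores
# are already normalized to comparable ranges.
def fuse_scores(semantic_scores, keyword_scores, alpha=0.65):
    fused = {}
    for doc_id in set(semantic_scores) | set(keyword_scores):
        s = semantic_scores.get(doc_id, 0.0)
        k = keyword_scores.get(doc_id, 0.0)
        fused[doc_id] = alpha * s + (1 - alpha) * k
    return fused

# Example with made-up scores from Pinecone (semantic) and OpenSearch (keyword)
fused = fuse_scores({"doc1": 0.82, "doc2": 0.40}, {"doc1": 0.30, "doc3": 0.95})
top_docs = sorted(fused, key=fused.get, reverse=True)   # ranked document IDs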

Adaptive Threshold Mechanism

  1. Calculate mean μ and standard deviation σ of relevance scores
  2. Threshold τq = μ − n × σ
  3. Retain documents with scores ≥ τq
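
A minimal sketch of this filter is shown below. Whether RAGentA uses the population or sample standard deviation is not stated here, so the sketch uses the population form.

# Keep documents whose Judge score is at least tau_q = mean - n * std,
# computed over the current query's score distribution.
import statistics

def filter_by_adaptive_threshold(doc_scores, n=0.5):
    scores = list(doc_scores.values())
    tau_q = statistics.mean(scores) - n * statistics.pstdev(scores)
    kept = {doc_id: s for doc_id, s in doc_scores.items() if s >= tau_q}
    return kept, tau_q

kept, tau_q = filter_by_adaptive_threshold({"doc1": 2.1, "doc2": 0.4, "doc3": -1.3})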

Follow‑Up & Completeness Check

  • Question Analysis: Detects multi-part queries
  • Claim Mapping: Links each claim to query parts
  • Coverage Assessment: Labels claims as “fully,” “partially,” or “not answered”
  • Follow-Up Generation: Creates and answers sub-questions for any gaps
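
The Claim Judge's internal data structures are not exposed here; purely as an illustration, the coverage assessment can be pictured as a mapping from question components to claims and labels, with follow-up questions generated for anything not fully covered:

# Hypothetical shape of a coverage assessment (field names are illustrative only)
coverage = {
    "What is a knowledge graph?": {"claims": ["A knowledge graph stores entities and relations [1]"],
                                   "status": "fully"},
    "How is it used in RAG?":     {"claims": [], "status": "not answered"},
}

# Components labeled "partially" or "not answered" trigger follow-up retrieval
# and answering, after which the final answer is updated.
follow_ups = [part for part, info in coverage.items() if info["status"] != "fully"]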

Configuration & Parameters

Config Key                Description
AWS_PROFILE               AWS credential profile name
AWS_REGION                AWS region
HUGGING_FACE_HUB_TOKEN    Token for Hugging Face model access
--n                       Adaptive threshold multiplier
--alpha                   Hybrid retrieval weight
--top_k                   Number of documents to retrieve

Usage Examples

# Set environment variables
export AWS_PROFILE=sigir-participant
export AWS_REGION=us-east-1

# Run single question
python run_RAGentA.py \
  --model tiiuae/Falcon3-10B-Instruct \
  --n 0.5 \
  --alpha 0.7 \
  --top_k 15 \
  --single_question "What is retrieval-augmented generation?"

Sample JSONL output:

{
  "id": "q123",
  "question": "What is retrieval-augmented generation?",
  "passages": [
    { "passage": "Document content…", "doc_IDs": ["doc1","doc5"] }
  ],
  "final_prompt": "The prompt used for generation…",
  "answer": "Generated answer with citations…"
}
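
To post-process batch results, each line of the output file can be loaded as one JSON record with the fields shown above. The file path below is illustrative and depends on --output_dir and the run name.

# Read batch results line by line (field names taken from the sample record above)
import json

with open("results/answers.jsonl") as f:        # illustrative path; see --output_dir
    for line in f:
        record = json.loads(line)
        cited_ids = [doc_id for p in record["passages"] for doc_id in p["doc_IDs"]]
        print(record["id"], cited_ids, record["answer"][:80])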

FAQ

Why combine semantic and keyword retrieval?

Blending both ensures deep contextual matching while preserving precise keyword coverage, reducing both missed and irrelevant documents.

How should I adjust the α parameter?

If semantic context is paramount (long-form texts, rich semantics), increase α above 0.65; if exact keyword hits matter more, lower α.

What does the adaptive threshold do?

It dynamically sets a relevance cutoff based on the score distribution (mean and standard deviation), guarding against over- or under-filtering.

How do I verify citations in the answer?

Each `[X]` tag corresponds to the Xth document in the retrieved list, enabling quick manual source checks.
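
As a quick way to spot-check this mapping, the sketch below extracts the [X] tags from an answer and looks up the corresponding 1-based positions in the retrieved document list; the helper name and sample data are illustrative.

# Map [X] citation tags back to the retrieved document list (1-based indices)
import re

def cited_documents(answer, retrieved_docs):
    indices = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return [retrieved_docs[i - 1] for i in sorted(indices) if 0 < i <= len(retrieved_docs)]

docs = ["passage about knowledge graphs", "passage about RAG", "unrelated passage"]
print(cited_documents("A knowledge graph stores entities and relations [1][2].", docs))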


How-To: Integrate RAGentA

  1. Prepare Environment: Python 3.8+, PyTorch ≥ 2.0.0, and AWS & Pinecone accounts

  2. Clone & Install:

    git clone git@github.com:tobiasschreieder/LiveRAG.git
    cd LiveRAG
    python -m venv env
    source env/bin/activate    # On Windows: env\Scripts\activate
    pip install -r requirements.txt

  3. Configure: Set α, top_k, and the AWS profile in `config.yaml` or via environment variables

  4. Sample Code:

    from RAGentA import RAGentA

    agent = RAGentA(
        model="tiiuae/Falcon3-10B-Instruct",
        n=0.5,
        alpha=0.65,
        top_k=20
    )
    print(agent.answer("What is a knowledge graph?"))

  5. Evaluate:

    from RAG_evaluation import evaluate_corpus_rag_mrr, evaluate_corpus_rag_recall

    mrr = evaluate_corpus_rag_mrr(retrieved, golden, k=5)
    recall = evaluate_corpus_rag_recall(retrieved, golden, k=20)
    print(f"MRR: {mrr:.4f}, Recall: {recall:.4f}")

Evaluation Metrics

  • MRR (Mean Reciprocal Rank): Average inverse rank of the first relevant document
  • Recall: Proportion of relevant documents retrieved
  • Precision: Proportion of retrieved documents that are relevant
  • F1 Score: Harmonic mean of Precision and Recall
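
For intuition, the toy function below re-implements the MRR definition above; the built-in evaluate_corpus_rag_mrr in RAG_evaluation is the canonical implementation and its exact signature may differ.

# Toy MRR: average of 1/rank of the first relevant document within the top k
def mean_reciprocal_rank(retrieved_lists, golden_sets, k=5):
    total = 0.0
    for retrieved, golden in zip(retrieved_lists, golden_sets):
        for rank, doc_id in enumerate(retrieved[:k], start=1):
            if doc_id in golden:
                total += 1.0 / rank
                break
    return total / len(retrieved_lists)

print(mean_reciprocal_rank([["d3", "d1", "d7"]], [{"d1"}]))   # 0.5: first hit at rank 2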

License & Acknowledgments

This project is released under the BSD 2-Clause License. See the LICENSE file for details.

Inspired by:

  • Chang et al., “MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation,” arXiv:2501.00332
  • Besrour et al., “RAGentA: Multi-Agent Retrieval-Augmented Generation for Attributed Question Answering,” arXiv:2506.16988 (SIGIR 2025)
@misc{Chang2024MAIN-RAG,
  title={MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation},
  author={Chia-Yuan Chang and Zhimeng Jiang and Vineeth Rakesh and Menghai Pan and Chin-Chia Michael Yeh and Guanchu Wang and Mingzhi Hu and Zhichao Xu and Yan Zheng and Mahashweta Das and Na Zou},
  year={2024},
  eprint={2501.00332},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.00332}
}

@misc{Besrour2025RAGentA,
  author={Ines Besrour and Jingbo He and Tobias Schreieder and Michael F{\"a}rber},
  title={{RAGentA: Multi-Agent Retrieval-Augmented Generation for Attributed Question Answering}},
  year={2025},
  eprint={2506.16988},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2506.16988}
}