RAGentA: A Multi-Agent Retrieval-Augmented Generation Framework

In an age when information overload can overwhelm users and systems alike, delivering accurate, comprehensive, and traceable answers is a critical challenge. RAGentA (Retrieval-Augmented Generation Agent) rises to this challenge with a unique multi-agent design, hybrid retrieval methods, and rigorous citation tracking, ensuring that each answer is both relevant and grounded in real sources.


Table of Contents

  1. Introduction

  2. Key Features

  3. Prerequisites and Installation

    • Environment Setup
    • Repository Clone & Dependencies
    • AWS Credentials & Environment Variables
  4. Quick Start

    • Single-Question Mode
    • Batch-Processing Mode
  5. System Architecture

    1. Multi-Agent Workflow

      • Agent 1: Predictor
      • Agent 2: Judge
      • Agent 3: Final-Predictor
      • Agent 4: Claim Judge
    2. Hybrid Retrieval Strategy

    3. Adaptive Threshold Mechanism

    4. Follow‑Up & Completeness Check

  6. Configuration & Parameters

  7. Usage Examples

  8. FAQ

  9. How-To: Integrate RAGentA

  10. Evaluation Metrics

  11. License & Acknowledgments


Introduction

RAGentA is designed for trustworthy, attributable question answering. By combining multiple specialized agents, semantic and keyword retrieval methods, and built-in citation tracking, RAGentA ensures:

  • Comprehensive coverage: Answers address every component of a query
  • High relevance: Responses are strictly grounded in retrieved documents
  • Full traceability: Citations link back to each source used

Whether you need factual definitions, technical explanations, or nuanced insights, RAGentA’s architecture delivers clear, reliable answers.


Key Features

  • Multi-Agent Architecture: Separate agents handle retrieval, relevance scoring, generation, and claim analysis.
  • Hybrid Retrieval: Seamlessly blends dense (semantic) search with sparse (keyword) search.
  • Citation Tracking: Automatically labels every fact with [X] citations.
  • Claim-Level Analysis: Breaks answers into individual claims to detect gaps.
  • Follow-Up Processing: Generates and answers follow-up questions for any missing parts.
  • Standard Evaluation: Built-in support for MRR, Recall, Precision, and F1 metrics.

Prerequisites and Installation

Environment Setup

  • OS: Linux, macOS, or Windows
  • Python: 3.8 or higher
  • Deep Learning: PyTorch ≥ 2.0.0
  • GPU: CUDA-compatible (recommended)
  • Cloud Services: AWS OpenSearch and Pinecone accounts

Repository Clone & Dependencies

git clone git@github.com:tobiasschreieder/LiveRAG.git
cd LiveRAG
python -m venv env
source env/bin/activate    # On Windows: env\Scripts\activate
pip install -r requirements.txt

AWS Credentials & Environment Variables

  1. Create the credentials directory:

    mkdir -p ~/.aws
    
  2. Add your access keys to ~/.aws/credentials:

    [sigir-participant]
    aws_access_key_id = YOUR_ACCESS_KEY
    aws_secret_access_key = YOUR_SECRET_KEY
    
  3. Set the profile's region and output format in ~/.aws/config:

    [profile sigir-participant]
    region = us-east-1
    output = json
    
  4. Add environment variables:

    export AWS_PROFILE=sigir-participant
    export AWS_REGION=us-east-1
    export HUGGING_FACE_HUB_TOKEN=your_hf_token
    

Quick Start

Single-Question Mode

python run_RAGentA.py \
  --model tiiuae/Falcon3-10B-Instruct \
  --n 0.5 \
  --alpha 0.65 \
  --top_k 20 \
  --single_question "What is a knowledge graph?"

Batch-Processing Mode

python run_RAGentA.py \
  --model tiiuae/Falcon3-10B-Instruct \
  --n 0.5 \
  --alpha 0.65 \
  --top_k 20 \
  --data_file questions.jsonl \
  --output_format jsonl
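
The exact input schema is defined by run_RAGentA.py and is not spelled out here; as a working assumption (mirroring the id and question fields of the sample output further below), each line of questions.jsonl would look like:

{"id": "q1", "question": "What is a knowledge graph?"}
{"id": "q2", "question": "How does retrieval-augmented generation reduce hallucinations?"}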

Parameter reference:

Parameter            Description                                Default
--model              Model name or path                         tiiuae/Falcon3-10B-Instruct
--n                  Adaptive threshold factor                  0.5
--alpha              Weight for semantic vs. keyword scoring    0.65
--top_k              Number of documents to retrieve            20
--data_file          Input JSON/JSONL file of questions         (none)
--single_question    Single question text                       (none)
--output_format      Output format: json, jsonl, or debug       jsonl
--output_dir         Directory for saving results               results

System Architecture

Multi-Agent Workflow

  1. Retrieval Phase

    • Query sent to Hybrid Retriever

    • Top‑K documents fetched via Pinecone (semantic) and OpenSearch (keyword)

    • Combined score:

      final_score = α × semantic_score + (1−α) × keyword_score
      
  2. Agent 1: Predictor

    • Generates a candidate answer per document
    • Input: Query + single document
    • Output: Document-specific answer snippet
  3. Agent 2: Judge

    • Scores candidate answers by relevance
    • Score = log P(Yes) − log P(No) (see the scoring sketch after this list)
    • Computes adaptive threshold τq = μ − n·σ
    • Filters documents with score ≥ τq
  4. Agent 3: Final-Predictor

    • Integrates filtered documents into a cohesive answer
    • Inserts [X] citations for each fact
  5. Agent 4: Claim Judge

    • Splits answer into individual claims
    • Maps claims to question components
    • Detects missing parts and formulates follow-up questions
    • Retrieves new documents if needed and updates answer
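
To make the Judge's scoring concrete, here is a minimal sketch of the log P(Yes) − log P(No) computation using Hugging Face transformers. The prompt wording, token handling, and the judge_score helper are illustrative assumptions rather than RAGentA's exact implementation, and loading the 10B model requires a suitable GPU.

# Illustrative sketch of the Judge's relevance score: log P(Yes) - log P(No).
# Prompt wording and token handling are assumptions, not RAGentA's exact code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Instruct")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Instruct")

def judge_score(query, candidate_answer):
    prompt = (
        f"Question: {query}\n"
        f"Candidate answer: {candidate_answer}\n"
        "Is this answer relevant to the question? Answer Yes or No: "
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]   # logits for the next token
    log_probs = torch.log_softmax(next_token_logits, dim=-1)
    yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("No", add_special_tokens=False)[0]
    return (log_probs[yes_id] - log_probs[no_id]).item()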

Hybrid Retrieval Strategy

  • Semantic Retrieval: Pinecone + intfloat/e5-base-v2 embeddings
  • Keyword Retrieval: AWS OpenSearch with precise matching
  • Fusion: Weighted by α parameter
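
As a concrete illustration of the fusion step, the sketch below combines per-document scores from the two retrievers using the α weight. The document IDs and the assumption that both score sets are normalized to comparable ranges are illustrative, not part of RAGentA's code.

# Sketch of alpha-weighted score fusion; assumes semantic and keyword scores
# are already normalized to comparable ranges.
def fuse_scores(semantic_scores, keyword_scores, alpha=0.65):
    fused = {}
    for doc_id in set(semantic_scores) | set(keyword_scores):
        s = semantic_scores.get(doc_id, 0.0)
        k = keyword_scores.get(doc_id, 0.0)
        fused[doc_id] = alpha * s + (1 - alpha) * k
    return fused

# Example with made-up scores from Pinecone (semantic) and OpenSearch (keyword)
fused = fuse_scores({"doc1": 0.82, "doc2": 0.40}, {"doc1": 0.30, "doc3": 0.95})
top_docs = sorted(fused, key=fused.get, reverse=True)   # ranked document IDs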

Adaptive Threshold Mechanism

  1. Calculate mean μ and standard deviation σ of relevance scores
  2. Threshold τq = μ − n × σ
  3. Retain documents with scores ≥ τq
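
A minimal sketch of this filter is shown below. Whether RAGentA uses the population or sample standard deviation is not stated here, so the sketch uses the population form.

# Keep documents whose Judge score is at least tau_q = mean - n * std,
# computed over the current query's score distribution.
import statistics

def filter_by_adaptive_threshold(doc_scores, n=0.5):
    scores = list(doc_scores.values())
    tau_q = statistics.mean(scores) - n * statistics.pstdev(scores)
    kept = {doc_id: s for doc_id, s in doc_scores.items() if s >= tau_q}
    return kept, tau_q

kept, tau_q = filter_by_adaptive_threshold({"doc1": 2.1, "doc2": 0.4, "doc3": -1.3})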

Follow‑Up & Completeness Check

  • Question Analysis: Detects multi-part queries
  • Claim Mapping: Links each claim to query parts
  • Coverage Assessment: Labels claims as “fully,” “partially,” or “not answered”
  • Follow-Up Generation: Creates and answers sub-questions for any gaps
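
The Claim Judge's internal data structures are not exposed here; purely as an illustration, the coverage assessment can be pictured as a mapping from question components to claims and labels, with follow-up questions generated for anything not fully covered:

# Hypothetical shape of a coverage assessment (field names are illustrative only)
coverage = {
    "What is a knowledge graph?": {"claims": ["A knowledge graph stores entities and relations [1]"],
                                   "status": "fully"},
    "How is it used in RAG?":     {"claims": [], "status": "not answered"},
}

# Components labeled "partially" or "not answered" trigger follow-up retrieval
# and answering, after which the final answer is updated.
follow_ups = [part for part, info in coverage.items() if info["status"] != "fully"]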

Configuration & Parameters

Config Key                Description
AWS_PROFILE               AWS credential profile name
AWS_REGION                AWS region
HUGGING_FACE_HUB_TOKEN    Token for Hugging Face model access
--n                       Adaptive threshold multiplier
--alpha                   Hybrid retrieval weight
--top_k                   Number of documents to retrieve

Usage Examples

# Set environment variables
export AWS_PROFILE=sigir-participant
export AWS_REGION=us-east-1

# Run single question
python run_RAGentA.py \
  --model tiiuae/Falcon3-10B-Instruct \
  --n 0.5 \
  --alpha 0.7 \
  --top_k 15 \
  --single_question "What is retrieval-augmented generation?"

Sample JSONL output:

{
  "id": "q123",
  "question": "What is retrieval-augmented generation?",
  "passages": [
    { "passage": "Document content…", "doc_IDs": ["doc1","doc5"] }
  ],
  "final_prompt": "The prompt used for generation…",
  "answer": "Generated answer with citations…"
}
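
To post-process batch results, each line of the output file can be loaded as one JSON record with the fields shown above. The file path below is illustrative and depends on --output_dir and the run name.

# Read batch results line by line (field names taken from the sample record above)
import json

with open("results/answers.jsonl") as f:        # illustrative path; see --output_dir
    for line in f:
        record = json.loads(line)
        cited_ids = [doc_id for p in record["passages"] for doc_id in p["doc_IDs"]]
        print(record["id"], cited_ids, record["answer"][:80])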

FAQ

Why combine semantic and keyword retrieval?

Blending both ensures deep contextual matching while preserving precise keyword coverage, reducing both missed and irrelevant documents.

How should I adjust the α parameter?

If semantic context is paramount (long-form texts, rich semantics), increase α above 0.65; if exact keyword hits matter more, lower α.

What does the adaptive threshold do?

It dynamically sets a relevance cutoff based on the score distribution (mean and standard deviation), guarding against over- or under-filtering.

How do I verify citations in the answer?

Each `[X]` tag corresponds to the Xth document in the retrieved list, enabling quick manual source checks.
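
As a quick way to spot-check this mapping, the sketch below extracts the [X] tags from an answer and looks up the corresponding 1-based positions in the retrieved document list; the helper name and sample data are illustrative.

# Map [X] citation tags back to the retrieved document list (1-based indices)
import re

def cited_documents(answer, retrieved_docs):
    indices = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return [retrieved_docs[i - 1] for i in sorted(indices) if 0 < i <= len(retrieved_docs)]

docs = ["passage about knowledge graphs", "passage about RAG", "unrelated passage"]
print(cited_documents("A knowledge graph stores entities and relations [1][2].", docs))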


How-To: Integrate RAGentA

  1. Prepare Environment: Python 3.8+, PyTorch ≥ 2.0.0, and AWS & Pinecone accounts

  2. Clone & Install:

    git clone git@github.com:tobiasschreieder/LiveRAG.git
    cd LiveRAG
    python -m venv env
    source env/bin/activate    # On Windows: env\Scripts\activate
    pip install -r requirements.txt

  3. Configure: Set α, top_k, and the AWS profile in `config.yaml` or via environment variables

  4. Sample Code:

    from RAGentA import RAGentA

    agent = RAGentA(
        model="tiiuae/Falcon3-10B-Instruct",
        n=0.5,
        alpha=0.65,
        top_k=20
    )
    print(agent.answer("What is a knowledge graph?"))

  5. Evaluate:

    from RAG_evaluation import evaluate_corpus_rag_mrr, evaluate_corpus_rag_recall

    mrr = evaluate_corpus_rag_mrr(retrieved, golden, k=5)
    recall = evaluate_corpus_rag_recall(retrieved, golden, k=20)
    print(f"MRR: {mrr:.4f}, Recall: {recall:.4f}")

Evaluation Metrics

  • MRR (Mean Reciprocal Rank): Average inverse rank of the first relevant document
  • Recall: Proportion of relevant documents retrieved
  • Precision: Proportion of retrieved documents that are relevant
  • F1 Score: Harmonic mean of Precision and Recall
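
For intuition, the toy function below re-implements the MRR definition above; the built-in evaluate_corpus_rag_mrr in RAG_evaluation is the canonical implementation and its exact signature may differ.

# Toy MRR: average of 1/rank of the first relevant document within the top k
def mean_reciprocal_rank(retrieved_lists, golden_sets, k=5):
    total = 0.0
    for retrieved, golden in zip(retrieved_lists, golden_sets):
        for rank, doc_id in enumerate(retrieved[:k], start=1):
            if doc_id in golden:
                total += 1.0 / rank
                break
    return total / len(retrieved_lists)

print(mean_reciprocal_rank([["d3", "d1", "d7"]], [{"d1"}]))   # 0.5: first hit at rank 2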

License & Acknowledgments

This project is released under the BSD 2-Clause License. See the LICENSE file for details.

Inspired by:

  • Chang et al., “MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation,” arXiv:2501.00332
  • Besrour et al., “RAGentA: Multi-Agent Retrieval-Augmented Generation for Attributed Question Answering,” arXiv:2506.16988 (SIGIR 2025)
@misc{Chang2024MAIN-RAG,
  title={MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation},
  author={Chia-Yuan Chang and Zhimeng Jiang and Vineeth Rakesh and Menghai Pan and Chin-Chia Michael Yeh and Guanchu Wang and Mingzhi Hu and Zhichao Xu and Yan Zheng and Mahashweta Das and Na Zou},
  year={2024},
  eprint={2501.00332},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.00332}
}

@misc{Besrour2025RAGentA,
  author={Ines Besrour and Jingbo He and Tobias Schreieder and Michael F{\"a}rber},
  title={{RAGentA: Multi-Agent Retrieval-Augmented Generation for Attributed Question Answering}},
  year={2025},
  eprint={2506.16988},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2506.16988}
}