RAGLight: The 15-Minute, 35-MB Route to a Private, Hallucination-Free ChatGPT
Because your docs deserve better than copy-paste into someone else’s cloud.
1. Why Another RAG Framework?
Everyone loves Large Language Models—until they invent revenue figures, API limits, or non-existent GitHub repos.
Retrieval-Augmented Generation (RAG) fixes this by letting the model “open the book” before it answers. The trouble? Most libraries still feel like assembling IKEA furniture with three missing screws.
Enter RAGLight—an MIT-licensed, plug-and-play Python toolkit that shrinks the usual 200-line boilerplate into an 8-line script (or one CLI wizard). No SaaS, no telemetry, 35 MB on disk.
2. What Exactly Is RAGLight?
RAGLight is a lightweight toolkit that wires document loading, chunking, embedding, vector storage, and generation into ready-made pipelines, with sensible defaults at every step.
Out-of-the-box extras:
- Auto-ignore .venv, node_modules, __pycache__, .idea … (14 folders by default, customizable)
- CLI command raglight chat—zero code, full interactive setup
- Agentic RAG & RAT (Reasoning-Augmented Thinking) pipelines baked in
- MCP (Model Context Protocol) server support—plug calculators, SQL, or search engines into the prompt loop
3. Quick Start: From 0 to Chat in 5 Minutes
3.1 Prerequisites
- Python ≥ 3.9
- Any LLM endpoint (local Ollama demo below)
- Folder with PDFs, code, Markdown—literally anything text-based
3.2 Install & Validate
# 1. Get Ollama (macOS/Linux; on Windows use the .exe installer)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.2   # 3B-parameter model, ~4 GB VRAM
# 2. Get RAGLight
pip install -U raglight
raglight --help # pretty fire emoji menu → you're good
Common pitfall: on Apple Silicon with conda, run conda install libomp first, or Chroma will crash with Illegal instruction: 4.
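Before indexing anything, it can also help to confirm that the Ollama endpoint is actually reachable. A minimal sketch, assuming Ollama's default port 11434 and its /api/tags endpoint:
import urllib.request, json

# List the locally installed Ollama models; fails fast if the daemon is down.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp).get("models", [])
print("Ollama is up. Installed models:", [m["name"] for m in models])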
4. Option A: Zero-Code CLI Wizard
raglight chat
The wizard asks 5 questions:
- Path to your folder
- Extra folders to ignore (already pre-filled with node_modules, .venv, …)
- Vector-store name & location
- Embedding model (default: all-MiniLM-L6-v2)
- LLM (default: llama3.2)
Indexing 100 MB of mixed docs takes ~2 min on an M2 Pro. After that you’re dropped into a REPL:
> How do I dockerize RAGLight?
Follow the multi-stage build in docs/Dockerfile.example...
Quit with /exit. The index stays on disk—next startup takes <3 s.
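If you want to peek at what actually landed in the index, you can open the Chroma store directly. A minimal sketch, assuming the store was written to ./defaultDb (use whatever path you gave the wizard) and that the chromadb package is available:
import chromadb

client = chromadb.PersistentClient(path="./defaultDb")
for col in client.list_collections():
    name = col if isinstance(col, str) else col.name  # chromadb >= 0.6 returns names only
    print(name, client.get_collection(name).count(), "chunks")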
5. Option B: 8-Line Python Script
Save as quick_start.py:
from raglight import RAGPipeline, FolderSource
from raglight.config.settings import Settings

Settings.setup_logging(level=2)  # 3 = debug

pipeline = RAGPipeline(
    knowledge_base=[FolderSource("./my_docs")],
    model_name="llama3.2",
    provider="ollama",
    k=5)  # top-5 chunks
pipeline.build()
print(pipeline.generate("List three bullet-proof benefits of RAGLight vs LangChain"))
Run:
python quick_start.py
Sample output:
- 35-MB install, no PyTorch lock-in
- 14 built-in ignore folders = cleaner index
- One-parameter switch Ollama ↔ OpenAI ↔ Mistral
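That last bullet is literal. A minimal sketch of the switch, reusing the quick_start.py pipeline; it assumes OPENAI_API_KEY is exported and that "openai" is an accepted provider string (check the README for the exact values your version supports):
from raglight import RAGPipeline, FolderSource

# Same pipeline as quick_start.py; only provider and model_name change.
pipeline = RAGPipeline(
    knowledge_base=[FolderSource("./my_docs")],
    model_name="gpt-4o-mini",   # was "llama3.2"
    provider="openai",          # was "ollama"
    k=5)
pipeline.build()
print(pipeline.generate("Summarize the architecture of this repo"))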
6. Deep Dive: The Plumbing (Architecture Diagram)
┌────────────────────────────────────────────┐
│ Data Sources (Folder, GitHub, S3...) │
└─────────────┬──────────────────────────────┘
│ chunk & hash
┌─────────────▼──────────────────────────────┐
│ Embedding Provider │
│ (HuggingFace, Ollama, OpenAI, Google) │
└─────────────┬──────────────────────────────┘
│ vector
┌─────────────▼──────────────────────────────┐
│ Vector Store (Chroma) │
│ Collections: docs, docs_classes │
└─────────────┬──────────────────────────────┘
│ retrieve top-k
┌─────────────▼──────────────────────────────┐
│ Generator │
│ (Ollama, LMStudio, Mistral, vLLM, OAI) │
│ Optional: Agent, RAT, MCP tools │
└────────────────────────────────────────────┘
Everything is a replaceable block—no subclassing required.
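For example, swapping the embedding block is a config change rather than a subclass. A minimal sketch, reusing the VectorStoreConfig shown in the next section; the Ollama embedding model name is an assumption and would need to be pulled locally first:
from raglight import VectorStoreConfig
from raglight.config.settings import Settings

# Same Chroma store, different embedding provider: Ollama instead of HuggingFace.
vs_conf = VectorStoreConfig(
    embedding_model="nomic-embed-text",   # assumed: ollama pull nomic-embed-text
    provider=Settings.OLLAMA,
    database=Settings.CHROMA,
    persist_directory="./defaultDb")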
7. Advanced Recipe 1: Agentic RAG (Self-Reflection Loop)
Use case: a tech-support bot that needs to look up several subsystems.
Code:
from raglight import AgenticRAGPipeline, AgenticRAGConfig, VectorStoreConfig
from raglight.config.settings import Settings

vs_conf = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb")

config = AgenticRAGConfig(
    provider=Settings.MISTRAL,
    model="mistral-large-2411",
    k=10,
    max_steps=4,  # max reflection loops
    api_key="YOUR_MISTRAL_KEY",
    system_prompt=Settings.DEFAULT_AGENT_PROMPT)

agent = AgenticRAGPipeline(config, vs_conf)
agent.build()
print(agent.generate("How to combine OpenAI embeddings with Ollama generator in RAGLight?"))
Benchmark (same question, n=30):
8. Advanced Recipe 2: RAT (Reasoning-Augmented Thinking)
RAT adds a “critic” step: a reasoning model (DeepSeek-R1, o1) reviews the draft, spots missing facts, triggers re-retrieval, then rewrites.
from raglight import RATPipeline, RATConfig
from raglight.config.settings import Settings

config = RATConfig(
    llm="llama3.2:3b",                 # generator
    reasoning_llm="deepseek-r1:1.5b",  # critic
    reflection=3,                      # loops
    provider=Settings.OLLAMA)

rat = RATPipeline(config)
rat.build()
print(rat.generate("Why is RAGLight lighter than LangChain? Give numbers."))
Output now includes a markdown table with dependency sizes, memory footprints, and cold-start times—hallucination drops to near-zero.
9. Advanced Recipe 3: Plug External Tools via MCP
MCP (Model Context Protocol) turns any REST-capable tool into a JSON-schema function the agent can call.
- Start the example calculator server:
git clone https://github.com/anthropics/mcp-server-examples
cd calculator && pip install mcp && python server.py
# listens on 127.0.0.1:8001/sse
- Feed the URL to RAGLight:
config = AgenticRAGConfig(
    ...
    mcp_config=[{"url": "http://127.0.0.1:8001/sse"}])
- Ask away:
> If 2024 revenue is 3.45 B USD and grew 18 % YoY, what was 2023 revenue?
Agent calls calculator → 2.92 B USD, shows formula.
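Stitched together with the Agentic RAG setup from Recipe 1, the complete call might look like the following sketch; everything except mcp_config is carried over verbatim from Section 7:
from raglight import AgenticRAGPipeline, AgenticRAGConfig, VectorStoreConfig
from raglight.config.settings import Settings

vs_conf = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb")

# Same agent as Recipe 1, now with the calculator server exposed as an MCP tool.
config = AgenticRAGConfig(
    provider=Settings.MISTRAL,
    model="mistral-large-2411",
    k=10,
    max_steps=4,
    api_key="YOUR_MISTRAL_KEY",
    system_prompt=Settings.DEFAULT_AGENT_PROMPT,
    mcp_config=[{"url": "http://127.0.0.1:8001/sse"}])

agent = AgenticRAGPipeline(config, vs_conf)
agent.build()
print(agent.generate("If 2024 revenue is 3.45 B USD and grew 18 % YoY, what was 2023 revenue?"))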
10. Dockerizing for Your Team
Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "-m", "raglight", "chat"]
Build & run:
docker build -t raglight-app .
docker run --add-host=host.docker.internal:host-gateway -it raglight-app
Tip: Ollama binds to 127.0.0.1 by default; --add-host lets the container reach the host service. Or set OLLAMA_HOST=0.0.0.0 for LAN-wide access.
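To check from inside the container that the host's Ollama is reachable, the same /api/tags probe from Section 3.2 works. This sketch assumes the --add-host mapping above, the default port 11434, and that Ollama accepts the connection (per the tip, you may need OLLAMA_HOST=0.0.0.0 on the host first):
import urllib.request, json

# host.docker.internal resolves to the host thanks to --add-host.
with urllib.request.urlopen("http://host.docker.internal:11434/api/tags") as resp:
    print("Host Ollama reachable. Models:", [m["name"] for m in json.load(resp)["models"]])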
11. Performance & Cost Cheat-Sheet
12. RAGLight vs. LangChain vs. LlamaIndex
Data collected 29 Sep 2024, Python 3.11, M2 Pro, median of 3 runs.
13. Real-World Story: 3-Year Tech Blog → Slack Bot
Stack: 200 Markdown posts, 100 Jupyter notebooks, 50 PDF white-board drafts.
Steps:
- Drop everything into ./tech_blog
- Run raglight chat and ignore images, output, node_modules
- Pick multilingual-MiniLM embeddings + llama3.2:3b
- Wrap the Docker container behind a Slack Bolt app (see the sketch below)
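A minimal sketch of that last step, assuming slack_bolt in Socket Mode and a pipeline built as in Section 5; the token names and wiring are illustrative, not part of RAGLight:
import os
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler
from raglight import RAGPipeline, FolderSource

pipeline = RAGPipeline(
    knowledge_base=[FolderSource("./tech_blog")],
    model_name="llama3.2:3b",
    provider="ollama",
    k=5)
pipeline.build()

app = App(token=os.environ["SLACK_BOT_TOKEN"])

@app.event("app_mention")
def answer(event, say):
    # Strip the @bot mention and hand the question to the pipeline.
    question = event["text"].split(">", 1)[-1].strip()
    say(pipeline.generate(question))

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()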
Outcomes:
- Avg. response 4.3 s (incl. network)
- 30 % fewer “hey what’s our stance on X?” pings to senior engineers
- 87 % “thumbs-up” rate after 3 weeks
14. Roadmap & How to Contribute
- v0.4 – Async FastAPI backend + WebSocket streaming
- v0.5 – Qdrant & Weaviate support, gRPC transport
- v0.6 – React-based UI, drag-and-drop folder upload
Good first issues: add batch_size for Chroma, write a Korean README, benchmark scripts.
Jump in: GitHub → Bessouat40/RAGLight → Discussions. Maintainers reply within 24 h.
15. Key Takeaways
- RAGLight compresses the classic 200-line RAG scaffolding into an 8-line snippet.
- You keep all data local by default—no hidden API calls, no telemetry.
- Switching between Ollama, OpenAI, or Mistral is a one-parameter change.
- Agentic RAG and RAT are first-class citizens, not weekend side-projects.
- The entire toolchain fits in a 35-MB pip wheel—smaller than most logo PNGs.
16. Your Next 5 Minutes
pip install raglight
ollama pull llama3.2
raglight chat
When you’re back, tell me: What was the first surprise answer your new knowledge bot gave you?
References & Further Reading
- Lewis et al. 2020, “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”
- Wikipedia: Retrieval-augmented generation
- Anthropic MCP Intro: https://github.com/anthropics/mcp
- Ollama GitHub: https://github.com/ollama/ollama