# Ragbits: The Modular Toolkit for Accelerating GenAI Application Development
## What is Ragbits?
Ragbits is a modular toolkit designed to accelerate generative AI application development. It provides core components for building reliable, scalable AI applications, enabling developers to quickly implement:
- Seamless integration with 100+ large language models
- Retrieval-augmented generation (RAG) over document collections
- Chatbots with ready-made user interfaces
- Distributed document processing
- Production-ready AI deployments
Developed by deepsense.ai and released under the MIT open-source license, the toolkit is particularly well suited to AI projects that need rapid prototyping followed by production deployment.
## Core Capabilities Explained
### 🔨 Building Reliable & Scalable GenAI Applications
**1. Flexible LLM Integration**
- Supports 100+ major language models via LiteLLM
- Local model deployment capabilities
- Type-safe LLM calling interface (Python generics support)
```python
# Type-safe LLM call example
from pydantic import BaseModel
from ragbits.core.llms import LiteLLM

class QuestionAnswerPromptOutput(BaseModel):
    answer: str

llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)
# Output automatically validates against the QuestionAnswerPromptOutput structure
```
**2. Vector Database Flexibility**
- Built-in support for Qdrant, PgVector and other popular vector stores
- In-memory vector store for rapid prototyping
- Custom storage adapter interfaces
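Because every store implements the same interface, you can prototype in memory and switch to a persistent backend later. Below is a minimal sketch of that swap; the `QdrantVectorStore` import path and constructor arguments are assumptions based on the modular package layout, so verify them against the API reference.

```python
from qdrant_client import AsyncQdrantClient

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.core.vector_stores.qdrant import QdrantVectorStore  # assumed import path

embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")

# Prototype entirely in memory...
store = InMemoryVectorStore(embedder=embedder)

# ...then swap in a persistent backend without changing the rest of the pipeline
store = QdrantVectorStore(
    client=AsyncQdrantClient(url="http://localhost:6333"),
    index_name="documents",
    embedder=embedder,
)
```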
**3. Developer-Friendly Tools**
- CLI for vector store management
- Terminal-based prompt testing
- Modular installation to reduce dependencies
```bash
ragbits vector-store manage # Manage vector stores via CLI
```
### 📚 Document Processing & Retrieval Augmentation
**1. Multi-Format Document Support**
- Processes 20+ formats including PDF, HTML, spreadsheets, presentations
- Optional Docling or Unstructured parsing engines
- Supports extraction of complex content like tables and images
**2. Distributed Document Processing**
- Parallel processing via Ray framework
- Fast ingestion of large datasets
- Native cloud storage support (S3/GCS/Azure)
```python
# Ingestion example (run inside an async function); with the Ray integration
# enabled, large document batches are processed in parallel
await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
```
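Cloud-hosted sources are meant to follow the same call; the `s3://` URI scheme below is an assumption based on the S3/GCS/Azure support listed above, so check the source documentation for the exact form.

```python
# Assumed URI scheme for a cloud-stored document (verify against the docs)
await document_search.ingest("s3://my-bucket/reports/q3-summary.pdf")
```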
**3. Retrieval-Augmented Generation (RAG)**
- End-to-end RAG pipelines
- Context-aware Q&A systems
- Automatic injection of retrieval results into prompts
### 🚀 Deployment & Monitoring Solutions
**1. Full Observability**
- OpenTelemetry integration
- Real-time CLI performance monitoring
- User feedback collection mechanisms
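A minimal tracing setup could look like the sketch below. The OpenTelemetry SDK calls are standard; the `ragbits.core.audit.set_trace_handlers` hook is an assumption about where the toolkit exposes its tracing integration, so confirm the exact entry point in the documentation.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Standard OpenTelemetry setup: export spans to the console for local debugging
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Assumed ragbits hook that routes LLM/retrieval spans to the provider above
from ragbits.core.audit import set_trace_handlers

set_trace_handlers("otel")
```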
**2. Production-Ready Features**
- Prompt testing with promptfoo
- Automatic model performance optimization
- Chat UI with persistent storage
```python
# Chatbot deployment example
RagbitsAPI(MyChat).run() # One-click service launch
```
**3. Modular Extensible Architecture**
- Component-based installation
- Custom parser development interfaces
- Adapter pattern for external system integration
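The adapter pattern itself is plain Python: wrap the external system behind the same async surface the rest of the pipeline expects. The sketch below is purely illustrative; `ExternalSearchClient` is a hypothetical stand-in for whatever SDK you are integrating.

```python
class ExternalSearchClient:
    """Hypothetical client for an external search system."""

    async def query(self, text: str, top_k: int) -> list[dict]:
        ...

class ExternalSearchAdapter:
    """Adapts the external client to the search() surface used in this article."""

    def __init__(self, client: ExternalSearchClient, top_k: int = 5):
        self.client = client
        self.top_k = top_k

    async def search(self, query: str) -> list[str]:
        hits = await self.client.query(query, top_k=self.top_k)
        return [hit["text"] for hit in hits]
```

Results from such an adapter can then be injected into the same RAG prompt as results from the built-in document search.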
---
## Five-Minute Quickstart
### Installation Guide
Basic installation (recommended for beginners):
```bash
pip install ragbits
```
Includes six core modules:
1. `ragbits-core` - Foundational LLM/vector store operations
2. `ragbits-agents` - Agent system construction
3. `ragbits-document-search` - Document retrieval pipelines
4. `ragbits-evaluate` - Evaluation framework
5. `ragbits-chat` - Chat application infrastructure
6. `ragbits-cli` - Command-line tools
Advanced users can install components selectively:
```bash
pip install ragbits-core ragbits-document-search
```
### Basic LLM Interaction
```python
import asyncio

from pydantic import BaseModel
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt

# Define data structures
class QAInput(BaseModel):
    question: str

class QAOutput(BaseModel):
    answer: str

# Create prompt template
class QAPrompt(Prompt[QAInput, QAOutput]):
    system_prompt = "You're a helpful assistant. Answer questions accurately."
    user_prompt = "Question: {{ question }}"

# Execute LLM call
async def main():
    llm = LiteLLM(model_name="gpt-4.1-nano")
    prompt = QAPrompt(QAInput(question="What are high memory and low memory in Linux?"))
    response = await llm.generate(prompt)
    print(response.answer)

asyncio.run(main())
```
### Document Retrieval Practice
```python
import asyncio

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# Initialize retrieval system
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)

# Ingest and query an academic paper
async def run():
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search("What are the core findings of this paper?")
    print(result[0].text_representation[:500])  # Print first 500 characters

asyncio.run(run())
```
### Complete RAG Pipeline
```python
from pydantic import BaseModel
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt

class RAGInput(BaseModel):
    question: str
    context: list[str]

class RAGPrompt(Prompt[RAGInput, str]):
    system_prompt = "Answer based on context. Say if you don't know."
    user_prompt = "Question: {{ question }}\nContext: {% for item in context %}{{ item }}{% endfor %}"

# Combine retrieval + generation (reuses `document_search` from the previous section)
llm = LiteLLM(model_name="gpt-4.1-nano")

async def rag_pipeline(question: str):
    # Document retrieval
    search_results = await document_search.search(question)
    # Build prompt
    rag_input = RAGInput(
        question=question,
        context=[r.text_representation for r in search_results],
    )
    prompt = RAGPrompt(rag_input)
    # Generate answer
    return await llm.generate(prompt)
```
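Running the pipeline works the same way as the earlier snippets:

```python
import asyncio

answer = asyncio.run(rag_pipeline("What is the attention mechanism introduced in this paper?"))
print(answer)
```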
### Chatbot Development
```python
from ragbits.chat.api import RagbitsAPI
from ragbits.chat.interface import ChatInterface
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# RAGPrompt and RAGInput are reused from the previous section

class MyChatBot(ChatInterface):
    async def setup(self):
        # Initialize LLM and retrieval system
        self.llm = LiteLLM(model_name="gpt-4.1-nano")
        self.embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
        self.vector_store = InMemoryVectorStore(embedder=self.embedder)
        self.document_search = DocumentSearch(vector_store=self.vector_store)
        await self.document_search.ingest("web://https://example.com/knowledge.pdf")

    async def chat(self, message: str, history=None, context=None):
        # Retrieve relevant documents
        results = await self.document_search.search(message)
        context_texts = [r.text_representation for r in results]
        # Stream the generated answer back to the UI
        async for chunk in self.llm.generate_streaming(
            RAGPrompt(RAGInput(question=message, context=context_texts))
        ):
            yield self.create_text_response(chunk)

# Launch service
RagbitsAPI(MyChatBot).run()
```
---
## Advanced Development Techniques
### Project Scaffolding Generation
```bash
uvx create-ragbits-app
```
Generates project structure:
```
/my-rag-app
├── data_sources/ # Document storage
├── pipelines/ # Processing pipelines
├── prompts/ # Prompt templates
├── evaluators/ # Evaluation modules
└── app.py # Main application
```
### Performance Optimization Tips
1. **Embedding Model Selection**
- Small applications: `text-embedding-3-small`
- High-precision needs: `text-embedding-3-large`
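Switching models is a one-line change on the embedder (the model names match the OpenAI embedding family used throughout this article):

```python
from ragbits.core.embeddings import LiteLLMEmbedder

# Fast and inexpensive, good enough for most prototypes
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
# Higher retrieval precision at a higher cost
embedder = LiteLLMEmbedder(model_name="text-embedding-3-large")
```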
2. **Distributed Document Processing**
```python
# Enable Ray parallel processing (run inside an async function)
from ragbits.document_search.distributed import RayDocumentProcessor

processor = RayDocumentProcessor(num_workers=8)
await processor.process_batch(documents)
```
3. **Hybrid Retrieval Strategy**
```python
# Combine keyword + vector search
from ragbits.document_search.retrievers import HybridRetriever
retriever = HybridRetriever(vector_store, keyword_store)
```
---
## Best Practices Guide
### Prompt Engineering Techniques
1. **Structured Output Control**
```python
class StructuredOutput(BaseModel):
    key_points: list[str]
    summary: str
    confidence: float

# Force structured LLM output
llm = LiteLLM(use_structured_output=True)
```
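The output model takes effect when it is used as a prompt's output type, exactly as in the quickstart. A short sketch (reusing `QAInput` from earlier) could look like:

```python
class ReportPrompt(Prompt[QAInput, StructuredOutput]):
    system_prompt = "Summarize the user's question into key points with a confidence score."
    user_prompt = "{{ question }}"

# llm.generate(ReportPrompt(QAInput(question="..."))) then returns a StructuredOutput instance
```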
2. **Dynamic Context Injection**
```python
user_prompt = """
{% if user.role == 'admin' %}
Admin command: {{ admin_query }}
{% else %}
User question: {{ question }}
{% endif %}
"""
```
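For the template above to render, the prompt's input model must expose the referenced fields. A minimal sketch (field names chosen to match the template; the nesting of `user.role` is an assumption about how you model users) might be:

```python
from pydantic import BaseModel

class UserInfo(BaseModel):
    role: str

class DynamicInput(BaseModel):
    user: UserInfo        # makes user.role available to the template
    question: str = ""
    admin_query: str = ""
```

The template then becomes the `user_prompt` of a `Prompt[DynamicInput, str]` subclass.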
### Production Deployment
1. **Monitoring Configuration**
```yaml
# opentelemetry.yaml
exporters:
  console:
    verbosity: basic
  jaeger:
    endpoint: "localhost:14250"
```
2. **Chat Interface Customization**
```python
RagbitsAPI(
    MyChatBot,
    ui_config={
        "title": "Enterprise Knowledge Assistant",
        "primary_color": "#2563eb",
    },
).run(port=8080)
```
---
## Frequently Asked Questions
### Technical Implementation
**Q: Which local LLMs does Ragbits support?**
A: Models are called through LiteLLM, so locally hosted HuggingFace-format models are supported, including GGUF/GGML quantized builds served by a local backend
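One common pattern, assuming a locally running Ollama server and an illustrative model name, is to point LiteLLM at the local provider:

```python
from ragbits.core.llms import LiteLLM

# LiteLLM routes "ollama/..." model names to a local Ollama server (default port 11434)
local_llm = LiteLLM(model_name="ollama/llama3.1")
```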
**Q: How to process table data in scanned PDFs?**
A: Integrates Unstructured.io's OCR engine for automatic table/image extraction
**Q: How is vector retrieval accuracy ensured?**
A: Multi-level optimization:
1. Chunking strategy adjustment (sliding window/semantic chunking)
2. Re-ranking support
3. Hybrid retrieval modes
### Use Cases
**Q: What applications can be built?**
A: Typical scenarios:
- Enterprise knowledge base Q&A systems
- Research paper analysis tools
- Customer support chatbots
- Intelligent internal document retrieval
**Q: How does it differ from frameworks like LangChain?**
A: Key differentiators:
1. Strong type system
2. Native distributed processing
3. Built-in production monitoring
4. Modular dependency management
### Performance & Scaling
**Q: How large a knowledge base can a single machine handle?**
A: Benchmark data:
| Storage Engine | Documents | Query Latency |
|----------------|-----------|--------------|
| In-Memory | 10K | <100ms |
| Qdrant | 1M+ | 200-500ms |
| PGVector | 500K | 300-700ms |
**Q: How to scale processing capacity?**
A: Horizontal scaling:
1. Document processing: Add Ray worker nodes
2. Retrieval layer: Vector database sharding
3. Generation layer: LLM load balancing
---
## Resources & Support
### Learning Materials
- [Official Documentation](https://ragbits.deepsense.ai)
- [API Reference](https://ragbits.deepsense.ai/api_reference/core/prompt/)
- [Example Projects](https://github.com/deepsense-ai/ragbits-examples)
### Community Support
- [GitHub Discussions](https://github.com/deepsense-ai/ragbits/discussions)
- [Issue Tracker](https://github.com/deepsense-ai/ragbits/issues)
- [Contribution Guide](https://github.com/deepsense-ai/ragbits/tree/main/CONTRIBUTING.md)