# Ragbits: The Modular Toolkit for Accelerating GenAI Application Development

## What is Ragbits?

Ragbits is a modular toolkit specifically designed to accelerate generative AI application development. It provides core components for building reliable, scalable AI applications, enabling developers to quickly implement:

- Seamless integration with 100+ large language models
- Retrieval-augmented generation (RAG) over documents
- Chatbots with a built-in web UI
- Distributed document processing
- Production-ready AI deployments

Developed by the deepsense.ai team and released under the MIT open-source license, the toolkit is particularly well suited to AI projects that need both rapid prototyping and production deployment.


## Core Capabilities Explained

### 🔨 Building Reliable & Scalable GenAI Applications

**1. Flexible LLM Integration**

- Supports 100+ major language models via LiteLLM
- Local model deployment capabilities
- Type-safe LLM calling interface (Python generics support)
```python
# Type-safe LLM call example
from pydantic import BaseModel
from ragbits.core.llms import LiteLLM

class QuestionAnswerPromptOutput(BaseModel):
    answer: str

llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)
# Responses are validated against QuestionAnswerPromptOutput when a prompt
# declares it as its output type (see the quickstart below)
```

**2. Vector Database Flexibility**

- Built-in support for Qdrant, PgVector and other popular vector stores
- In-memory vector store for rapid prototyping
- Custom storage adapter interfaces
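
Swapping the prototyping store for a production one is a constructor change. A minimal sketch; the `QdrantVectorStore` import path and constructor arguments are assumptions, so verify them against the API reference:

```python
# Sketch: swapping vector stores behind the same interface.
# The Qdrant import path and constructor arguments are assumed;
# InMemoryVectorStore usage matches the quickstart below.
from qdrant_client import AsyncQdrantClient
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.core.vector_stores.qdrant import QdrantVectorStore

embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")

# Rapid prototyping: keep everything in memory
vector_store = InMemoryVectorStore(embedder=embedder)

# Production: point the same pipeline at Qdrant instead
# vector_store = QdrantVectorStore(
#     client=AsyncQdrantClient(url="http://localhost:6333"),
#     index_name="documents",
#     embedder=embedder,
# )
```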

**3. Developer-Friendly Tools**

- CLI for vector store management
- Terminal-based prompt testing
- Modular installation to reduce dependencies

```bash
ragbits vector-store manage  # Manage vector stores via CLI
```

### 📚 Document Processing & Retrieval Augmentation

**1. Multi-Format Document Support**

- Processes 20+ formats including PDF, HTML, spreadsheets, presentations
- Optional Docling or Unstructured parsing engines
- Supports extraction of complex content like tables and images
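
Format handling is transparent at the call site: the same `ingest` call used in the quickstart covers PDFs, HTML, and office documents alike. The URIs below are illustrative placeholders:

```python
# One ingest API across formats; the parsing engine (Docling or
# Unstructured) is a configuration choice on the DocumentSearch side.
# URIs are placeholders, not real documents.
await document_search.ingest("web://https://example.com/report.pdf")
await document_search.ingest("web://https://example.com/page.html")
```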

**2. Distributed Document Processing**

- Parallel processing via Ray framework
- Fast ingestion of large datasets
- Native cloud storage support (S3/GCS/Azure)

```python
# Ingest a remote document; with the Ray strategy configured, large batches are processed in parallel
await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
```

**3. Retrieval-Augmented Generation (RAG)**

- End-to-end RAG pipelines
- Context-aware Q&A systems
- Automatic injection of retrieval results into prompts

### 🚀 Deployment & Monitoring Solutions

**1. Full Observability**

- OpenTelemetry integration
- Real-time CLI performance monitoring
- User feedback collection mechanisms
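
A minimal wiring sketch: the OpenTelemetry SDK setup below is standard, while the `set_trace_handlers("otel")` hook is an assumption about the audit module (check the API reference):

```python
# Standard OpenTelemetry SDK setup (generic, not Ragbits-specific)
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "rag-app"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Assumed Ragbits hook for routing its spans to the provider above
from ragbits.core.audit import set_trace_handlers
set_trace_handlers("otel")
```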

**2. Production-Ready Features**

- Prompt testing with promptfoo
- Automatic model performance optimization
- Chat UI with persistent storage

```python
# Chatbot deployment example
RagbitsAPI(MyChat).run()  # One-line service launch
```

**3. Modular Extensible Architecture**

- Component-based installation
- Custom parser development interfaces
- Adapter pattern for external system integration
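
As a shape-only illustration of the adapter pattern (the class name and method signature here are hypothetical, not the actual Ragbits parser interface):

```python
# Hypothetical custom parser adapting an external extraction routine to a
# parser-style interface; the signature is illustrative only.
class MarkdownTableParser:
    """Extracts pipe-delimited table rows from Markdown documents."""

    async def parse(self, raw_bytes: bytes) -> list[str]:
        text = raw_bytes.decode("utf-8")
        # Keep only lines that look like table rows
        return [line for line in text.splitlines() if line.strip().startswith("|")]
```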

---

## Five-Minute Quickstart

### Installation Guide

Basic installation (recommended for beginners):

```bash
pip install ragbits
```

Includes six core modules:

1. `ragbits-core` - Foundational LLM/vector store operations
2. `ragbits-agents` - Agent system construction
3. `ragbits-document-search` - Document retrieval pipelines
4. `ragbits-evaluate` - Evaluation framework
5. `ragbits-chat` - Chat application infrastructure
6. `ragbits-cli` - Command-line tools

Advanced users can install components selectively:

```bash
pip install ragbits-core ragbits-document-search
```

### Basic LLM Interaction

```python
import asyncio
from pydantic import BaseModel
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt

# Define data structures
class QAInput(BaseModel):
    question: str

class QAOutput(BaseModel):
    answer: str

# Create prompt template
class QAPrompt(Prompt[QAInput, QAOutput]):
    system_prompt = "You're a helpful assistant. Answer questions accurately."
    user_prompt = "Question: {{ question }}"

# Execute LLM call
async def main():
    llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)
    prompt = QAPrompt(QAInput(question="What are high memory and low memory in Linux?"))
    response = await llm.generate(prompt)
    print(response.answer)

asyncio.run(main())
```

### Document Retrieval Practice

```python
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# Initialize retrieval system
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)

# Ingest and query academic paper
async def run():
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search("What are the core findings of this paper?")
    print(result[0].text_representation[:500])  # Print first 500 characters

asyncio.run(run())
```

### Complete RAG Pipeline

```python
# Reuses `llm` and `document_search` defined in the previous examples
class RAGInput(BaseModel):
    question: str
    context: list[str]

class RAGPrompt(Prompt[RAGInput, str]):
    system_prompt = "Answer based on context. Say if you don't know."
    user_prompt = "Question: {{ question }}\nContext: {% for item in context %}{{ item }}{% endfor %}"

# Combine retrieval + generation
async def rag_pipeline(question: str):
    # Document retrieval
    search_results = await document_search.search(question)

    # Build prompt
    rag_input = RAGInput(
        question=question,
        context=[r.text_representation for r in search_results]
    )
    prompt = RAGPrompt(rag_input)

    # Generate answer
    return await llm.generate(prompt)
```

### Chatbot Development

```python
from ragbits.chat.api import RagbitsAPI
from ragbits.chat.interface import ChatInterface
from ragbits.core.llms import LiteLLM

# LiteLLMEmbedder, InMemoryVectorStore, DocumentSearch, and
# RAGPrompt/RAGInput are reused from the earlier examples

class MyChatBot(ChatInterface):
    async def setup(self):
        # Initialize the LLM and retrieval system
        self.llm = LiteLLM(model_name="gpt-4.1-nano")
        self.embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
        self.vector_store = InMemoryVectorStore(embedder=self.embedder)
        self.document_search = DocumentSearch(vector_store=self.vector_store)
        await self.document_search.ingest("web://https://example.com/knowledge.pdf")

    async def chat(self, message: str, history=None, context=None):
        # Retrieve relevant documents
        results = await self.document_search.search(message)
        context_texts = [r.text_representation for r in results]

        # Stream responses
        async for chunk in self.llm.generate_streaming(
            RAGPrompt(RAGInput(question=message, context=context_texts))
        ):
            yield self.create_text_response(chunk)

# Launch service
RagbitsAPI(MyChatBot).run()
```

---

## Advanced Development Techniques

### Project Scaffolding Generation

```bash
uvx create-ragbits-app
```

Generates project structure:

```
/my-rag-app
  ├── data_sources/    # Document storage
  ├── pipelines/       # Processing pipelines
  ├── prompts/         # Prompt templates
  ├── evaluators/      # Evaluation modules
  └── app.py           # Main application
```

### Performance Optimization Tips

1. **Embedding Model Selection**

   - Small applications: `text-embedding-3-small`
   - High-precision needs: `text-embedding-3-large`
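
Either model is a one-line swap via the `LiteLLMEmbedder` used in the quickstart:

```python
from ragbits.core.embeddings import LiteLLMEmbedder

embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")   # fast and cheap
# embedder = LiteLLMEmbedder(model_name="text-embedding-3-large") # higher precision
```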

2. **Distributed Document Processing**

```python
# Enable Ray parallel processing
from ragbits.document_search.distributed import RayDocumentProcessor
processor = RayDocumentProcessor(num_workers=8)
await processor.process_batch(documents)
```

3. **Hybrid Retrieval Strategy**

```python
# Combine keyword + vector search
from ragbits.document_search.retrievers import HybridRetriever
retriever = HybridRetriever(vector_store, keyword_store)
```

---

## Best Practices Guide

### Prompt Engineering Techniques

1. **Structured Output Control**

```python
class StructuredOutput(BaseModel):
    key_points: list[str]
    summary: str
    confidence: float

# Force structured output: declare StructuredOutput as the prompt's output
# type (Prompt[..., StructuredOutput]) and enable validation on the LLM
llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)
```

2. **Dynamic Context Injection**

```python
user_prompt = """
{% if user.role == 'admin' %}
  Admin command: {{ admin_query }}
{% else %}
  User question: {{ question }}
{% endif %}
"""
```
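
Such a template slots directly into the `Prompt` class pattern from the quickstart; the `User` model and its `role` field here are illustrative:

```python
from pydantic import BaseModel
from ragbits.core.prompt import Prompt

class User(BaseModel):
    role: str

class RoutedInput(BaseModel):
    user: User
    question: str = ""
    admin_query: str = ""

class RoutedPrompt(Prompt[RoutedInput, str]):
    system_prompt = "You are a helpful assistant."
    user_prompt = (
        "{% if user.role == 'admin' %}Admin command: {{ admin_query }}"
        "{% else %}User question: {{ question }}{% endif %}"
    )
```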

### Production Deployment

1. **Monitoring Configuration**

```yaml
# opentelemetry.yaml
exporters:
  console:
    verbosity: basic
  jaeger:
    endpoint: "localhost:14250"
```

2. **Chat Interface Customization**

```python
RagbitsAPI(
    MyChatBot,
    ui_config={
        "title""Enterprise Knowledge Assistant",
        "primary_color""#2563eb"
    }
).run(port=8080)
```

---

## Frequently Asked Questions

### Technical Implementation

**Q: Which local LLMs does Ragbits support?**  
A: Via LiteLLM, it works with HuggingFace-format models and supports GGUF/GGML quantized models

**Q: How to process table data in scanned PDFs?**  
A: Integrates Unstructured.io's OCR engine for automatic table/image extraction

**Q: How is vector retrieval accuracy ensured?**  
A: Multi-level optimization:

1. Chunking strategy adjustment (sliding window or semantic chunking; see the sketch after this list)
2. Re-ranking support
3. Hybrid retrieval modes
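
As a library-agnostic illustration of the sliding-window idea (not a Ragbits API), overlapping windows preserve context that a hard cut would lose:

```python
# Minimal sliding-window chunking sketch. The overlap keeps text that
# straddles a boundary visible in two adjacent chunks.
def sliding_window_chunks(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```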

### Use Cases

**Q: What applications can be built?**  
A: Typical scenarios:

- Enterprise knowledge base Q&A systems
- Research paper analysis tools
- Customer support chatbots
- Intelligent internal document retrieval

**Q: How does it differ from LangChain?**  
A: Key differentiators:

1. Strong type system
2. Native distributed processing
3. Built-in production monitoring
4. Modular dependency management

### Performance & Scaling

**Q: How large a knowledge base can a single machine handle?**  
A: Benchmark data:

| Storage Engine | Documents | Query Latency |
|----------------|-----------|---------------|
| In-Memory      | 10K       | <100ms        |
| Qdrant         | 1M+       | 200-500ms     |
| PgVector       | 500K      | 300-700ms     |

**Q: How to scale processing capacity?**  
A: Horizontal scaling:

1. Document processing: Add Ray worker nodes
2. Retrieval layer: Vector database sharding
3. Generation layer: LLM load balancing

---

## Resources & Support

### Learning Materials

- [Official Documentation](https://ragbits.deepsense.ai)
- [API Reference](https://ragbits.deepsense.ai/api_reference/core/prompt/)
- [Example Projects](https://github.com/deepsense-ai/ragbits-examples)

### Community Support

- [GitHub Discussions](https://github.com/deepsense-ai/ragbits/discussions)
- [Issue Tracker](https://github.com/deepsense-ai/ragbits/issues)
- [Contribution Guide](https://github.com/deepsense-ai/ragbits/tree/main/CONTRIBUTING.md)