What is LiteLLM? The Unified Gateway for Calling 100+ LLMs with One Consistent Interface
If you’re building AI applications in 2026, you’ve probably faced this situation more than once:
You start with OpenAI’s GPT models, then want to experiment with Anthropic’s Claude series, Groq’s ultra-fast inference, or perhaps a cost-effective option from DeepSeek or a local Ollama deployment. Each provider has its own SDK, slightly different request/response shapes, its own authentication method, and its own error-handling quirks.
Suddenly your codebase fills with conditional branches, custom adapters, and fragile switching logic. When the team grows and you need centralized cost tracking, key management, rate limiting, or automatic failover, things get even messier.
LiteLLM solves exactly these pain points.
It provides a single, OpenAI-compatible interface that lets you call more than 100 LLM providers using almost identical code—whether you’re using cloud APIs (OpenAI, Anthropic, Azure, Google Vertex/Gemini, Bedrock, Groq, Mistral, Cohere, Together AI, Fireworks, DeepInfra) or self-hosted solutions (Ollama, vLLM, Llamafile, LM Studio, Xinference).
Below we’ll walk through what LiteLLM actually does, how people use it in real projects, and which path (SDK vs Proxy) makes sense for different team sizes and stages.
Real-World Problems LiteLLM Solves
Here are the most common scenarios developers encounter that lead them to adopt LiteLLM:
- Rapid model experimentation without rewriting large portions of code
- Unifying multiple providers behind one endpoint for easier A/B testing and fallback
- Centralizing API key management, spend visibility, and per-user/per-team budgets
- Running production workloads with automatic retry, load balancing, and failover across regions or providers
- Making local or private models appear as if they were OpenAI endpoints
- Integrating emerging agent protocols (A2A) and tool ecosystems (MCP servers)
In short: LiteLLM gives you maximum model choice with minimum integration pain, while moving governance and observability concerns out of your application code and into infrastructure.
Core Capabilities (as of January 2026)
LiteLLM supports a wide range of endpoints and features across providers.
| Endpoint / Feature | Broad Support? | Common Use Cases |
|---|---|---|
| /chat/completions | Yes (core) | Standard chat, function calling, reasoning traces |
| /embeddings | Many providers | Vector search, RAG, semantic similarity |
| Image generation | Partial | DALL·E-style image creation |
| Audio (TTS / STT) | Partial | Speech synthesis and transcription |
| /rerank | Several | Improving retrieval quality |
| Batch processing | Partial | High-volume asynchronous jobs |
| A2A Agent protocol | Yes | LangGraph, Vertex AI Agent Engine, Bedrock agents |
| MCP tool bridging | Yes | Connecting external tools (GitHub, DBs, etc.) to LLMs |
| Virtual keys & spend caps | Proxy only | Multi-tenant SaaS, internal department chargeback |
| Intelligent routing/fallback | SDK + Proxy | High availability, cost/quality optimization |
| Semantic/exact caching | Proxy | Dramatically reduce redundant calls |
| Admin UI + detailed metrics | Proxy | Usage dashboards, slow query detection, export |
Performance note: Official benchmarks show ~8 ms P95 latency at 1,000 requests per second—negligible for most applications.
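As a taste of what "one interface" means beyond chat, here is a minimal /embeddings sketch via the Python SDK covered below. The model name is just an example, and the exact response field names may differ slightly by LiteLLM version:

import os
from litellm import embedding

os.environ["OPENAI_API_KEY"] = "sk-..."

# Same pattern as chat: one function, provider picked via the model string
response = embedding(
    model="openai/text-embedding-3-small",
    input=["LiteLLM exposes one interface for many providers."]
)

# OpenAI-compatible response shape (verify field access for your version)
vector = response.data[0]["embedding"]
print(len(vector))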
Two Main Ways to Use LiteLLM
Most users start with one approach and later adopt the other.
| Dimension | Python SDK | LiteLLM Proxy Server (AI Gateway) |
|---|---|---|
| Primary users | Individual developers, early prototypes | Platform teams, production apps, multi-tenant |
| Deployment | pip install → import in code | Docker / Render / Railway / Kubernetes |
| Management overhead | Very low | Moderate (but unlocks powerful features) |
| Authentication & quotas | Basic (env vars) | Virtual keys, budgets, teams, RBAC |
| Routing & resilience | Router class | Advanced rules, caching, rate limits, fallbacks |
| Monitoring | Via callbacks to Langfuse/Lunary/etc. | Built-in Admin UI + Prometheus export |
| Time to first request | ~5 minutes | 15–60 minutes (depending on infra) |
| Best for | PoCs, personal projects, small teams | Scaling, cost control, governance |
Option 1: Python SDK – Quick Start (Recommended First Step)
Installation
pip install litellm
Simplest possible usage
from litellm import completion
import os
# Set only the keys you need
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
os.environ["GROQ_API_KEY"] = "gsk_..."
# Switch models by changing only this line
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather like in Singapore today?"}]
)
print(response.choices[0].message.content)
Want to try Claude 3.5 Sonnet instead?
model="anthropic/claude-3-5-sonnet-20241022"
Switch to Groq’s Llama 3.1 405B?
model="groq/llama-3.1-405b-reasoning"
Run locally with Ollama?
model="ollama/llama3.1"
Same input shape, same output shape—only the model string changes. This is the single biggest reason developers love the SDK.
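To make that concrete, comparing providers side by side is just a loop over model strings. A quick sketch, assuming the corresponding keys from the snippet above are set and Ollama is running locally:

from litellm import completion

# Model strings from the examples above; each assumes its provider key is set
# (and that Ollama is running locally for the last one)
models = [
    "openai/gpt-4o",
    "anthropic/claude-3-5-sonnet-20241022",
    "ollama/llama3.1",
]

prompt = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

for model in models:
    response = completion(model=model, messages=prompt)
    print(f"{model}: {response.choices[0].message.content}")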
Full list of model name formats → Supported Providers
Option 2: LiteLLM Proxy – Centralized LLM Gateway
For teams that need control at scale, the Proxy turns LiteLLM into a production-grade API gateway.
Quick local test (not for production)
pip install 'litellm[proxy]'
litellm --model openai/gpt-4o
Then call it like any OpenAI endpoint:
import openai
client = openai.OpenAI(
    api_key="anything-for-local-testing",  # real auth via virtual keys
    base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from Singapore!"}]
)
Production recommendation: Docker + config file
docker run \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -v $(pwd)/.env:/app/.env \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-stable \
  --config /app/config.yaml
Example config.yaml snippet
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-east-us
      api_base: https://your-resource.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: sk-your-admin-master-key
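One detail worth noting: clients call the model_name alias defined above (gpt-4o, claude-sonnet), and the proxy maps it to the underlying litellm_params.model, so Azure or Anthropic specifics never leak into application code. A minimal sketch, assuming the proxy is running locally and you have already issued a virtual key (see the key-generation sketch under Standout Proxy Features below):

import openai

client = openai.OpenAI(
    api_key="sk-virtual-...",         # a virtual key issued by the proxy
    base_url="http://localhost:4000"
)

# "claude-sonnet" is the alias from config.yaml, not the provider's raw model name
response = client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Which backend are you actually running on?"}]
)
print(response.choices[0].message.content)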
Standout Proxy Features
- Virtual keys + per-key budgets — Generate scoped keys for products, teams, or external partners with hard token limits (see the key-generation sketch after this list)
- Granular spend tracking — Attribute usage to projects, users, or tags for accurate internal billing
- Smart routing & automatic failover — Define fallback chains (e.g., try Groq → Together → self-hosted)
- Semantic & exact-match caching — Slash costs and latency on repeated prompts
- Admin dashboard — Real-time usage graphs, error rates, per-model performance
- MCP tool integration — Expose GitHub, internal APIs, databases as tools to any model
- A2A agent support — Treat LangGraph agents, Vertex agents, Bedrock agents as callable models
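As a sketch of the virtual-key workflow mentioned above: the proxy exposes a /key/generate admin endpoint, authenticated with the master_key from config.yaml. Treat the request fields here as assumptions to be checked against the docs for your LiteLLM version; the example uses only the Python standard library:

import json
import urllib.request

# Assumptions: proxy on localhost:4000, master_key from config.yaml above;
# the field names (models, max_budget, duration) follow the proxy docs as I
# recall them, so verify against your LiteLLM version.
req = urllib.request.Request(
    "http://localhost:4000/key/generate",
    data=json.dumps({
        "models": ["gpt-4o", "claude-sonnet"],  # aliases this key may call
        "max_budget": 25.0,                     # hard spend cap (USD)
        "duration": "30d",                      # key expiry
    }).encode("utf-8"),
    headers={
        "Authorization": "Bearer sk-your-admin-master-key",
        "Content-Type": "application/json",
    },
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # the response includes the new virtual key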
Frequently Asked Questions
Does LiteLLM send my prompts to a third party?
No. It only transforms formats and routes requests. Your data goes directly to the provider you configured.
Is the proxy a performance bottleneck?
At 1,000 RPS, P95 latency is around 8 ms—usually not measurable in end-to-end application performance.
Can I run everything offline / air-gapped?
Yes—pair the proxy with Ollama, vLLM, or any local inference server.
How do I automatically pick the cheapest/fastest model?
Use the Router class (SDK) or routing rules + fallbacks (Proxy).
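For the SDK path, here is a minimal Router sketch: two deployments share one alias, and the Router distributes traffic between them. The alias and model choices are placeholders, and latency- or cost-aware routing strategies are configured separately (check the Router docs for your version):

from litellm import Router

# Two deployments behind one alias; the alias is what application code calls
router = Router(
    model_list=[
        {
            "model_name": "default-chat",
            "litellm_params": {"model": "openai/gpt-4o"},
        },
        {
            "model_name": "default-chat",
            "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        },
    ]
)

response = router.completion(
    model="default-chat",
    messages=[{"role": "user", "content": "Hello from the Router example!"}]
)
print(response.choices[0].message.content)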
Does it work with Cursor, Continue.dev, or other OpenAI-compatible tools?
Yes—just point them at your Proxy URL and use a virtual key.
Quick Decision Table – Which Path Should You Choose?
| Your Situation | Recommended Path | Next Action |
|---|---|---|
| Solo developer or early PoC | Python SDK | pip install litellm and try one line |
| Small team, < 1M tokens/month | SDK + Router | Add fallbacks when you hit limits |
| Mid-size team needing cost allocation | Proxy Server | Deploy via Docker + virtual keys |
| Already managing many API keys | Proxy Server | Highest priority |
| Mostly local / private models | Proxy + local backends | Strongly recommended |
Why LiteLLM Keeps Gaining Momentum in 2026
The LLM landscape is more fragmented than ever: new models launch weekly, prices swing dramatically, regional availability varies, and safety/alignment policies differ across vendors. Betting on a single provider is increasingly risky.
LiteLLM’s core promise is simple yet powerful: let developers choose the best model for the job today—and switch effortlessly tomorrow—without rewriting application logic.
Give the Python SDK a try with one line of code. Most people who do never go back to vendor-specific clients.
Questions about setup, routing rules, cost tracking, or integrating agents/tools? Drop a comment—I’m happy to dive deeper.
Enjoy building! 🚀

