What is LiteLLM? The Unified Gateway for Calling 100+ LLMs with One Consistent Interface
If you’re building AI applications in 2026, you’ve probably faced this situation more than once:
You start with OpenAI’s GPT models, then want to experiment with Anthropic’s Claude series, Groq’s ultra-fast inference, or perhaps a cost-effective option from DeepSeek or a local Ollama deployment. Each provider has its own SDK, slightly different request/response shapes, its own authentication method, and its own error-handling quirks.
Suddenly your codebase fills with conditional branches, custom adapters, and fragile switching logic. When the team grows and you need centralized cost tracking, key management, rate limiting, or automatic failover, things get even messier.
LiteLLM solves exactly these pain points.
It provides a single, OpenAI-compatible interface that lets you call more than 100 LLM providers using almost identical code—whether you’re using cloud APIs (OpenAI, Anthropic, Azure, Google Vertex/Gemini, Bedrock, Groq, Mistral, Cohere, Together AI, Fireworks, DeepInfra) or self-hosted solutions (Ollama, vLLM, Llamafile, LM Studio, Xinference).
Below we’ll walk through what LiteLLM actually does, how people use it in real projects, and which path (SDK vs Proxy) makes sense for different team sizes and stages.
Real-World Problems LiteLLM Solves
Here are the most common scenarios developers encounter that lead them to adopt LiteLLM:
- Rapid model experimentation without rewriting large portions of code
- Unifying multiple providers behind one endpoint for easier A/B testing and fallback
- Centralizing API key management, spend visibility, and per-user/per-team budgets
- Running production workloads with automatic retry, load balancing, and failover across regions or providers
- Making local or private models appear as if they were OpenAI endpoints
- Integrating emerging agent protocols (A2A) and tool ecosystems (MCP servers)
In short: LiteLLM gives you maximum model choice with minimum integration pain, while moving governance and observability concerns out of your application code and into infrastructure.
Core Capabilities (as of January 2026)
LiteLLM supports a wide range of endpoints and features across providers.
| Endpoint / Feature | Broad Support? | Common Use Cases |
|---|---|---|
| /chat/completions | Yes (core) | Standard chat, function calling, reasoning traces |
| /embeddings | Many providers | Vector search, RAG, semantic similarity |
| Image generation | Partial | DALL·E-style image creation |
| Audio (TTS / STT) | Partial | Speech synthesis and transcription |
| /rerank | Several | Improving retrieval quality |
| Batch processing | Partial | High-volume asynchronous jobs |
| A2A Agent protocol | Yes | LangGraph, Vertex AI Agent Engine, Bedrock agents |
| MCP tool bridging | Yes | Connecting external tools (GitHub, DBs, etc.) to LLMs |
| Virtual keys & spend caps | Proxy only | Multi-tenant SaaS, internal department chargeback |
| Intelligent routing/fallback | SDK + Proxy | High availability, cost/quality optimization |
| Semantic/exact caching | Proxy | Dramatically reduce redundant calls |
| Admin UI + detailed metrics | Proxy | Usage dashboards, slow query detection, export |
Performance note: Official benchmarks show ~8 ms P95 latency at 1,000 requests per second—negligible for most applications.
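As a taste of what "one interface" means beyond chat, here is a minimal /embeddings sketch via the Python SDK covered below. The model name is just an example, and the exact response field names may differ slightly by LiteLLM version:

import os
from litellm import embedding

os.environ["OPENAI_API_KEY"] = "sk-..."

# Same pattern as chat: one function, provider picked via the model string
response = embedding(
    model="openai/text-embedding-3-small",
    input=["LiteLLM exposes one interface for many providers."]
)

# OpenAI-compatible response shape (verify field access for your version)
vector = response.data[0]["embedding"]
print(len(vector))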
Two Main Ways to Use LiteLLM
Most users start with one approach and later adopt the other.
| Dimension | Python SDK | LiteLLM Proxy Server (AI Gateway) |
|---|---|---|
| Primary users | Individual developers, early prototypes | Platform teams, production apps, multi-tenant |
| Deployment | pip install → import in code | Docker / Render / Railway / Kubernetes |
| Management overhead | Very low | Moderate (but unlocks powerful features) |
| Authentication & quotas | Basic (env vars) | Virtual keys, budgets, teams, RBAC |
| Routing & resilience | Router class | Advanced rules, caching, rate limits, fallbacks |
| Monitoring | Via callbacks to Langfuse/Lunary/etc. | Built-in Admin UI + Prometheus export |
| Time to first request | ~5 minutes | 15–60 minutes (depending on infra) |
| Best for | PoCs, personal projects, small teams | Scaling, cost control, governance |
Option 1: Python SDK – Quick Start (Recommended First Step)
Installation
pip install litellm
Simplest possible usage
from litellm import completion
import os
# Set only the keys you need
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
os.environ["GROQ_API_KEY"] = "gsk_..."
# Switch models by changing only this line
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather like in Singapore today?"}]
)
print(response.choices[0].message.content)
Want to try Claude 3.5 Sonnet instead?
model="anthropic/claude-3-5-sonnet-20241022"
Switch to Groq’s Llama 3.1 405B?
model="groq/llama-3.1-405b-reasoning"
Run locally with Ollama?
model="ollama/llama3.1"
Same input shape, same output shape—only the model string changes. This is the single biggest reason developers love the SDK.
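To make that concrete, comparing providers side by side is just a loop over model strings. A quick sketch, assuming the corresponding keys from the snippet above are set and Ollama is running locally:

from litellm import completion

# Model strings from the examples above; each assumes its provider key is set
# (and that Ollama is running locally for the last one)
models = [
    "openai/gpt-4o",
    "anthropic/claude-3-5-sonnet-20241022",
    "ollama/llama3.1",
]

prompt = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

for model in models:
    response = completion(model=model, messages=prompt)
    print(f"{model}: {response.choices[0].message.content}")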
Full list of model name formats → Supported Providers
Option 2: LiteLLM Proxy – Centralized LLM Gateway
For teams that need control at scale, the Proxy turns LiteLLM into a production-grade API gateway.
Quick local test (not for production)
pip install 'litellm[proxy]'
litellm --model openai/gpt-4o
Then call it like any OpenAI endpoint:
import openai
client = openai.OpenAI(
    api_key="anything-for-local-testing",  # real auth via virtual keys
    base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from Singapore!"}]
)
Production recommendation: Docker + config file
docker run \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -v $(pwd)/.env:/app/.env \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-stable \
  --config /app/config.yaml
Example config.yaml snippet
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-east-us
      api_base: https://your-resource.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: sk-your-admin-master-key
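One detail worth noting: clients call the model_name alias defined above (gpt-4o, claude-sonnet), and the proxy maps it to the underlying litellm_params.model, so Azure or Anthropic specifics never leak into application code. A minimal sketch, assuming the proxy is running locally and you have already issued a virtual key (see the key-generation sketch under Standout Proxy Features below):

import openai

client = openai.OpenAI(
    api_key="sk-virtual-...",         # a virtual key issued by the proxy
    base_url="http://localhost:4000"
)

# "claude-sonnet" is the alias from config.yaml, not the provider's raw model name
response = client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Which backend are you actually running on?"}]
)
print(response.choices[0].message.content)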
Standout Proxy Features
- Virtual keys + per-key budgets — Generate scoped keys for products, teams, or external partners with hard token limits (see the key-generation sketch after this list)
- Granular spend tracking — Attribute usage to projects, users, or tags for accurate internal billing
- Smart routing & automatic failover — Define fallback chains (e.g., try Groq → Together → self-hosted)
- Semantic & exact-match caching — Slash costs and latency on repeated prompts
- Admin dashboard — Real-time usage graphs, error rates, per-model performance
- MCP tool integration — Expose GitHub, internal APIs, databases as tools to any model
- A2A agent support — Treat LangGraph agents, Vertex agents, Bedrock agents as callable models
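As a sketch of the virtual-key workflow mentioned above: the proxy exposes a /key/generate admin endpoint, authenticated with the master_key from config.yaml. Treat the request fields here as assumptions to be checked against the docs for your LiteLLM version; the example uses only the Python standard library:

import json
import urllib.request

# Assumptions: proxy on localhost:4000, master_key from config.yaml above;
# the field names (models, max_budget, duration) follow the proxy docs as I
# recall them, so verify against your LiteLLM version.
req = urllib.request.Request(
    "http://localhost:4000/key/generate",
    data=json.dumps({
        "models": ["gpt-4o", "claude-sonnet"],  # aliases this key may call
        "max_budget": 25.0,                     # hard spend cap (USD)
        "duration": "30d",                      # key expiry
    }).encode("utf-8"),
    headers={
        "Authorization": "Bearer sk-your-admin-master-key",
        "Content-Type": "application/json",
    },
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # the response includes the new virtual key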
Frequently Asked Questions
Does LiteLLM send my prompts to a third party?
No. It only transforms formats and routes requests. Your data goes directly to the provider you configured.
Is the proxy a performance bottleneck?
At 1,000 RPS, P95 latency is around 8 ms—usually not measurable in end-to-end application performance.
Can I run everything offline / air-gapped?
Yes—pair the proxy with Ollama, vLLM, or any local inference server.
How do I automatically pick the cheapest/fastest model?
Use the Router class (SDK) or routing rules + fallbacks (Proxy).
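For the SDK path, here is a minimal Router sketch: two deployments share one alias, and the Router distributes traffic between them. The alias and model choices are placeholders, and latency- or cost-aware routing strategies are configured separately (check the Router docs for your version):

from litellm import Router

# Two deployments behind one alias; the alias is what application code calls
router = Router(
    model_list=[
        {
            "model_name": "default-chat",
            "litellm_params": {"model": "openai/gpt-4o"},
        },
        {
            "model_name": "default-chat",
            "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        },
    ]
)

response = router.completion(
    model="default-chat",
    messages=[{"role": "user", "content": "Hello from the Router example!"}]
)
print(response.choices[0].message.content)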
Does it work with Cursor, Continue.dev, or other OpenAI-compatible tools?
Yes—just point them at your Proxy URL and use a virtual key.
Quick Decision Table – Which Path Should You Choose?
| Your Situation | Recommended Path | Next Action |
|---|---|---|
| Solo developer or early PoC | Python SDK | pip install litellm and try one line |
| Small team, < 1M tokens/month | SDK + Router | Add fallbacks when you hit limits |
| Mid-size team needing cost allocation | Proxy Server | Deploy via Docker + virtual keys |
| Already managing many API keys | Proxy Server | Highest priority |
| Mostly local / private models | Proxy + local backends | Strongly recommended |
Why LiteLLM Keeps Gaining Momentum in 2026
The LLM landscape is more fragmented than ever: new models launch weekly, prices swing dramatically, regional availability varies, and safety/alignment policies differ across vendors. Betting on a single provider is increasingly risky.
LiteLLM’s core promise is simple yet powerful: let developers choose the best model for the job today—and switch effortlessly tomorrow—without rewriting application logic.
Give the Python SDK a try with one line of code. Most people who do never go back to vendor-specific clients.
Questions about setup, routing rules, cost tracking, or integrating agents/tools? Drop a comment—I’m happy to dive deeper.
Enjoy building! 🚀

