# Building an API Key Load Balancer with Cloudflare: Introducing One Balance
Hello there. If you’re working with AI services and have multiple API keys—especially ones with usage limits like those from Google AI Studio—you know how tricky it can be to manage them. Switching between keys manually to avoid hitting limits too soon can feel like a chore. That’s where One Balance comes in. It’s a tool built on Cloudflare that acts as a smart load balancer for your API keys. It uses Cloudflare’s AI Gateway for routing and adds features like rotating keys and checking their health.
Think of it this way: if you have several keys, each with its own quota, One Balance helps you use them in turn, making the most of what you have. It’s designed to handle requests intelligently, so you don’t waste time on keys that are temporarily unavailable or blocked.
One Balance also offers a way to support the project through sponsorship, and in return, you can get a Gemini key. It’s a straightforward way to contribute and keep the development going.
## What Makes One Balance Stand Out?
Let’s break down what this tool offers. I’ll list the main features to make it easy to follow.
- Reducing the Risk of Blocks: By routing requests through Cloudflare AI Gateway, it helps lower the chances of your API keys getting blocked. This is particularly helpful for keys like Gemini, which can be sensitive to overuse.
- Smart Error Handling: The system deals with errors in a thoughtful way. For example, it can spot when a specific model has hit its rate limit and pause just that model temporarily. With Google AI Studio, it even distinguishes short-term limits (like per-minute) from longer ones (like daily), cooling them off differently: say, 24 hours for a daily limit.
- Automatic Shutdown for Bad Keys: If a key gets blocked by the provider (indicated by a 403 error), the tool permanently disables it to avoid pointless retries.
- Free and Easy to Set Up: It runs on Cloudflare Workers, so you can deploy it with one command. You can make full use of Cloudflare’s free tier, including optimizations for CPU time when handling lots of keys.
- Works with Many Services: It supports any API provider that Cloudflare AI Gateway handles. That includes rotating keys for Gemini text-to-speech, which might be unique to this tool; it’s already in use on sites like Zenfeed.xyz for creating real-time news podcasts.
These points make One Balance a reliable choice for anyone juggling multiple keys. For instance, if you’re building an app that calls AI models often, this balancer ensures smooth operation without you constantly monitoring limits.
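To make the error-handling idea concrete, here is a minimal Python sketch of how per-model cooldowns might be chosen. The function names, the `"PerDay"` substring check, and the exact durations are illustrative assumptions, not One Balance's actual implementation.

```python
import time

# Illustrative durations: a short pause for per-minute limits,
# a 24-hour pause for exhausted daily quotas (as described above).
MINUTE_COOLDOWN = 60
DAILY_COOLDOWN = 24 * 3600

def cooldown_seconds(provider: str, error_body: str) -> int:
    """Pick a cooldown length for a 429 response.

    The "PerDay" substring check is a hypothetical stand-in for
    however the real tool tells daily quotas apart from minute limits.
    """
    if provider == "google-ai-studio" and "PerDay" in error_body:
        return DAILY_COOLDOWN   # daily quota exhausted: pause for 24h
    return MINUTE_COOLDOWN      # default: brief per-minute pause

def pause_model(cooldowns: dict, key: str, model: str, seconds: int) -> None:
    """Record when this (key, model) pair becomes usable again."""
    cooldowns[(key, model)] = time.time() + seconds
```

The key point is that the cooldown is scoped to one model on one key, so a rate-limited Gemini model doesn't take the whole key out of rotation.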
## Getting Started: Preparing Your Setup
Before diving into deployment, let’s cover the basics. You’ll need a couple of things ready.
First, install Node.js from its official site. It’s a runtime environment that lets you run JavaScript outside a browser. Next, add pnpm, a package manager that helps with installing dependencies efficiently. You can find instructions on their website.
You’ll also need a Cloudflare account. If you don’t have one, signing up is free and straightforward.
## Step-by-Step Guide to Creating Your AI Gateway
Once your environment is set, the first real step is setting up the AI Gateway on Cloudflare.
Log into your Cloudflare dashboard. Go to the AI section and find AI Gateway. Create a new gateway and name it `one-balance`. This name matters because the tool uses it for routing.
Why do this? The gateway acts as a middleman, forwarding your requests to the AI providers while adding a layer of protection and compatibility.
## Deploying One Balance to Cloudflare
Now, let’s deploy the tool itself. Open your terminal or command prompt.
Start by cloning the repository:
```shell
git clone https://github.com/glidea/one-balance.git
cd one-balance
pnpm install
```
This pulls the code and installs what’s needed.
Next, set an authentication key—think of it as a password for your setup—and deploy. For Mac or Linux users:
```shell
AUTH_KEY=your-super-secret-auth-key pnpm run deploycf
```
For Windows users with PowerShell:
```powershell
$env:AUTH_KEY = "your-super-secret-auth-key"; pnpm run deploycf
```
The script will ask you to log in to wrangler if you’re not already. Wrangler is Cloudflare’s tool for managing deployments. It will create a D1 database automatically—this is where key statuses are stored—and push the Worker live.
When it’s done, you’ll get a URL for your Worker, something like `https://one-balance-backend.your-subdomain.workers.dev`. That’s your entry point.
If you’re in a region where access might be spotty, like some parts of China, consider using a VPN or proxy to reach the management page reliably.
## How to Use One Balance Effectively
With deployment complete, let’s talk about putting it to work. There are two main parts: setting up your keys and making API calls.
### Setting Up Keys for Rotation
Head to your Worker URL in a browser. This opens a management interface where you add the keys you want to rotate.
A tip here: try not to share keys with others. If multiple people use the same key without the system knowing the full picture, it could lead to more rate limit errors (those 429 responses). Keeping keys private helps the balancer work better.
### Making API Requests
The base URL for requests is `https://your-worker-url/api/`, followed by the path to the AI service.
For example, if your Worker is at `https://one-balance-backend.workers.dev` and you want to query Google Gemini 2.5 Pro, the full URL would be `https://one-balance-backend.workers.dev/api/google-ai-studio/v1beta/models/gemini-2.5-pro:generateContent`.
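The URL scheme is simple enough to sketch in a few lines of Python. This is only an illustration of the joining rule described above; `build_request_url` is a hypothetical helper, not part of One Balance.

```python
def build_request_url(worker_base: str, provider_path: str) -> str:
    """Join the Worker base URL, the /api prefix, and the provider path."""
    return f"{worker_base.rstrip('/')}/api/{provider_path.lstrip('/')}"

url = build_request_url(
    "https://one-balance-backend.workers.dev",
    "google-ai-studio/v1beta/models/gemini-2.5-pro:generateContent",
)
# url == "https://one-balance-backend.workers.dev/api/google-ai-studio/v1beta/models/gemini-2.5-pro:generateContent"
```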
### Handling Authentication
You need to authenticate with the AUTH_KEY you set during deployment. It varies by provider:
- For OpenAI, use `Authorization: Bearer your-super-secret-auth-key` in the headers.
- For Google, Anthropic, ElevenLabs, Azure OpenAI, or Cartesia, use the provider-specific header, such as `x-goog-api-key: your-super-secret-auth-key` for Google.
This keeps things secure.
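A tiny Python sketch of the header rule above. Only the two headers named in the text are included; the mapping, the function, and the fallback are illustrative, and the real tool covers more providers.

```python
# Hypothetical provider -> auth header name mapping, per the rules above.
AUTH_HEADERS = {
    "openai": "Authorization",
    "google-ai-studio": "x-goog-api-key",
}

def auth_header(provider: str, auth_key: str) -> dict:
    """Return the header dict carrying your AUTH_KEY for this provider."""
    name = AUTH_HEADERS.get(provider, "Authorization")
    if name == "Authorization":
        return {name: f"Bearer {auth_key}"}  # OpenAI-style bearer token
    return {name: auth_key}                  # bare key in a custom header
```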
## Practical Examples with curl
To test, use curl—a command-line tool for making requests. Here are some examples.
### Querying Google Gemini Directly (With Streaming Support)
```shell
curl "https://your-worker-url/api/google-ai-studio/v1/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
  -H 'content-type: application/json' \
  -H 'x-goog-api-key: your-super-secret-auth-key' \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {"text": "Who are you?"}
        ]
      }
    ]
  }'
```
This endpoint supports streaming, meaning response chunks arrive as they are generated rather than in one final payload.
### Using OpenAI-Compatible Format for Google Gemini (No Streaming, Possible Encoding Issues with Non-English Text)
```shell
curl "https://your-worker-url/api/compat/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-super-secret-auth-key" \
  -d '{
    "model": "google-ai-studio/gemini-2.5-pro",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
The model is specified as `provider/model`. Check Cloudflare’s docs for the exact formats. Note: this mode does not support streaming, and text in languages like Chinese might not display correctly.
### Calling OpenAI
```shell
curl "https://your-worker-url/api/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-super-secret-auth-key" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
For other providers, follow similar patterns based on Cloudflare AI Gateway guidelines.
In tools like Cherry Studio, you might see a visual interface for this.
## What’s Next for One Balance?
The tool has some planned improvements. These include:
- Allowing dynamic forwarding of keys.
- Supporting custom channels.
- Creating virtual models that combine multiple sources.
- Enabling distribution of user keys.
These could make it even more versatile in the future.
## Understanding How One Balance Operates
To get a better grasp, let’s look under the hood. I’ll use diagrams and explanations to keep it clear.
### Overall Structure
One Balance sits in the middle, handling requests and passing them to Cloudflare AI Gateway. Here’s a simple diagram:
```mermaid
graph TD
    subgraph "User Side"
        User["Client"]
    end
    subgraph "Cloudflare Setup"
        OneBalance["One Balance Worker"]
        D1["D1 Database"]
        AIGW["Cloudflare AI Gateway"]
        OneBalance <-- "Get/Update Key Status" --> D1
        OneBalance -- "Forward Request" --> AIGW
    end
    subgraph "External Services"
        Provider["AI Providers (Google, OpenAI, etc.)"]
    end
    User -- "1. API Request (with AUTH_KEY)" --> OneBalance
    AIGW -- "2. Proxy Request (with Provider Key)" --> Provider
    Provider -- "3. API Response" --> AIGW
    AIGW -- "4. Response" --> OneBalance
    OneBalance -- "5. Final Response" --> User
```
Why D1 over something like KV? The free limits on KV are lower, so D1 is better for storing key info.
### Life Cycle of a Key
Keys go through states based on how they perform. Here’s the flow:
```mermaid
graph TD
    NonExistent("(Does Not Exist)")
    subgraph "Life Cycle States"
        direction LR
        Active("Active / Ready")
        CoolingDown("Cooling Down / Paused (for Specific Model)")
        Blocked("Blocked / Permanently Disabled")
    end
    NonExistent -- "1. Create (Admin Adds)" --> Active
    Active -- "2a. Use: Success (2xx)" --> Active
    Active -- "2b. Use: Rate Limited (429)" --> CoolingDown
    Active -- "2c. Use: Invalid (401, 403)" --> Blocked
    CoolingDown -- "Cooling Period Ends" --> Active
    Active -- "3. Delete" --> NonExistent
    Blocked -- "3. Delete" --> NonExistent
    CoolingDown -- "3. Delete" --> NonExistent
```
This ensures only working keys are used.
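The transitions in that diagram reduce to a small state function. This Python sketch is illustrative: the status names mirror the diagram, but the function and its exact logic are assumptions, not One Balance's code.

```python
def next_status(status: str, http_code: int) -> str:
    """Compute a key's next life-cycle state from a response code."""
    if status == "blocked":
        return "blocked"            # permanently disabled: never revived
    if 200 <= http_code < 300:
        return "active"             # success keeps the key active
    if http_code == 429:
        return "cooling_down"       # rate limited: pause temporarily
    if http_code in (401, 403):
        return "blocked"            # invalid or provider-blocked: disable
    return status                   # other errors leave the state unchanged
```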
### Ensuring Reliability
The tool stays dependable through:
- Auto Shutdown and Retry: If a key fails with 401 or 403, it’s marked blocked, and the system tries the next one.
- Per-Model Rate Handling: For 429 errors, it pauses just that model on the key. For Google AI Studio, it handles minute-based limits (short pause) and day-based limits (24-hour pause) separately.
- Built on Cloudflare: The platform’s reliability covers Workers, D1, and AI Gateway.
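The shutdown-and-retry behavior above might look like the following loop. This is a hedged sketch: `try_keys` and the injected `send` callback are hypothetical names, and the real failover logic may differ.

```python
def try_keys(keys, send):
    """Try each candidate key in turn until one succeeds.

    `send` is a callback that issues the request with a given key and
    returns the HTTP status code. A 2xx wins; 401/403 marks the key
    blocked and falls through to the next candidate.
    """
    blocked = set()
    for key in keys:
        code = send(key)
        if 200 <= code < 300:
            return key, blocked     # this key worked
        if code in (401, 403):
            blocked.add(key)        # dead key: never retry it
    return None, blocked            # every candidate failed
```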
### Scaling Up
- Serverless Design: Cloudflare handles growth automatically; no servers to manage.
- Separate State: Workers don’t hold data; it lives in D1, making expansion easy.
- Adding More: Add new keys via the interface; add new providers with a config line for their headers.
### Monitoring and Insights
- Event Logs: Key events like blocks or pauses are logged. View them in Cloudflare’s dashboard.
- Gateway Analytics: Track requests, errors, delays, and costs.
- Management View: See key statuses (active, blocked) and pause details directly.
## Common Questions About One Balance
You might have some questions. Here are answers to ones that come up often.
**What services does it work with?** Any that Cloudflare AI Gateway supports, like Google or OpenAI. It handles Gemini text-to-speech rotation too.

**What if a key gets blocked?** It detects 403 errors and disables the key permanently.

**How does pausing work?** For rate limits (429), it’s per-model. Google AI Studio gets special treatment for short vs. long limits.

**How do I manage keys?** Use the Worker URL interface to add, check, or remove keys.

**Is it free?** Yes, on Cloudflare’s free plan. Deployment is quick, with CPU optimizations.

**Does it support real-time responses?** Yes, in the native Gemini format, but not in OpenAI compat mode, where non-English text might garble.

**Why Cloudflare AI Gateway?** It routes requests safely, reducing blocks, and works with many providers.

**How do I troubleshoot?** Check the dashboard logs and analytics; the interface shows key statuses.

**What’s planned?** Dynamic key forwarding, custom channels, virtual models, and user key sharing.
## Step-by-Step: Sending Your First Request
1. Deploy and get your Worker URL.
2. Add keys in the management interface.
3. Pick a service, like Gemini.
4. Build the URL as your Worker URL plus `/api/` and the provider path.
5. Add the auth headers.
6. Send the request with curl or a similar tool.
7. Review the response and adjust if needed.
## Why Choose One Balance for API Key Management?
If you’re dealing with limited API keys, this tool simplifies rotation and health checks. It lets you focus on your project.
For example, in an app generating content with Gemini, it keeps things running evenly.
Its text-to-speech support is practical, as seen in podcast generation.
## Digging Deeper into the Balancing Logic
The core is rotating keys from the active pool, with checks. Success keeps it active; limits pause it; invalids block it.
For Google, distinguishing limit types maximizes usage.
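Putting rotation and cooldowns together, key selection might look like this first-available sketch. The `pick_key` helper and the `(key, model)` cooldown map are my illustrative assumptions, not the project's actual data model.

```python
import time

def pick_key(keys, cooldowns, model, now=None):
    """Return the first key whose (key, model) pair is not cooling down.

    `cooldowns` maps (key, model) -> timestamp when the pair becomes
    usable again. Returns None if every key is paused for this model.
    """
    now = time.time() if now is None else now
    for key in keys:
        if cooldowns.get((key, model), 0.0) <= now:
            return key
    return None
```

A real balancer would likely also rotate its starting position so load spreads evenly instead of always hammering the first key.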
## Tips for the Management Interface
Add keys carefully. Check statuses: active means good, cooling down shows model pauses, blocked means out.
Regular reviews keep it optimal.
## Possible Issues and Fixes
- Access problems? Use a proxy or VPN.
- Garbled non-English text? Stick to the native formats.
- Sharing risks? Keep keys private.
## Wrapping Up: Give One Balance a Try
One Balance makes managing API keys straightforward. Easy setup, smart features—it’s worth exploring if you have multiple keys.