# Building an API Key Load Balancer with Cloudflare: Introducing One Balance
Hello there. If you’re working with AI services and have multiple API keys—especially ones with usage limits like those from Google AI Studio—you know how tricky it can be to manage them. Switching between keys manually to avoid hitting limits too soon can feel like a chore. That’s where One Balance comes in. It’s a tool built on Cloudflare that acts as a smart load balancer for your API keys. It uses Cloudflare’s AI Gateway for routing and adds features like rotating keys and checking their health.
Think of it this way: if you have several keys, each with its own quota, One Balance helps you use them in turn, making the most of what you have. It’s designed to handle requests intelligently, so you don’t waste time on keys that are temporarily unavailable or blocked.
One Balance also offers a way to support the project through sponsorship, and in return, you can get a Gemini key. It’s a straightforward way to contribute and keep the development going.
## What Makes One Balance Stand Out?
Let’s break down what this tool offers. I’ll list the main features to make it easy to follow.
- Reducing the Risk of Blocks: By routing requests through Cloudflare AI Gateway, it helps lower the chances of your API keys getting blocked. This is particularly helpful for keys like Gemini, which can be sensitive to overuse.
- Smart Error Handling: The system deals with errors in a thoughtful way. For example, it can spot when a specific model has hit its rate limit and pause just that model temporarily. With Google AI Studio, it even distinguishes short-term limits (like per-minute) from longer ones (like daily), cooling them off differently: say, 24 hours for a daily limit.
- Automatic Shutdown for Bad Keys: If a key gets blocked by the provider (indicated by a 403 error), the tool permanently disables it to avoid pointless retries.
- Free and Easy to Set Up: It runs on Cloudflare Workers, so you can deploy it with one command. You can make full use of Cloudflare’s free tier, including optimizations for CPU time when handling lots of keys.
- Works with Many Services: It supports any API provider that Cloudflare AI Gateway handles. That includes rotating keys for Gemini text-to-speech, which might be unique to this tool; it’s already in use on sites like Zenfeed.xyz for creating real-time news podcasts.
These points make One Balance a reliable choice for anyone juggling multiple keys. For instance, if you’re building an app that calls AI models often, this balancer ensures smooth operation without you constantly monitoring limits.
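To make the error-handling idea concrete, here is a minimal Python sketch of how per-model cooldowns might be chosen. The function names, the `"PerDay"` substring check, and the exact durations are illustrative assumptions, not One Balance's actual implementation.

```python
import time

# Illustrative durations: a short pause for per-minute limits,
# a 24-hour pause for exhausted daily quotas (as described above).
MINUTE_COOLDOWN = 60
DAILY_COOLDOWN = 24 * 3600

def cooldown_seconds(provider: str, error_body: str) -> int:
    """Pick a cooldown length for a 429 response.

    The "PerDay" substring check is a hypothetical stand-in for
    however the real tool tells daily quotas apart from minute limits.
    """
    if provider == "google-ai-studio" and "PerDay" in error_body:
        return DAILY_COOLDOWN   # daily quota exhausted: pause for 24h
    return MINUTE_COOLDOWN      # default: brief per-minute pause

def pause_model(cooldowns: dict, key: str, model: str, seconds: int) -> None:
    """Record when this (key, model) pair becomes usable again."""
    cooldowns[(key, model)] = time.time() + seconds
```

The key point is that the cooldown is scoped to one model on one key, so a rate-limited Gemini model doesn't take the whole key out of rotation.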
## Getting Started: Preparing Your Setup
Before diving into deployment, let’s cover the basics. You’ll need a couple of things ready.
First, install Node.js from its official site. It’s a runtime environment that lets you run JavaScript outside a browser. Next, add pnpm, a package manager that helps with installing dependencies efficiently. You can find instructions on their website.
You’ll also need a Cloudflare account. If you don’t have one, signing up is free and straightforward.
## Step-by-Step Guide to Creating Your AI Gateway
Once your environment is set, the first real step is setting up the AI Gateway on Cloudflare.
Log into your Cloudflare dashboard. Go to the AI section and find AI Gateway. Create a new gateway and name it `one-balance`. This name matters because the tool uses it for routing.
Why do this? The gateway acts as a middleman, forwarding your requests to the AI providers while adding a layer of protection and compatibility.
## Deploying One Balance to Cloudflare
Now, let’s deploy the tool itself. Open your terminal or command prompt.
Start by cloning the repository:
```shell
git clone https://github.com/glidea/one-balance.git
cd one-balance
pnpm install
```
This pulls the code and installs what’s needed.
Next, set an authentication key—think of it as a password for your setup—and deploy. For Mac or Linux users:
```shell
AUTH_KEY=your-super-secret-auth-key pnpm run deploycf
```
For Windows users with PowerShell:
```powershell
$env:AUTH_KEY = "your-super-secret-auth-key"; pnpm run deploycf
```
The script will ask you to log in to wrangler if you’re not already. Wrangler is Cloudflare’s tool for managing deployments. It will create a D1 database automatically—this is where key statuses are stored—and push the Worker live.
When it’s done, you’ll get a URL for your Worker, something like `https://one-balance-backend.your-subdomain.workers.dev`. That’s your entry point.
If you’re in a region where access might be spotty, like some parts of China, consider using a VPN or proxy to reach the management page reliably.
## How to Use One Balance Effectively
With deployment complete, let’s talk about putting it to work. There are two main parts: setting up your keys and making API calls.
### Setting Up Keys for Rotation
Head to your Worker URL in a browser. This opens a management interface where you add the keys you want to rotate.
A tip here: try not to share keys with others. If multiple people use the same key without the system knowing the full picture, it could lead to more rate limit errors (those 429 responses). Keeping keys private helps the balancer work better.
### Making API Requests
The base URL for requests is `https://your-worker-url/api/`, followed by the path to the AI service.
For example, if your Worker is at `https://one-balance-backend.workers.dev` and you want to query Google Gemini 2.5 Pro, the full URL would be `https://one-balance-backend.workers.dev/api/google-ai-studio/v1beta/models/gemini-2.5-pro:generateContent`.
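The URL scheme is simple enough to sketch in a few lines of Python. This is only an illustration of the joining rule described above; `build_request_url` is a hypothetical helper, not part of One Balance.

```python
def build_request_url(worker_base: str, provider_path: str) -> str:
    """Join the Worker base URL, the /api prefix, and the provider path."""
    return f"{worker_base.rstrip('/')}/api/{provider_path.lstrip('/')}"

url = build_request_url(
    "https://one-balance-backend.workers.dev",
    "google-ai-studio/v1beta/models/gemini-2.5-pro:generateContent",
)
# url == "https://one-balance-backend.workers.dev/api/google-ai-studio/v1beta/models/gemini-2.5-pro:generateContent"
```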
### Handling Authentication
You need to authenticate with the AUTH_KEY you set during deployment. It varies by provider:
- For OpenAI, use `Authorization: Bearer your-super-secret-auth-key` in the headers.
- For Google, Anthropic, ElevenLabs, Azure OpenAI, or Cartesia, use the provider-specific header, such as `x-goog-api-key: your-super-secret-auth-key` for Google.
This keeps things secure.
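A tiny Python sketch of the header rule above. Only the two headers named in the text are included; the mapping, the function, and the fallback are illustrative, and the real tool covers more providers.

```python
# Hypothetical provider -> auth header name mapping, per the rules above.
AUTH_HEADERS = {
    "openai": "Authorization",
    "google-ai-studio": "x-goog-api-key",
}

def auth_header(provider: str, auth_key: str) -> dict:
    """Return the header dict carrying your AUTH_KEY for this provider."""
    name = AUTH_HEADERS.get(provider, "Authorization")
    if name == "Authorization":
        return {name: f"Bearer {auth_key}"}  # OpenAI-style bearer token
    return {name: auth_key}                  # bare key in a custom header
```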
## Practical Examples with curl
To test, use curl—a command-line tool for making requests. Here are some examples.
### Querying Google Gemini Directly (With Streaming Support)
```shell
curl "https://your-worker-url/api/google-ai-studio/v1/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
  -H 'content-type: application/json' \
  -H 'x-goog-api-key: your-super-secret-auth-key' \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {"text": "Who are you?"}
        ]
      }
    ]
  }'
```
This endpoint supports streaming, meaning response chunks arrive as they are generated rather than in one final payload.
### Using OpenAI-Compatible Format for Google Gemini (No Streaming, Possible Encoding Issues with Non-English Text)
```shell
curl "https://your-worker-url/api/compat/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-super-secret-auth-key" \
  -d '{
    "model": "google-ai-studio/gemini-2.5-pro",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
The model is specified as `provider/model`. Check Cloudflare’s docs for the exact formats. Note: this mode does not support streaming, and text in languages like Chinese might not display correctly.
### Calling OpenAI
```shell
curl "https://your-worker-url/api/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-super-secret-auth-key" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
For other providers, follow similar patterns based on Cloudflare AI Gateway guidelines.
In tools like Cherry Studio, you might see a visual interface for this.
## What’s Next for One Balance?
The tool has some planned improvements. These include:
- Allowing dynamic forwarding of keys.
- Supporting custom channels.
- Creating virtual models that combine multiple sources.
- Enabling distribution of user keys.
These could make it even more versatile in the future.
## Understanding How One Balance Operates
To get a better grasp, let’s look under the hood. I’ll use diagrams and explanations to keep it clear.
### Overall Structure
One Balance sits in the middle, handling requests and passing them to Cloudflare AI Gateway. Here’s a simple diagram:
```mermaid
graph TD
    subgraph "User Side"
        User["Client"]
    end
    subgraph "Cloudflare Setup"
        OneBalance["One Balance Worker"]
        D1["D1 Database"]
        AIGW["Cloudflare AI Gateway"]
        OneBalance <-- "Get/Update Key Status" --> D1
        OneBalance -- "Forward Request" --> AIGW
    end
    subgraph "External Services"
        Provider["AI Providers (Google, OpenAI, etc.)"]
    end
    User -- "1. API Request (with AUTH_KEY)" --> OneBalance
    AIGW -- "2. Proxy Request (with Provider Key)" --> Provider
    Provider -- "3. API Response" --> AIGW
    AIGW -- "4. Response" --> OneBalance
    OneBalance -- "5. Final Response" --> User
```
Why D1 over something like KV? The free limits on KV are lower, so D1 is better for storing key info.
### Life Cycle of a Key
Keys go through states based on how they perform. Here’s the flow:
```mermaid
graph TD
    NonExistent("(Does Not Exist)")
    subgraph "Life Cycle States"
        direction LR
        Active("Active / Ready")
        CoolingDown("Cooling Down / Paused (for Specific Model)")
        Blocked("Blocked / Permanently Disabled")
    end
    NonExistent -- "1. Create (Admin Adds)" --> Active
    Active -- "2a. Use: Success (2xx)" --> Active
    Active -- "2b. Use: Rate Limited (429)" --> CoolingDown
    Active -- "2c. Use: Invalid (401, 403)" --> Blocked
    CoolingDown -- "Cooling Period Ends" --> Active
    Active -- "3. Delete" --> NonExistent
    Blocked -- "3. Delete" --> NonExistent
    CoolingDown -- "3. Delete" --> NonExistent
```
This ensures only working keys are used.
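The transitions in that diagram reduce to a small state function. This Python sketch is illustrative: the status names mirror the diagram, but the function and its exact logic are assumptions, not One Balance's code.

```python
def next_status(status: str, http_code: int) -> str:
    """Compute a key's next life-cycle state from a response code."""
    if status == "blocked":
        return "blocked"            # permanently disabled: never revived
    if 200 <= http_code < 300:
        return "active"             # success keeps the key active
    if http_code == 429:
        return "cooling_down"       # rate limited: pause temporarily
    if http_code in (401, 403):
        return "blocked"            # invalid or provider-blocked: disable
    return status                   # other errors leave the state unchanged
```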
### Ensuring Reliability
The tool stays dependable through:
- Auto Shutdown and Retry: If a key fails with 401 or 403, it’s marked blocked, and the system tries the next one.
- Per-Model Rate Handling: For 429 errors, it pauses just that model on the key. For Google AI Studio, it handles minute-based limits (short pause) and day-based limits (24-hour pause) separately.
- Built on Cloudflare: The platform’s reliability covers Workers, D1, and AI Gateway.
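The shutdown-and-retry behavior above might look like the following loop. This is a hedged sketch: `try_keys` and the injected `send` callback are hypothetical names, and the real failover logic may differ.

```python
def try_keys(keys, send):
    """Try each candidate key in turn until one succeeds.

    `send` is a callback that issues the request with a given key and
    returns the HTTP status code. A 2xx wins; 401/403 marks the key
    blocked and falls through to the next candidate.
    """
    blocked = set()
    for key in keys:
        code = send(key)
        if 200 <= code < 300:
            return key, blocked     # this key worked
        if code in (401, 403):
            blocked.add(key)        # dead key: never retry it
    return None, blocked            # every candidate failed
```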
### Scaling Up
- Serverless Design: Cloudflare handles growth automatically; no servers to manage.
- Separate State: Workers don’t hold data; it lives in D1, making expansion easy.
- Adding More: Add new keys via the interface; add new providers with a config line for their headers.
### Monitoring and Insights
- Event Logs: Key events like blocks or pauses are logged. View them in Cloudflare’s dashboard.
- Gateway Analytics: Track requests, errors, delays, and costs.
- Management View: See key statuses (active, blocked) and pause details directly.
## Common Questions About One Balance
You might have some questions. Here are answers to ones that come up often.
**What services does it work with?** Any that Cloudflare AI Gateway supports, like Google or OpenAI. It handles Gemini text-to-speech rotation too.

**What if a key gets blocked?** It detects 403 errors and disables the key permanently.

**How does pausing work?** For rate limits (429), it’s per-model. Google AI Studio gets special treatment for short vs. long limits.

**How do I manage keys?** Use the Worker URL interface to add, check, or remove keys.

**Is it free?** Yes, on Cloudflare’s free plan. Deployment is quick, with CPU optimizations.

**Does it support real-time responses?** Yes, in the native Gemini format, but not in OpenAI compat mode, where non-English text might garble.

**Why Cloudflare AI Gateway?** It routes requests safely, reducing blocks, and works with many providers.

**How do I troubleshoot?** Check the dashboard logs and analytics; the interface shows key statuses.

**What’s planned?** Dynamic key forwarding, custom channels, virtual models, and user key sharing.
## Step-by-Step: Sending Your First Request
1. Deploy and get your Worker URL.
2. Add keys in the management interface.
3. Pick a service, like Gemini.
4. Build the URL as your Worker URL plus `/api/` and the provider path.
5. Add the auth headers.
6. Send the request with curl or a similar tool.
7. Review the response and adjust if needed.
## Why Choose One Balance for API Key Management?
If you’re dealing with limited API keys, this tool simplifies rotation and health checks. It lets you focus on your project.
For example, in an app generating content with Gemini, it keeps things running evenly.
Its text-to-speech support is practical, as seen in podcast generation.
## Digging Deeper into the Balancing Logic
The core is rotating keys from the active pool, with checks. Success keeps it active; limits pause it; invalids block it.
For Google, distinguishing limit types maximizes usage.
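Putting rotation and cooldowns together, key selection might look like this first-available sketch. The `pick_key` helper and the `(key, model)` cooldown map are my illustrative assumptions, not the project's actual data model.

```python
import time

def pick_key(keys, cooldowns, model, now=None):
    """Return the first key whose (key, model) pair is not cooling down.

    `cooldowns` maps (key, model) -> timestamp when the pair becomes
    usable again. Returns None if every key is paused for this model.
    """
    now = time.time() if now is None else now
    for key in keys:
        if cooldowns.get((key, model), 0.0) <= now:
            return key
    return None
```

A real balancer would likely also rotate its starting position so load spreads evenly instead of always hammering the first key.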
## Tips for the Management Interface
Add keys carefully. Check statuses: active means good, cooling down shows model pauses, blocked means out.
Regular reviews keep it optimal.
## Possible Issues and Fixes
- Access problems? Use a proxy or VPN.
- Garbled non-English text? Stick to the native formats.
- Sharing risks? Keep keys private.
## Wrapping Up: Give One Balance a Try
One Balance makes managing API keys straightforward. Easy setup, smart features—it’s worth exploring if you have multiple keys.