Introduction

In today’s rapidly evolving AI landscape, developers and organizations need reliable, scalable solutions to integrate large language models into their applications. Gemini Balance is a lightweight Python application built with FastAPI that addresses these needs by acting as a proxy and load balancer for the Google Gemini API (and OpenAI‐compatible endpoints). By managing multiple API keys, automating failover and retries, and providing token‐counting, monitoring, and a seamless developer experience, Gemini Balance simplifies deploying and maintaining AI services in production and development environments.

This article will guide you through:

  1. Core benefits and use cases
  2. High‐level architecture and module breakdown
  3. Step‐by‐step setup via Docker and local development
  4. Detailed configuration options
  5. Key API endpoints and usage patterns
  6. Best practices for security, monitoring, and troubleshooting
  7. Community and contribution guidelines

Throughout, we’ll use clear, non‑technical language and explain why each feature matters, so that even readers with a junior college background can follow along.


Why Use Gemini Balance?

1. Simplified Multi‑Key Management

  • Problem: Relying on a single API key for AI calls often leads to throttling or downtime when that key hits usage limits or expires.
  • Solution: Gemini Balance lets you register multiple Gemini API keys. It automatically cycles through them in round‑robin fashion. If one key fails or exceeds its limit, the system retries with the next key—ensuring uninterrupted service.
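To make the failover behavior concrete, here is a minimal sketch of round‑robin rotation with a failure threshold. This is illustrative only: the class and method names are invented for the example, and the project's real KeyManager also persists usage stats and re‑enables keys on a schedule.

```python
from itertools import cycle

class RoundRobinKeys:
    """Sketch of round-robin key rotation with a failure threshold."""

    def __init__(self, keys, max_failures=3):
        self.keys = list(keys)
        self.failures = {k: 0 for k in self.keys}
        self.max_failures = max_failures
        self._cycle = cycle(self.keys)

    def next_key(self):
        # Walk the cycle, skipping keys that have exceeded the threshold.
        for _ in range(len(self.keys)):
            key = next(self._cycle)
            if self.failures[key] < self.max_failures:
                return key
        raise RuntimeError("All API keys are disabled")

    def report_failure(self, key):
        self.failures[key] += 1

    def report_success(self, key):
        self.failures[key] = 0
```

A caller asks for `next_key()` before each upstream request, reports the outcome, and automatically stops receiving keys that keep failing.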

2. Cost‑Aware Token Counting

  • Problem: Sending a large prompt or document to an LLM can incur high costs. Estimating token usage in advance helps control billing.
  • Solution: The /models/{model_name}:countTokens endpoint returns the exact token count for any input before you generate content. Armed with this information, you can adjust prompts or split requests to stay within budget.
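As a sketch, a client-side call might look like this. The `/gemini/v1beta` path matches the API reference later in this article, and the JSON body follows the Gemini API's countTokens schema; the bearer-token auth header (a value from ALLOWED_TOKENS) is an assumption about how your deployment authenticates.

```python
import json
from urllib import request

def build_count_tokens_request(base_url, token, model, text):
    """Assemble a countTokens request aimed at the proxy."""
    payload = {"contents": [{"parts": [{"text": text}]}]}
    return request.Request(
        f"{base_url}/gemini/v1beta/models/{model}:countTokens",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )

def count_tokens(base_url, token, model, text):
    """Send the request and read totalTokens from the response."""
    req = build_count_tokens_request(base_url, token, model, text)
    with request.urlopen(req) as resp:
        return json.load(resp)["totalTokens"]
```

Calling `count_tokens("http://localhost:8000", "tokenA", "gemini-1.5-flash", prompt)` before generation lets you trim or split the prompt if the count is too high.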

3. Unified Gemini & OpenAI Compatibility

  • Problem: Different AI services use different API formats. Migrating from one to another often requires rewriting client code.
  • Solution: Gemini Balance supports both native Gemini and OpenAI API formats. You can point your existing OpenAI‑based client to Gemini Balance without code changes. The proxy automatically translates requests into the appropriate Gemini calls under the hood.
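In practice this means an OpenAI-format request body works unchanged; only the base URL moves. A standard-library sketch (URLs, tokens, and model names are placeholders for your deployment):

```python
import json
from urllib import request

def openai_chat_request(base_url, token, model, user_message):
    """Build a request in OpenAI's chat/completions format; the proxy
    translates it into a Gemini call behind the scenes."""
    body = {"model": model,
            "messages": [{"role": "user", "content": user_message}]}
    return request.Request(
        f"{base_url}/openai/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )

# With the official openai SDK the same redirection is a one-line change:
#   client = OpenAI(base_url="http://localhost:8000/openai/v1", api_key="tokenA")
```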

4. Built‐In Image Generation & Editing

  • Problem: Integrating AI‑powered image features often requires separate services or complex SDKs.
  • Solution: Configure IMAGE_MODELS in Gemini Balance to enable image creation and editing via your chosen Gemini or OpenAI model. Use simple endpoints like /models/{model}-image:generate or /models/{model}-image:edit to produce or modify images within the same proxy.
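For OpenAI-style clients, the images endpoint listed later in this article works the same way. The sketch below uses OpenAI's /v1/images/generations body format; any fields beyond `prompt` are assumptions, so check the proxy's /docs page for the exact schema your version accepts.

```python
import json
from urllib import request

def image_generation_request(base_url, token, prompt, size="1024x1024"):
    """Build an image request in OpenAI's images/generations format."""
    body = {"prompt": prompt, "n": 1, "size": size}
    return request.Request(
        f"{base_url}/openai/v1/images/generations",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )
```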

5. Real‑Time Search Integration

  • Problem: Static language models lack up‑to‑date world knowledge.
  • Solution: By defining SEARCH_MODELS, you can route certain requests through a real‑time search service. This allows the AI to fetch current information from the web, improving answer accuracy for time‑sensitive questions.

6. Zero‑Downtime Configuration Updates

  • Problem: Changing settings often requires restarting the server, causing brief outages.
  • Solution: Gemini Balance’s management dashboard applies configuration changes instantly—no service restarts needed. Edit API keys, toggles, or model lists in the web UI, click Save, and new settings take effect immediately.

Architecture & Module Overview

Below is a high‑level view of the project structure. Each folder contains related code and configuration:


app/
├── config/       # Load environment variables and default settings
├── core/         # FastAPI application setup, middleware, CORS
├── database/     # Models for persisting API keys, logs, usage stats
├── router/       # API endpoint definitions: Gemini, OpenAI, status pages
├── service/      # Business logic: key rotation, request forwarding, token counting
├── scheduler/    # Background tasks: periodic health checks, key recovery
├── utils/        # Helpers: logging, error formatting, token utilities
└── main.py       # Entry point: instantiate FastAPI app and include routers

  • config/
    Loads variables like API_KEYS, PROXIES, MAX_RETRIES, and CHECK_INTERVAL_HOURS from the .env file. Defaults are provided for all options.

  • core/
    Sets up the FastAPI application instance, applies security middleware (CORS, HTTPS redirect), and mounts static file routes for the web dashboard.

  • database/
    Defines ORM models for storing API key metadata (usage count, failure count), request logs, and error logs. Supports both MySQL and SQLite via a config switch.

  • router/
    Contains route definitions for:

    • Gemini‐native endpoints (/models, /models/{model}:generateContent, etc.)
    • OpenAI‐style endpoints (/openai/v1/chat/completions, /hf/v1/embeddings, etc.)
    • Status pages (/keys_status, authenticated dashboard routes)
  • service/
    Implements core functions:

    • KeyManager: rotate keys, track failures, auto‑disable unhealthy keys, and re‑enable after a defined interval.
• ProxyHandler: forward requests to Gemini, handle response streaming, and format responses to match the client's expected format.
    • TokenCounter: compute token counts using Gemini’s tokenization rules.
  • scheduler/
    Periodically scans disabled keys and attempts to re‑validate them. Controlled by CHECK_INTERVAL_HOURS and MAX_FAILURES.
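Conceptually, each scheduler sweep does something like the following. This is a sketch with hypothetical helper names, not the project's actual task code; `validate` stands in for any cheap health check, such as a test call against TEST_MODEL.

```python
import time

def revalidation_pass(disabled_keys, validate):
    """One sweep: return the disabled keys that now pass a health check
    and can rejoin the rotation."""
    return [key for key in disabled_keys if validate(key)]

def run_scheduler(get_disabled, validate, enable, interval_hours=1):
    """Loop version, as a background task would run it
    (interval_hours corresponds to CHECK_INTERVAL_HOURS)."""
    while True:
        for key in revalidation_pass(get_disabled(), validate):
            enable(key)  # put the recovered key back into rotation
        time.sleep(interval_hours * 3600)
```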

  • utils/
    Includes logging configuration, custom exception classes, and helpers for parsing request payloads and environment variables.


Quick Start Guide

Prerequisites

  • Docker Engine (AMD64 and ARM64 images are available)
  • A .env file with at least API_KEYS and ALLOWED_TOKENS defined

1. Build & Run with Docker

# Build local image
docker build -t gemini-balance .

# Run container (production-ready settings)
docker run -d \
  -p 8000:8000 \
  --env-file .env \
  --dns 8.8.8.8 --dns 8.8.4.4 \
  gemini-balance
  • Port mapping: -p 8000:8000 exposes the API on your host’s port 8000.
  • DNS flags: Avoid DNS loops when using reverse proxies.
  • Volume mounts (optional): For SQLite persistence, add -v /host/data:/app/data.

2. Pull & Run Official Image

docker pull ghcr.io/snailyp/gemini-balance:latest
docker run -d -p 8000:8000 --env-file .env ghcr.io/snailyp/gemini-balance:latest

3. Local Development

git clone https://github.com/your-fork/gemini-balance.git
cd gemini-balance
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
  • --reload: Auto‑reload on code changes (development only).
  • Access the API dashboard at http://localhost:8000/docs or your web UI at http://localhost:8000/keys_status.

Configuration Breakdown

Gemini Balance is driven entirely by environment variables. Below are the most important settings you’ll find in .env:

  • API_KEYS: Comma‑separated list of your Gemini API keys for rotation. Example: key1,key2,key3
  • ALLOWED_TOKENS: Valid bearer tokens permitted to access the proxy. Example: tokenA,tokenB
  • AUTH_TOKEN: (Optional) Super‑admin token with full privileges; defaults to the first token in ALLOWED_TOKENS. Example: superAdminToken
  • TEST_MODEL: Model name used to verify new API keys. Default: gemini-1.5-flash
  • IMAGE_MODELS: Models enabled for image generation/editing (comma‑separated). Example: gemini-2.0-flash-exp
  • SEARCH_MODELS: Models permitted to perform web searches. Example: gemini-2.0-flash-exp
  • FILTERED_MODELS: Comma‑separated list of model names to exclude from routing. Example: gemini-1.0-pro-vision-latest
  • PROXIES: HTTP or SOCKS5 proxy URLs used for Gemini API requests. Example: http://user:pass@host:port,socks5://host:port
  • MAX_RETRIES: How many times to retry a failed API call before cycling keys. Default: 3
  • MAX_FAILURES: Number of consecutive failures before a key is disabled. Default: 3
  • CHECK_INTERVAL_HOURS: Hours between attempts to re‑enable disabled keys. Default: 1
  • STREAM_OPTIMIZER_ENABLED: Enable the streaming response optimizer for smoother client‑side rendering. Default: false
  • THINKING_BUDGET_MAP: JSON mapping of model names to “thinkingBudget” values; a budget of 0 removes thinkingConfig. Example: {"gemini-1.5-flash":"0"}
  • LOCALHOST_BYPASS_AUTH: Allows localhost calls to skip auth checks (development only); set to false in production. Default: true
  • TIMEZONE: Timezone for logging and scheduled tasks. Default: Asia/Shanghai
  • DATABASE_TYPE: Choose mysql or sqlite. Default: sqlite
  • MYSQL_* / SQLITE_DATABASE: Database connection details when using MySQL, or the SQLite file path.
  • PAID_KEY, CREATE_IMAGE_MODEL: (Optional) Paid API key and model for high‑quality image generation.

Tip: Copy .env.example to .env and fill in your keys and tokens.
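A minimal .env might look like this (every variable shown appears in the list above; all values are placeholders):

```
# .env — minimal working configuration
API_KEYS=key1,key2,key3
ALLOWED_TOKENS=tokenA,tokenB
DATABASE_TYPE=sqlite
MAX_RETRIES=3
MAX_FAILURES=3
CHECK_INTERVAL_HOURS=1
```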


API Reference

Gemini Balance exposes both Gemini‐style and OpenAI‐style endpoints. Below are the primary routes:

Gemini‑Style Endpoints

  • List Models

    GET /gemini/v1beta/models
    
  • Generate Content

    POST /gemini/v1beta/models/{model_name}:generateContent
    
  • Stream Content

    POST /gemini/v1beta/models/{model_name}:streamGenerateContent
    
  • Count Tokens

    POST /gemini/v1beta/models/{model_name}:countTokens
    
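A generateContent response nests the generated text under candidates → content → parts (per the Gemini API's response shape). A small helper to pull it out might look like this; the function name is invented for the example:

```python
def extract_text(response):
    """Join the text parts from the first candidate of a
    generateContent response body (already parsed from JSON)."""
    return "".join(
        part.get("text", "")
        for part in response["candidates"][0]["content"]["parts"]
    )
```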

OpenAI‑Style Endpoints

  • List Models

    GET /openai/v1/models
    
  • Chat Completions

    POST /openai/v1/chat/completions
    
  • Embeddings

    POST /openai/v1/embeddings
    
  • Image Generation

    POST /openai/v1/images/generations
    

HF‑Style Endpoints

  • List Models

    GET /hf/v1/models
    
  • Chat Completions

    POST /hf/v1/chat/completions
    
  • Embeddings

    POST /hf/v1/embeddings
    

All endpoints forward your request to the appropriate Gemini or OpenAI backend, handling authentication, key rotation, retries, and response formatting so your client code stays unchanged.
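When you request streaming (for example, stream=true on the OpenAI-style chat endpoint), responses arrive as server-sent-event lines. A minimal parser sketch, assuming OpenAI's `data: {...}` / `data: [DONE]` framing:

```python
import json

def parse_sse_chunks(lines):
    """Yield the incremental text deltas from OpenAI-style streaming
    chat-completion lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta
```

Concatenating the yielded deltas reconstructs the full reply as it streams in.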


Best Practices & Troubleshooting

Monitoring & Health Checks

  • Visit /keys_status (requires auth) to view each key’s health, usage counts, and recent errors.
  • Set up external alerts on key disable events by watching the logs or database.

Avoiding DNS Loops

  • When using a reverse proxy that re‐routes generativelanguage.googleapis.com, add --dns 8.8.8.8 --dns 8.8.4.4 to your Docker run command to bypass host‑level DNS overrides.

Handling Thinking Configuration Errors

  • If you see errors about invalid thinkingConfig, map that model to a budget of 0 in THINKING_BUDGET_MAP. The proxy will drop thinkingConfig entirely for that model.

Updating Model Lists

  • Gemini Balance can fetch the latest model catalog automatically. Ensure URL_NORMALIZATION_ENABLED=false to trust the auto‑discovery logic.
  • Manually filter out unwanted models via FILTERED_MODELS to reduce clutter.

Community & Contribution

We welcome bug reports, feature requests, and pull requests! To contribute:

  1. Fork the repository and create a feature branch.
  2. Write clear commit messages and include tests for new functionality.
  3. Submit a pull request against the main branch with a detailed description.

Join our Telegram group for real‑time support and discussions:
https://t.me/+soaHax5lyI0wZDVl


License & Attribution

This project is licensed under CC BY‑NC 4.0 (Attribution‑NonCommercial). You are free to use and modify the code for non‑commercial purposes, provided you attribute the original authors. Commercial redistribution or service resale is prohibited.

“I have never sold this service on any platform; if you encounter someone charging for it, that is unauthorized resale.”


Acknowledgments

  • The original snailyp/gemini-balance repository for foundational code.
  • PicGo, SM.MS, CloudFlare‑ImgBed for providing image‑hosting integrations.
  • All contributors and maintainers who help improve this project.