Introduction

In today’s rapidly evolving AI landscape, developers and organizations need reliable, scalable solutions to integrate large language models into their applications. Gemini Balance is a lightweight Python application built with FastAPI that addresses these needs by acting as a proxy and load balancer for the Google Gemini API (and OpenAI‐compatible endpoints). By managing multiple API keys, automating failover and retries, and providing token‐counting, monitoring, and a seamless developer experience, Gemini Balance simplifies deploying and maintaining AI services in production and development environments.

This article will guide you through:

  1. Core benefits and use cases
  2. High‐level architecture and module breakdown
  3. Step‐by‐step setup via Docker and local development
  4. Detailed configuration options
  5. Key API endpoints and usage patterns
  6. Best practices for security, monitoring, and troubleshooting
  7. Community and contribution guidelines

Throughout, we’ll use clear, non‑technical language and explain why each feature matters, so that even readers with a junior college background can follow along.


Why Use Gemini Balance?

1. Simplified Multi‑Key Management

  • Problem: Relying on a single API key for AI calls often leads to throttling or downtime when that key hits usage limits or expires.
  • Solution: Gemini Balance lets you register multiple Gemini API keys. It automatically cycles through them in round‑robin fashion. If one key fails or exceeds its limit, the system retries with the next key—ensuring uninterrupted service.
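To make the failover behavior concrete, here is a minimal sketch of round‑robin rotation with a failure threshold. This is illustrative only: the class and method names are invented for the example, and the project's real KeyManager also persists usage stats and re‑enables keys on a schedule.

```python
from itertools import cycle

class RoundRobinKeys:
    """Sketch of round-robin key rotation with a failure threshold."""

    def __init__(self, keys, max_failures=3):
        self.keys = list(keys)
        self.failures = {k: 0 for k in self.keys}
        self.max_failures = max_failures
        self._cycle = cycle(self.keys)

    def next_key(self):
        # Walk the cycle, skipping keys that have exceeded the threshold.
        for _ in range(len(self.keys)):
            key = next(self._cycle)
            if self.failures[key] < self.max_failures:
                return key
        raise RuntimeError("All API keys are disabled")

    def report_failure(self, key):
        self.failures[key] += 1

    def report_success(self, key):
        self.failures[key] = 0
```

A caller asks for `next_key()` before each upstream request, reports the outcome, and automatically stops receiving keys that keep failing.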

2. Cost‑Aware Token Counting

  • Problem: Sending a large prompt or document to an LLM can incur high costs. Estimating token usage in advance helps control billing.
  • Solution: The /models/{model_name}:countTokens endpoint returns the exact token count for any input before you generate content. Armed with this information, you can adjust prompts or split requests to stay within budget.
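As a sketch, a client-side call might look like this. The `/gemini/v1beta` path matches the API reference later in this article, and the JSON body follows the Gemini API's countTokens schema; the bearer-token auth header (a value from ALLOWED_TOKENS) is an assumption about how your deployment authenticates.

```python
import json
from urllib import request

def build_count_tokens_request(base_url, token, model, text):
    """Assemble a countTokens request aimed at the proxy."""
    payload = {"contents": [{"parts": [{"text": text}]}]}
    return request.Request(
        f"{base_url}/gemini/v1beta/models/{model}:countTokens",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )

def count_tokens(base_url, token, model, text):
    """Send the request and read totalTokens from the response."""
    req = build_count_tokens_request(base_url, token, model, text)
    with request.urlopen(req) as resp:
        return json.load(resp)["totalTokens"]
```

Calling `count_tokens("http://localhost:8000", "tokenA", "gemini-1.5-flash", prompt)` before generation lets you trim or split the prompt if the count is too high.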

3. Unified Gemini & OpenAI Compatibility

  • Problem: Different AI services use different API formats. Migrating from one to another often requires rewriting client code.
  • Solution: Gemini Balance supports both native Gemini and OpenAI API formats. You can point your existing OpenAI‑based client to Gemini Balance without code changes. The proxy automatically translates requests into the appropriate Gemini calls under the hood.
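In practice this means an OpenAI-format request body works unchanged; only the base URL moves. A standard-library sketch (URLs, tokens, and model names are placeholders for your deployment):

```python
import json
from urllib import request

def openai_chat_request(base_url, token, model, user_message):
    """Build a request in OpenAI's chat/completions format; the proxy
    translates it into a Gemini call behind the scenes."""
    body = {"model": model,
            "messages": [{"role": "user", "content": user_message}]}
    return request.Request(
        f"{base_url}/openai/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )

# With the official openai SDK the same redirection is a one-line change:
#   client = OpenAI(base_url="http://localhost:8000/openai/v1", api_key="tokenA")
```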

4. Built‐In Image Generation & Editing

  • Problem: Integrating AI‑powered image features often requires separate services or complex SDKs.
  • Solution: Configure IMAGE_MODELS in Gemini Balance to enable image creation and editing via your chosen Gemini or OpenAI model. Use simple endpoints like /models/{model}-image:generate or /models/{model}-image:edit to produce or modify images within the same proxy.
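For OpenAI-style clients, the images endpoint listed later in this article works the same way. The sketch below uses OpenAI's /v1/images/generations body format; any fields beyond `prompt` are assumptions, so check the proxy's /docs page for the exact schema your version accepts.

```python
import json
from urllib import request

def image_generation_request(base_url, token, prompt, size="1024x1024"):
    """Build an image request in OpenAI's images/generations format."""
    body = {"prompt": prompt, "n": 1, "size": size}
    return request.Request(
        f"{base_url}/openai/v1/images/generations",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )
```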

5. Real‑Time Search Integration

  • Problem: Static language models lack up‑to‑date world knowledge.
  • Solution: By defining SEARCH_MODELS, you can route certain requests through a real‑time search service. This allows the AI to fetch current information from the web, improving answer accuracy for time‑sensitive questions.

6. Zero‑Downtime Configuration Updates

  • Problem: Changing settings often requires restarting the server, causing brief outages.
  • Solution: Gemini Balance’s management dashboard applies configuration changes instantly—no service restarts needed. Edit API keys, toggles, or model lists in the web UI, click Save, and new settings take effect immediately.

Architecture & Module Overview

Below is a high‑level view of the project structure. Each folder contains related code and configuration:


app/
├── config/       # Load environment variables and default settings
├── core/         # FastAPI application setup, middleware, CORS
├── database/     # Models for persisting API keys, logs, usage stats
├── router/       # API endpoint definitions: Gemini, OpenAI, status pages
├── service/      # Business logic: key rotation, request forwarding, token counting
├── scheduler/    # Background tasks: periodic health checks, key recovery
├── utils/        # Helpers: logging, error formatting, token utilities
└── main.py       # Entry point: instantiate FastAPI app and include routers

  • config/
    Loads variables like API_KEYS, PROXIES, MAX_RETRIES, and CHECK_INTERVAL_HOURS from the .env file. Defaults are provided for all options.

  • core/
    Sets up the FastAPI application instance, applies security middleware (CORS, HTTPS redirect), and mounts static file routes for the web dashboard.

  • database/
    Defines ORM models for storing API key metadata (usage count, failure count), request logs, and error logs. Supports both MySQL and SQLite via a config switch.

  • router/
    Contains route definitions for:

    • Gemini‐native endpoints (/models, /models/{model}:generateContent, etc.)
    • OpenAI‐style endpoints (/openai/v1/chat/completions, /hf/v1/embeddings, etc.)
    • Status pages (/keys_status, authenticated dashboard routes)
  • service/
    Implements core functions:

    • KeyManager: rotate keys, track failures, auto‑disable unhealthy keys, and re‑enable after a defined interval.
• ProxyHandler: forward requests to Gemini, handle response streaming, and format responses to match the client's expected format.
    • TokenCounter: compute token counts using Gemini’s tokenization rules.
  • scheduler/
    Periodically scans disabled keys and attempts to re‑validate them. Controlled by CHECK_INTERVAL_HOURS and MAX_FAILURES.
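Conceptually, each scheduler sweep does something like the following. This is a sketch with hypothetical helper names, not the project's actual task code; `validate` stands in for any cheap health check, such as a test call against TEST_MODEL.

```python
import time

def revalidation_pass(disabled_keys, validate):
    """One sweep: return the disabled keys that now pass a health check
    and can rejoin the rotation."""
    return [key for key in disabled_keys if validate(key)]

def run_scheduler(get_disabled, validate, enable, interval_hours=1):
    """Loop version, as a background task would run it
    (interval_hours corresponds to CHECK_INTERVAL_HOURS)."""
    while True:
        for key in revalidation_pass(get_disabled(), validate):
            enable(key)  # put the recovered key back into rotation
        time.sleep(interval_hours * 3600)
```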

  • utils/
    Includes logging configuration, custom exception classes, and helpers for parsing request payloads and environment variables.


Quick Start Guide

Prerequisites

  • Docker Engine (AMD64 and ARM64 images are available)
  • A .env file with at least API_KEYS and ALLOWED_TOKENS defined

1. Build & Run with Docker

# Build local image
docker build -t gemini-balance .

# Run container (production-ready settings)
docker run -d \
  -p 8000:8000 \
  --env-file .env \
  --dns 8.8.8.8 --dns 8.8.4.4 \
  gemini-balance
  • Port mapping: -p 8000:8000 exposes the API on your host’s port 8000.
  • DNS flags: Avoid DNS loops when using reverse proxies.
  • Volume mounts (optional): For SQLite persistence, add -v /host/data:/app/data.

2. Pull & Run Official Image

docker pull ghcr.io/snailyp/gemini-balance:latest
docker run -d -p 8000:8000 --env-file .env ghcr.io/snailyp/gemini-balance:latest

3. Local Development

git clone https://github.com/your-fork/gemini-balance.git
cd gemini-balance
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
  • --reload: Auto‑reload on code changes (development only).
  • Access the API dashboard at http://localhost:8000/docs or your web UI at http://localhost:8000/keys_status.

Configuration Breakdown

Gemini Balance is driven entirely by environment variables. Below are the most important settings you’ll find in .env:

  • API_KEYS: Comma‑separated list of your Gemini API keys for rotation. Example: key1,key2,key3
  • ALLOWED_TOKENS: Valid bearer tokens permitted to access the proxy. Example: tokenA,tokenB
  • AUTH_TOKEN: (Optional) Super‑admin token with full privileges; defaults to the first token in ALLOWED_TOKENS. Example: superAdminToken
  • TEST_MODEL: Model name used to verify new API keys. Default: gemini-1.5-flash
  • IMAGE_MODELS: Models enabled for image generation/editing (comma‑separated). Example: gemini-2.0-flash-exp
  • SEARCH_MODELS: Models permitted to perform web searches. Example: gemini-2.0-flash-exp
  • FILTERED_MODELS: Comma‑separated list of model names to exclude from routing. Example: gemini-1.0-pro-vision-latest
  • PROXIES: HTTP or SOCKS5 proxy URLs used for Gemini API requests. Example: http://user:pass@host:port,socks5://host:port
  • MAX_RETRIES: How many times to retry a failed API call before cycling keys. Default: 3
  • MAX_FAILURES: Number of consecutive failures before a key is disabled. Default: 3
  • CHECK_INTERVAL_HOURS: Hours between attempts to re‑enable disabled keys. Default: 1
  • STREAM_OPTIMIZER_ENABLED: Enable the streaming response optimizer for smoother client‑side rendering. Default: false
  • THINKING_BUDGET_MAP: JSON mapping of model names to “thinkingBudget” values; a budget of 0 removes thinkingConfig. Example: {"gemini-1.5-flash":"0"}
  • LOCALHOST_BYPASS_AUTH: Allows localhost calls to skip auth checks (development only); set to false in production. Default: true
  • TIMEZONE: Timezone for logging and scheduled tasks. Default: Asia/Shanghai
  • DATABASE_TYPE: Choose mysql or sqlite. Default: sqlite
  • MYSQL_* / SQLITE_DATABASE: Database connection details when using MySQL, or the SQLite file path.
  • PAID_KEY, CREATE_IMAGE_MODEL: (Optional) Paid API key and model for high‑quality image generation.

Tip: Copy .env.example to .env and fill in your keys and tokens.
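A minimal .env might look like this (every variable shown appears in the list above; all values are placeholders):

```
# .env — minimal working configuration
API_KEYS=key1,key2,key3
ALLOWED_TOKENS=tokenA,tokenB
DATABASE_TYPE=sqlite
MAX_RETRIES=3
MAX_FAILURES=3
CHECK_INTERVAL_HOURS=1
```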


API Reference

Gemini Balance exposes both Gemini‐style and OpenAI‐style endpoints. Below are the primary routes:

Gemini‑Style Endpoints

  • List Models

    GET /gemini/v1beta/models
    
  • Generate Content

    POST /gemini/v1beta/models/{model_name}:generateContent
    
  • Stream Content

    POST /gemini/v1beta/models/{model_name}:streamGenerateContent
    
  • Count Tokens

    POST /gemini/v1beta/models/{model_name}:countTokens
    
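A generateContent response nests the generated text under candidates → content → parts (per the Gemini API's response shape). A small helper to pull it out might look like this; the function name is invented for the example:

```python
def extract_text(response):
    """Join the text parts from the first candidate of a
    generateContent response body (already parsed from JSON)."""
    return "".join(
        part.get("text", "")
        for part in response["candidates"][0]["content"]["parts"]
    )
```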

OpenAI‑Style Endpoints

  • List Models

    GET /openai/v1/models
    
  • Chat Completions

    POST /openai/v1/chat/completions
    
  • Embeddings

    POST /openai/v1/embeddings
    
  • Image Generation

    POST /openai/v1/images/generations
    

HF‑Style Endpoints

  • List Models

    GET /hf/v1/models
    
  • Chat Completions

    POST /hf/v1/chat/completions
    
  • Embeddings

    POST /hf/v1/embeddings
    

All endpoints forward your request to the appropriate Gemini or OpenAI backend, handling authentication, key rotation, retries, and response formatting so your client code stays unchanged.
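When you request streaming (for example, stream=true on the OpenAI-style chat endpoint), responses arrive as server-sent-event lines. A minimal parser sketch, assuming OpenAI's `data: {...}` / `data: [DONE]` framing:

```python
import json

def parse_sse_chunks(lines):
    """Yield the incremental text deltas from OpenAI-style streaming
    chat-completion lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta
```

Concatenating the yielded deltas reconstructs the full reply as it streams in.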


Best Practices & Troubleshooting

Monitoring & Health Checks

  • Visit /keys_status (requires auth) to view each key’s health, usage counts, and recent errors.
  • Set up external alerts on key disable events by watching the logs or database.

Avoiding DNS Loops

  • When using a reverse proxy that re‐routes generativelanguage.googleapis.com, add --dns 8.8.8.8 --dns 8.8.4.4 to your Docker run command to bypass host‑level DNS overrides.

Handling Thinking Configuration Errors

  • If you see errors about invalid thinkingConfig, map that model to a budget of 0 in THINKING_BUDGET_MAP. The proxy will drop thinkingConfig entirely for that model.

Updating Model Lists

  • Gemini Balance can fetch the latest model catalog automatically. Ensure URL_NORMALIZATION_ENABLED=false to trust the auto‑discovery logic.
  • Manually filter out unwanted models via FILTERED_MODELS to reduce clutter.

Community & Contribution

We welcome bug reports, feature requests, and pull requests! To contribute:

  1. Fork the repository and create a feature branch.
  2. Write clear commit messages and include tests for new functionality.
  3. Submit a pull request against the main branch with a detailed description.

Join our Telegram group for real‑time support and discussions:
https://t.me/+soaHax5lyI0wZDVl


License & Attribution

This project is licensed under CC BY‑NC 4.0 (Attribution‑NonCommercial). You are free to use and modify the code for non‑commercial purposes, provided you attribute the original authors. Commercial redistribution or service resale is prohibited.

“I have never sold this service on any platform; if you encounter someone charging for it, that is unauthorized resale.”


Acknowledgments

  • The original snailyp/gemini-balance repository for foundational code.
  • PicGo, SM.MS, CloudFlare‑ImgBed for providing image‑hosting integrations.
  • All contributors and maintainers who help improve this project.