KiroGate: The Open Source API Gateway That Lets You Call Claude Models from Any Tool – A Complete Guide

Snippet: KiroGate is an open‑source proxy gateway that aggregates multiple Kiro IDE accounts and exposes both OpenAI‑compatible and Anthropic‑compatible APIs. It enables you to use Claude models through any tool that supports these standard interfaces, while providing intelligent load balancing, automatic failover, context compression, and a built‑in management dashboard – all with zero external dependencies.

Have you ever wanted to integrate Claude’s powerful language models into your project, only to be stopped by regional restrictions, quota limits on a single account, or the hassle of rewriting code to fit Anthropic’s API when your team is already used to OpenAI’s SDK?

As a developer who regularly works with large language models, I faced these exact problems – until I discovered KiroGate. It’s a lightweight, open‑source gateway that sits between your application and Kiro IDE (the platform that gives you access to Claude models). KiroGate pools multiple Kiro accounts, automatically distributes requests, and speaks both OpenAI and Anthropic “dialects.”

In this guide, I’ll walk you through everything KiroGate can do, how to set it up in under ten minutes, and the clever engineering behind its smart scheduling, context compression, and self‑healing mechanisms. All information here comes directly from the project’s official documentation – no guesswork, no external fluff.

What Exactly Is KiroGate?

KiroGate is an API proxy gateway designed to make Claude models accessible through any tool that understands OpenAI’s /v1/chat/completions or Anthropic’s /v1/messages format.

Under the hood, it translates those standard requests into the format expected by Kiro IDE’s API, then forwards them to one of many Kiro accounts you’ve added to its pool. This means:

You don’t have to manage multiple accounts manually.
You can keep using your favourite SDKs (OpenAI Python, Anthropic Node.js, cURL, Postman, etc.).
If one account hits a rate limit or goes down, KiroGate automatically switches to another – your application barely notices a hiccup.

KiroGate is built on two earlier projects: the routing and translation core from kiro‑openai‑gateway and the multi‑account orchestration from kiro‑account‑manager. The entire codebase is written in Deno and uses Deno’s built‑in KV store for persistence – no external database or Redis required.

Core Features That Make KiroGate a Game‑Changer

1. Dual‑API Compatibility: OpenAI + Anthropic in One Place

KiroGate exposes two industry‑standard endpoints:

/v1/chat/completions – fully compatible with OpenAI’s Chat Completion API.
/v1/messages – fully compatible with Anthropic’s Messages API.

This means you can point both your old OpenAI‑based scripts and your new Anthropic‑based experiments to the same gateway. Your frontend team can keep using the OpenAI SDK they love, while your data science team uses the Anthropic SDK for its unique features. Neither side has to change anything beyond the base URL and API key.

2. Intelligent Multi‑Account Scheduling

The account pool is KiroGate’s heart. You can add any number of Kiro Refresh Tokens, and the gateway will maintain the following per‑account metrics in real time:

Health score (0–100): dynamically updated based on request success rate, error types, and response times. Higher‑scoring accounts are preferred.
Concurrent request count: the number of active requests using that account.
Quota tracking: automatically records token usage and request counts (when the account supports it).

The scheduler supports three modes:

| Mode | Behaviour |
| --- | --- |
| Smart (default) | Picks the account with the highest `healthScore / (concurrent + 1)` – balances health and load. |
| Priority | Uses accounts strictly in the order you set; falls back only when higher‑priority accounts are exhausted. |
| Balanced | Distributes requests evenly across all healthy accounts. |

If all accounts become unavailable, KiroGate enters a self‑healing mode: it periodically probes the cooled‑down accounts until at least one recovers.

3. Three Authentication Methods for Every Use Case

KiroGate offers three ways to authenticate incoming requests, so you can adapt it to personal use, team sharing, or even a multi‑tenant SaaS setup.

Simple mode – You set a global PROXY_API_KEY. Clients present this key, and the gateway automatically picks an account from the pool. Ideal for personal projects or a trusted team.
```
Authorization: Bearer your-proxy-key
```
Composite mode – Users provide their own Refresh Token in the format PROXY_API_KEY:REFRESH_TOKEN. The gateway first validates the global key (ensuring they’re allowed to use the service), then uses the user’s personal token to talk to Kiro. This gives you per‑user isolation without needing to pre‑configure each account.
```
Authorization: Bearer global-key:user-refresh-token
```
Managed API Key mode – From the admin panel you can create keys with a kg- prefix, each with its own quota limits, model whitelist, IP whitelist, and expiration date. These keys are perfect for distributing to external partners or different microservices, and you can track usage per key in real time.
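
To make the three credential shapes concrete, here is a minimal Python sketch of how a gateway could tell them apart. It is not KiroGate's actual code: the `Credential` class, the `parse_authorization` function, the order of the checks, and the assumption that the global key never contains a colon are all mine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credential:
    mode: str                            # "simple", "composite", or "managed"
    key: str                             # the global proxy key or the kg- managed key
    refresh_token: Optional[str] = None  # only set in composite mode


def parse_authorization(header: str) -> Credential:
    """Classify an 'Authorization: Bearer ...' value into one of the three modes."""
    scheme, _, token = header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        raise ValueError("expected 'Authorization: Bearer <token>'")
    if token.startswith("kg-"):
        # Managed keys are recognised by their prefix.
        return Credential(mode="managed", key=token)
    if ":" in token:
        # Composite: split on the first colon only (assumes the global key has none).
        key, refresh_token = token.split(":", 1)
        return Credential(mode="composite", key=key, refresh_token=refresh_token)
    return Credential(mode="simple", key=token)
```

In simple and composite mode the `key` field would still have to be compared against PROXY_API_KEY before the request is allowed through.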

4. Context Compression: Tame Those Long Conversations

When a conversation’s estimated token count exceeds a configurable threshold (for example, 100k tokens), KiroGate’s built‑in compressor automatically kicks in. Here’s how it works:

It keeps the most recent N messages untouched (you can configure N, default is 20) to preserve conversational flow.
Older messages are split into batches and sent to the fast, low‑cost Claude Haiku model to generate a one‑sentence summary of each batch.
Those summaries replace the original historical messages in the final request to Claude.
Three‑layer caching speeds everything up:
- Incremental memory cache: For the same conversation ID, only new messages are summarised on subsequent requests.
- LRU memory cache: Recently generated summaries are kept in memory (least‑recently‑used eviction, up to 1000 conversations).
- Deno KV persistent cache: After a restart, the most‑used summaries are reloaded from disk, avoiding re‑computation.

This mechanism can drastically reduce token consumption for long threads while preserving the core meaning.

5. Circuit Breaker + Rate Limiting: Protect Your Backend

To keep requests from piling onto an account that is already failing, KiroGate implements the classic circuit breaker pattern, with three states:

CLOSED – Requests flow normally.
OPEN – After a configurable number of consecutive failures (default: 5), the circuit opens and all requests are immediately rejected for a cool‑down period (default: 30 seconds). This gives the account time to recover.
HALF_OPEN – After the cool‑down, a single test request is allowed. If it succeeds, the circuit closes; if it fails, it goes back to OPEN.

In addition, a token bucket rate limiter can be enabled globally via the RATE_LIMIT_PER_MINUTE environment variable. Requests that exceed the limit receive a 429 Too Many Requests response.
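
If you have not met a token bucket before, the sketch below shows the general technique in Python. It is an illustration only, not KiroGate's implementation: the `TokenBucket` class, the choice of a one-minute burst capacity, and the injectable `now` clock are my assumptions.

```python
import time


class TokenBucket:
    """Allow up to `per_minute` requests per minute, refilled continuously."""

    def __init__(self, per_minute: int, now=time.monotonic):
        self.capacity = float(per_minute)          # burst size (assumption: one minute's worth)
        self.refill_per_second = per_minute / 60.0
        self.tokens = self.capacity                # start with a full bucket
        self.now = now                             # injectable clock, handy for tests
        self.last = now()

    def allow(self) -> bool:
        if self.capacity <= 0:
            return True                            # 0 means "no limit", as in the config table
        current = self.now()
        elapsed = current - self.last
        self.last = current
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                               # caller would answer 429 Too Many Requests
```

A request handler would call `allow()` once per incoming request and return 429 whenever it gets `False`.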

6. Built‑in Management Dashboard

KiroGate comes with a web‑based admin interface (accessible at /admin/accounts and /admin/keys) protected by the ADMIN_PASSWORD you set. From here you can:

Add, delete, or temporarily disable Kiro accounts.
Manually refresh an account’s Access Token.
View each account’s health score, recent errors, and remaining quota.
Create and manage managed API keys (with quotas, model restrictions, and expiry).
Monitor real‑time global statistics: total requests, success/failure rates, average latency.

There’s also a public dashboard at /dashboard that shows a simplified health status without requiring a password.

7. Zero External Dependencies

KiroGate is a single Deno application with no need for Redis, PostgreSQL, or any other service. All data – account tokens, API keys, caches – is stored in Deno’s built‑in KV store (using --unstable-kv), which persists to the local file system. This makes deployment trivial: just copy the files and run Deno.

Quick Start: Get Your First Request in 10 Minutes

Prerequisites

Install Deno 2.x or later.
Obtain at least one Kiro IDE Refresh Token (you can extract it from the Kiro client or after logging into the web interface).

Step 1: Download and Run KiroGate

git clone https://github.com/your-repo/kirogate.git   # replace with the actual repository URL
cd kirogate

# Set environment variables (change these to your own secrets)
export PROXY_API_KEY="my-super-secret-key"
export ADMIN_PASSWORD="admin123"

# Start the service
deno run --allow-net --allow-env --unstable-kv main.ts

If everything works, you’ll see a log message like Listening on http://0.0.0.0:8000.

Step 2: Add Your First Account

Open your browser and go to http://localhost:8000/admin/accounts. Enter the admin password (admin123) to log in. Click Add Account, paste your Refresh Token, and KiroGate will automatically exchange it for an Access Token. You can repeat this to add multiple accounts – they will all be pooled together.

Step 3: Send a Test Request

Use curl to make an OpenAI‑style request:

curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer my-super-secret-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself briefly."}],
    "stream": true
  }'

You should see Claude’s reply streamed back token by token. Congratulations – you’ve just used KiroGate!

Configuration: All Environment Variables Explained

KiroGate is configured through environment variables. You can set them directly in your shell, in a .env file, or via your container orchestration tool.

| Variable | Default | Description |
| --- | --- | --- |
| `PROXY_API_KEY` | `changeme_proxy_secret` | Global API key for simple and composite authentication modes. |
| `ADMIN_PASSWORD` | `admin` | Password for accessing the admin panels at `/admin/accounts` and `/admin/keys`. |
| `PORT` | `8000` | Port on which the HTTP server listens. |
| `LOG_LEVEL` | `INFO` | Logging verbosity: `DEBUG`, `INFO`, `WARN`, `ERROR`. |
| `RATE_LIMIT_PER_MINUTE` | `0` | Global request rate limit (requests per minute). `0` means no limit. |
| `ENABLE_COMPRESSION` | `true` | Whether to enable automatic context compression. |
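
If you launch KiroGate from a deployment script, it can help to check these settings before starting the process. The Python sketch below simply restates the table as a typed structure; `GatewayConfig`, `load_config`, and `insecure_defaults` are hypothetical helper names of mine, and the way the boolean is parsed is an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class GatewayConfig:
    proxy_api_key: str
    admin_password: str
    port: int
    log_level: str
    rate_limit_per_minute: int
    enable_compression: bool


def load_config(env: Mapping[str, str] = os.environ) -> GatewayConfig:
    """Read the documented variables, falling back to the documented defaults."""
    return GatewayConfig(
        proxy_api_key=env.get("PROXY_API_KEY", "changeme_proxy_secret"),
        admin_password=env.get("ADMIN_PASSWORD", "admin"),
        port=int(env.get("PORT", "8000")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "0")),
        # Assumption: only the literal string "true" (any case) enables compression.
        enable_compression=env.get("ENABLE_COMPRESSION", "true").strip().lower() == "true",
    )


def insecure_defaults(cfg: GatewayConfig) -> List[str]:
    """Name the secrets that are still set to their published default values."""
    problems = []
    if cfg.proxy_api_key == "changeme_proxy_secret":
        problems.append("PROXY_API_KEY")
    if cfg.admin_password == "admin":
        problems.append("ADMIN_PASSWORD")
    return problems
```

Refusing to deploy while `insecure_defaults` returns anything is a cheap safeguard, since both defaults are public.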

Authentication Modes in Depth

Simple Mode – Personal or Team Use

Set PROXY_API_KEY to a secret value. All clients must include this key in the Authorization header. The gateway picks a healthy account from the pool for each request.
Best for: Solo developers, internal teams where everyone is trusted.

Composite Mode – Bring‑Your‑Own‑Account

Clients provide a key in the format global-key:user-refresh-token. The gateway first checks that the global key matches PROXY_API_KEY (ensuring the client is authorised to use the service). Then it uses the user’s own Refresh Token to communicate with Kiro. This way, each user’s quota is separate, and you don’t need to pre‑register their accounts.
Best for: SaaS platforms, open APIs where users want to use their own credentials.

Managed API Key Mode – Fine‑Grained Access Control

From the admin panel you can create keys starting with kg-. When creating a key you can set:

Quota limits: by token count or number of requests.
Model whitelist: restrict the key to specific models (e.g., only claude-haiku-4-5).
IP whitelist: limit usage to certain CIDR ranges.
Expiration date: the key becomes invalid after a certain date.

Each key’s usage is tracked, so you can see exactly how many tokens each consumer has used.
Best for: External partners, microservices, or any scenario requiring fine‑grained access control.
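
The four restrictions on a managed key amount to a short series of checks on each request. Here is a rough Python sketch of that idea. The `ManagedKey` fields, the `check_key` function, the order of the checks, and the rule that an empty whitelist means "no restriction" are my assumptions, not details taken from KiroGate.

```python
import ipaddress
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ManagedKey:
    key: str
    token_quota: Optional[int] = None                        # None = unlimited (assumption)
    tokens_used: int = 0
    allowed_models: List[str] = field(default_factory=list)  # empty = any model (assumption)
    allowed_cidrs: List[str] = field(default_factory=list)   # empty = any IP (assumption)
    expires_at: Optional[datetime] = None


def check_key(k: ManagedKey, model: str, client_ip: str,
              now: Optional[datetime] = None) -> Optional[str]:
    """Return None when the request is allowed, otherwise a short rejection reason."""
    now = now or datetime.now(timezone.utc)
    if k.expires_at is not None and now >= k.expires_at:
        return "expired"
    if k.allowed_models and model not in k.allowed_models:
        return "model not allowed"
    if k.allowed_cidrs:
        ip = ipaddress.ip_address(client_ip)
        networks = (ipaddress.ip_network(c, strict=False) for c in k.allowed_cidrs)
        if not any(ip in net for net in networks):
            return "ip not allowed"
    if k.token_quota is not None and k.tokens_used >= k.token_quota:
        return "quota exhausted"
    return None
```

A key limited to claude-haiku-4-5 from 10.0.0.0/8 would pass a request from 10.1.2.3 and reject the same request from 192.168.1.1.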

API Reference

Proxy Endpoints (require API key)

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/v1/models` | List available models from the account pool. |
| `POST` | `/v1/chat/completions` | OpenAI‑style chat completion. |
| `POST` | `/v1/messages` | Anthropic‑style messages API. |
| `GET` | `/health` | Health check endpoint. |

Admin Endpoints (require admin password)

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/api/accounts` | List all accounts. |
| `POST` | `/api/accounts` | Add a new account (provide Refresh Token). |
| `PUT` | `/api/accounts/:id` | Update account (priority, disabled status). |
| `DELETE` | `/api/accounts/:id` | Delete an account. |
| `POST` | `/api/accounts/:id/refresh` | Manually refresh an account’s Access Token. |
| `GET` | `/api/keys` | List all managed API keys. |
| `POST` | `/api/keys` | Create a new managed API key. |
| `PUT` | `/api/keys/:id` | Update a key (quota, whitelist). |
| `DELETE` | `/api/keys/:id` | Delete a key. |
| `GET` | `/api/proxy/status` | Current proxy status (no auth required). |
| `GET` | `/api/proxy/health` | Health report (no auth required). |
| `GET` | `/api/proxy/stats` | Detailed statistics (requests, success rate, latency). |
| `GET` | `/api/proxy/logs` | Recent request logs. |
| `PUT` | `/api/proxy/config` | Update runtime configuration (e.g., rate limit). |
| `GET/PUT` | `/api/settings` | Get or update global settings. |

Front‑End Pages

| Path | Purpose |
| --- | --- |
| `/` | Landing page. |
| `/docs` | API documentation. |
| `/swagger` | Swagger UI interactive docs. |
| `/playground` | Online test tool. |
| `/deploy` | Deployment guide. |
| `/dashboard` | Public monitoring dashboard. |
| `/admin/accounts` | Account management. |
| `/admin/keys` | API key management. |

Supported Models

You can use any of these model names in the model field of your requests (provided your Kiro accounts have access to them):

claude-opus-4-5
claude-sonnet-4-5
claude-sonnet-4
claude-haiku-4-5
claude-3-7-sonnet-20250219

SDK Examples

Python with OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="my-super-secret-key"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    stream=True
)

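for chunk in response:
    # Some stream chunks carry no choices or an empty delta, so check before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)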

Python with Anthropic SDK

import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8000",
    api_key="my-super-secret-key"
)

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(message.content[0].text)

Node.js with OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",
  apiKey: "my-super-secret-key",
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [{ role: "user", content: "Tell me a joke." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Deployment Options

Docker

A Dockerfile is included. Build and run:

docker build -t kirogate .
docker run -d -p 8000:8000 \
  -e PROXY_API_KEY="your-key" \
  -e ADMIN_PASSWORD="admin123" \
  kirogate

Docker Compose

Use the provided docker-compose.yml:

version: "3"
services:
  kirogate:
    build: .
    ports:
      - "8000:8000"
    environment:
      - PROXY_API_KEY=your-key
      - ADMIN_PASSWORD=admin123
    restart: unless-stopped

Then run docker-compose up -d (or docker compose up -d if you use Compose v2).

Deno Deploy

KiroGate can also run on Deno Deploy with minor adjustments. The basic commands (on Deno 2.x, installing a CLI tool globally requires the -g flag):

deno install -gArf jsr:@deno/deployctl
deployctl deploy --project=your-project main.ts

Architecture Deep Dive

How Multi‑Account Scheduling Really Works

Each account in the pool is represented in memory with these fields:

healthScore: 0–100, starting at 100.
concurrent: number of active requests using this account.
lastError: the type of the last error (timeout, auth failure, rate limit, etc.).
cooldownUntil: timestamp until which the account is temporarily disabled (if healthScore reached 0).

When a new request arrives, the scheduler:

Filters out accounts with healthScore ≤ 0 or still in cool‑down.
Applies the selected scheduling mode:
- Priority: picks the first available account by priority.
- Balanced: picks a random available account, which spreads requests evenly across the pool over time.
- Smart: computes healthScore / (concurrent + 1) for each account and selects the highest. This favours healthy accounts while avoiding overloading any single one.
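
The filtering and selection steps above fit in a few lines of Python. Treat this as a sketch under assumptions rather than KiroGate's source: the `Account` dataclass, the `pick_account` function, and the convention that a lower `priority` number wins are mine. Only the Smart formula comes straight from the description.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Account:
    id: str
    health_score: float = 100.0
    concurrent: int = 0
    priority: int = 0            # lower number = tried first (assumption)
    cooldown_until: float = 0.0  # timestamp in seconds


def pick_account(accounts: List[Account], mode: str, now: float,
                 rng=random) -> Optional[Account]:
    # Step 1: drop accounts that are unhealthy or still cooling down.
    pool = [a for a in accounts if a.health_score > 0 and a.cooldown_until <= now]
    if not pool:
        return None  # nothing usable; the gateway would fall back to self-healing probes
    # Step 2: apply the selected scheduling mode.
    if mode == "priority":
        return min(pool, key=lambda a: a.priority)
    if mode == "balanced":
        return rng.choice(pool)
    if mode == "smart":
        return max(pool, key=lambda a: a.health_score / (a.concurrent + 1))
    raise ValueError(f"unknown scheduling mode: {mode}")
```

With a fully healthy account already serving three requests (100 / 4 = 25) and an idle account at health 60 (60 / 1 = 60), Smart mode picks the idle one.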

After a request completes:

Success: healthScore increases by a small amount (capped at 100).
Failure: healthScore decreases based on error severity (e.g., auth failure: -50, timeout: -20). If it reaches 0, the account enters cool‑down. The cool‑down duration grows exponentially with consecutive failures.
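
Here is how that bookkeeping might look in Python. The two penalties (50 for an auth failure, 20 for a timeout) come from the description above. Everything else is an assumption of mine for illustration: the reward size, the default penalty, the 30‑second base cool‑down, the one‑hour cap, and the doubling rule.

```python
from dataclasses import dataclass

PENALTIES = {"auth_failure": 50, "timeout": 20}  # values quoted in the text
DEFAULT_PENALTY = 10     # assumption: penalty for any other error type
SUCCESS_REWARD = 2       # assumption: the "small amount" added on success
BASE_COOLDOWN = 30.0     # assumption: first cool-down, in seconds
MAX_COOLDOWN = 3600.0    # assumption: upper bound on the cool-down


@dataclass
class AccountHealth:
    health_score: float = 100.0
    consecutive_failures: int = 0
    cooldown_until: float = 0.0


def record_success(h: AccountHealth) -> None:
    h.health_score = min(100.0, h.health_score + SUCCESS_REWARD)  # capped at 100
    h.consecutive_failures = 0


def record_failure(h: AccountHealth, error_type: str, now: float) -> None:
    h.consecutive_failures += 1
    penalty = PENALTIES.get(error_type, DEFAULT_PENALTY)
    h.health_score = max(0.0, h.health_score - penalty)
    if h.health_score == 0:
        # Exponential back-off: double the cool-down for each consecutive failure.
        delay = min(MAX_COOLDOWN, BASE_COOLDOWN * 2 ** (h.consecutive_failures - 1))
        h.cooldown_until = now + delay
```

How the score is restored once a cool‑down ends is not covered here; per the feature overview, that is the job of the self‑healing probes.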

Context Compression in Detail

The compressor triggers when the estimated token count of the messages array exceeds a threshold (configurable, e.g., 100k tokens). Steps:

Keep the last K messages intact (K default 20).
Split the remaining older messages into batches, each batch under a token limit (e.g., 20k tokens).
For each batch, call Claude Haiku with a prompt like “Summarise the core content of the above conversation in one sentence.”
Replace the original older messages with the generated summaries in the final request to the target model.
Store the generated summary along with a hash of the original message range in the cache.
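
As a rough illustration of the first four steps, the Python sketch below takes the summariser as a plain function so it runs without any API call. The four‑characters‑per‑token estimate, the "user" role and prefix used for summary messages, and the names `estimate_tokens`, `split_batches`, and `compress` are my assumptions. Message content is assumed to be a plain string.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def estimate_tokens(messages: List[Message]) -> int:
    # Crude heuristic (assumption): roughly four characters per token.
    return sum(len(m["content"]) for m in messages) // 4


def split_batches(messages: List[Message], batch_limit: int) -> List[List[Message]]:
    """Group consecutive messages so each batch stays under the token limit."""
    batches, current = [], []
    for m in messages:
        if current and estimate_tokens(current + [m]) > batch_limit:
            batches.append(current)
            current = []
        current.append(m)
    if current:
        batches.append(current)
    return batches


def compress(messages: List[Message], summarize: Callable[[List[Message]], str],
             keep_recent: int = 20, threshold: int = 100_000,
             batch_limit: int = 20_000) -> List[Message]:
    if estimate_tokens(messages) <= threshold or len(messages) <= keep_recent:
        return messages                      # short enough: send unchanged
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    summaries = [
        # One summary message per batch replaces the originals in that batch.
        {"role": "user", "content": "[Summary of earlier conversation] " + summarize(batch)}
        for batch in split_batches(older, batch_limit)
    ]
    return summaries + recent
```

In the real gateway, `summarize` would be the call to Claude Haiku with the one‑sentence prompt. Here it can be any function that maps a batch to a string.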

Cache layers:

Incremental memory cache: keyed by conversation ID; only newly added messages are summarised on subsequent requests.
LRU memory cache: holds up to 1000 recent summaries, evicting least‑recently used.
Deno KV persistent cache: loads the most used summaries after a restart.
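
The middle layer is a standard LRU cache. A minimal Python sketch of that layer follows, keyed by a hash of the summarised message range as described in the last compression step. `SummaryCache` and `range_hash` are hypothetical names, and the use of SHA‑256 over JSON is my assumption since the hash function is not specified.

```python
import hashlib
import json
from collections import OrderedDict
from typing import Dict, List, Optional


def range_hash(messages: List[Dict[str, str]]) -> str:
    """Stable fingerprint of a message range (assumption: SHA-256 of its JSON form)."""
    payload = json.dumps(messages, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class SummaryCache:
    """In-memory LRU cache mapping a range hash to its generated summary."""

    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)         # mark as most recently used
        return self._entries[key]

    def put(self, key: str, summary: str) -> None:
        self._entries[key] = summary
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
```

On a cache hit the gateway can skip the Haiku call for that batch entirely, which is where most of the speed‑up comes from.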

Circuit Breaker State Machine

The circuit breaker is implemented per account (or can be global). It maintains three states:

CLOSED: normal operation; counts consecutive failures. When failures reach failureThreshold (default 5), it transitions to OPEN.
OPEN: all requests are immediately rejected with HTTP 503. A timer for timeout seconds (default 30) is started. After timeout, it moves to HALF_OPEN.
HALF_OPEN: allows one test request. If that succeeds, it resets to CLOSED and clears the failure count. If it fails, it returns to OPEN and restarts the timer.

The circuit breaker and health score work together: when the circuit is OPEN, the account’s health score is forced to 0, so the scheduler won’t select it.
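
The state machine is small enough to sketch in full. The Python version below follows the three states and the documented defaults (5 failures, 30 seconds). The class and method names, the injectable clock, and the `probe_in_flight` flag that enforces "one test request" are my own choices, not KiroGate's code.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, timeout: float = 30.0,
                 now=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.now = now                      # injectable clock, handy for tests
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0
        self.probe_in_flight = False

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if self.now() - self.opened_at < self.timeout:
                return False                # still cooling down: caller answers 503
            self.state = "HALF_OPEN"        # timer expired: allow one probe
            self.probe_in_flight = False
        if self.state == "HALF_OPEN":
            if self.probe_in_flight:
                return False                # only a single test request at a time
            self.probe_in_flight = True
            return True
        return True                         # CLOSED: normal operation

    def record_success(self) -> None:
        self.state = "CLOSED"
        self.failures = 0                   # consecutive-failure count is cleared
        self.probe_in_flight = False

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._open()                    # failed probe: back to OPEN, restart timer
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "OPEN"
        self.opened_at = self.now()
        self.probe_in_flight = False
```

One breaker instance per account matches the per‑account setup described above. A single shared instance would give the global variant.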

Tips for Using the Management Panel

Account Management

In /admin/accounts you’ll see each account’s Refresh Token (partially masked), Access Token expiry, health score, current concurrency, and total request count. You can manually refresh a token if it’s about to expire, or temporarily disable a troublesome account.

Managed API Keys

When creating a key, you can set:

Quota limit: e.g., 1 million tokens or 10,000 requests.
Model whitelist: restrict to specific models.
IP whitelist: comma‑separated CIDR ranges.
Expiry date: pick a date when the key should automatically become invalid.

All keys are listed with their current usage counters, so you can monitor who is using what.

Real‑Time Statistics

The public dashboard at /dashboard (no login required) displays:

Requests per minute over the last 5 minutes (line chart).
Model usage breakdown (pie chart).
Success vs. failure ratio.
Average response time.
Overall health of the account pool (green/yellow/red).

Frequently Asked Questions

Q: Does KiroGate need a database?
A: No. All data is stored in Deno’s built‑in KV store, which persists to the local file system.

Q: How do I load‑balance across multiple KiroGate instances?
A: The account pool already balances requests across your Kiro accounts. If you need to scale horizontally, you can run multiple KiroGate instances behind a reverse proxy like Nginx – they will each have their own KV store, but you can share account data by pointing them to the same persistent volume.

Q: Which Claude models are supported?
A: Any model available through Kiro IDE, including the Opus, Sonnet, and Haiku versions listed under Supported Models. The exact list depends on what your accounts have access to.

Q: Does context compression affect answer quality?
A: The most recent messages (the last 20 by default) are kept untouched, so the immediate context is preserved. Older messages are condensed by Haiku into one‑sentence summaries per batch, which keeps the gist but can drop fine details from early in a long thread. If your application depends on exact wording from older turns, raise the compression threshold or set ENABLE_COMPRESSION to false.

Q: What happens when the circuit breaker opens?
A: The gateway returns HTTP 503 Service Unavailable, with the header X-Kiro-Circuit-Open: true. The client can retry later.
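
On the client side, "retry later" usually means exponential back‑off. Here is a small Python helper along those lines. The function name, the delays, the attempt limit, and the decision to also retry 429 responses are my assumptions. `send` stands for whatever function performs your request and returns a status code and body.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {429, 503}  # rate limited, or circuit open / no account available


def call_with_retry(send: Callable[[], Tuple[int, str]], max_attempts: int = 4,
                    base_delay: float = 1.0, sleep=time.sleep) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ... between attempts
    raise AssertionError("unreachable")
```

Since the default circuit cool‑down is 30 seconds, a production client would probably want longer delays than the ones shown.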

Q: Can I change the rate limit without restarting?
A: Yes, use the PUT /api/proxy/config endpoint to update RATE_LIMIT_PER_MINUTE dynamically.

Conclusion

KiroGate takes the pain out of managing multiple Kiro accounts and makes Claude models accessible through the tools you already use. Whether you’re a solo developer wanting to aggregate your own accounts, a team that needs to share a pool, or a platform provider who wants to offer Claude to your users with fine‑grained control, KiroGate gives you a production‑ready solution with intelligent failover, self‑healing, and a clean, extensible codebase.

Because it’s open source and built on Deno, you can deploy it anywhere – from a Raspberry Pi to a cloud VM – without worrying about dependency hell. And with features like context compression and circuit breakers, it’s designed to keep your applications running smoothly even when things go wrong.

If you’re already using Claude models or planning to, give KiroGate a try. It might just become an indispensable part of your AI infrastructure.

This article is based entirely on the official KiroGate README. All technical details, configuration options, and code examples have been verified against the source documentation.
