Batch Inference for Everyone: A Friendly Guide to openai-batch
Imagine having to summarize 100,000 e-mails or classify 500,000 product reviews.
Calling an AI model one request at a time is slow, expensive, and quickly hits rate limits.
Batch processing changes the story: you bundle every request into a single file, send it to the cloud, and let the model work through the queue while you sleep.
In the next few minutes you will meet openai-batch, a tiny Python library that turns “upload → wait → download” into three short lines of code.
The examples work with both OpenAI (GPT-4o, GPT-3.5-turbo, embeddings, etc.) and Parasail (Llama-3, Qwen, Llava, and most other Hugging Face transformers).
Why bother with batch at all?
| Traditional one-by-one calls | Batch API |
| --- | --- |
| Hand-shake + auth for every request | One auth token, millions of calls |
| Latency adds up | Runs in the background at full speed |
| Easy to hit rate limits | Reserved throughput |
| You write retry loops | Retries handled by the provider |
| Pay full price | Up to 50 % cheaper |
Three words: cheaper, faster, simpler.
What exactly is openai-batch?
openai-batch is a lightweight wrapper around the official OpenAI Batch API.
It removes the boiler-plate steps:
- Create a .jsonl input file
- Upload it with the Files API
- Start a batch job
- Poll the status
- Download the results
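For a sense of what that saves, here is a rough sketch of the manual flow using the official openai Python SDK (illustrative only; file names and error handling are simplified, and openai-batch performs all of these steps for you):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Upload the .jsonl input file
batch_file = client.files.create(file=open("input.jsonl", "rb"), purpose="batch")

# 2. Start the batch job
job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll the status (repeat until the job is no longer in progress)
job = client.batches.retrieve(job.id)
print(job.status)

# 4. Download the results once the job reports "completed"
if job.status == "completed":
    open("output.jsonl", "w").write(client.files.content(job.output_file_id).text)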
The library also knows how to talk to Parasail, so you can run the same code with open-source models hosted on Hugging Face.
Quick start (five minutes)
1. Install
pip install openai-batch
Set one environment variable:
# For OpenAI
export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxx"
# For Parasail
export PARASAIL_API_KEY="psl_xxxxxxxxxxxxxxxx"
2. Run 100 prompts in one go
The snippet below:
- builds 100 random “tell me a joke” prompts
- uploads the file to Parasail
- waits until the job finishes
- downloads the answers to a local file
import random
from openai_batch import Batch

objects = ["cat", "robot", "coffee mug", "spaceship", "banana"]

with Batch() as batch:
    for _ in range(100):
        batch.add_to_batch(
            model="meta-llama/Meta-Llama-3-8B-Instruct",
            temperature=0.7,
            max_completion_tokens=1000,
            messages=[{"role": "user",
                       "content": f"Tell me a joke about a {random.choice(objects)}"}]
        )
    result, output_path, error_path = batch.submit_wait_download()

print("Status:", result.status)
print("Output file:", output_path)
print("Error file:", error_path or "none")
When the script ends you will see two new files:
- batch_output.jsonl – every model reply, in the same order as the requests
- batch_errors.jsonl – any malformed or failed requests, easy to re-run later
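Each output line is a JSON object. As a rough sketch for pulling the replies out, assuming the standard OpenAI-style Batch output shape where the completion sits under response.body.choices (adjust the field names if your provider differs):

import json

with open("batch_output.jsonl") as f:
    for line in f:
        record = json.loads(line)
        # The chat completion is nested under the per-request response body
        reply = record["response"]["body"]["choices"][0]["message"]["content"]
        print(record.get("custom_id"), "->", reply[:60])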
Step-by-step control (when you want it)
Sometimes you need to:
- submit during the day and download at night
- show progress in your own dashboard
- restart a job after a laptop reboot
Split the flow into five clear steps:
from openai_batch import Batch
import time

# 1. Decide where the files will live
batch = Batch(
    submission_input_file="my_input.jsonl",
    output_file="my_output.jsonl",
    error_file="my_errors.jsonl"
)

# 2. Add requests
for obj in ["cat", "robot", "coffee mug", "spaceship", "banana"]:
    batch.add_to_batch(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Introduce a {obj} in three sentences"}]
    )

# 3. Submit and remember the id
batch_id = batch.submit()
print("Batch id:", batch_id)

# 4. Poll until done
while True:
    status = batch.status()
    print("Current status:", status.status)
    if status.status in ["completed", "failed", "expired", "cancelled"]:
        break
    time.sleep(60)

# 5. Download
output_path, error_path = batch.download()
print("Results saved to:", output_path)
Store batch_id somewhere safe. If you reboot, just reconnect:
batch = Batch(batch_id="batch_abc123")
# continue polling or download immediately
Two families of models, one library
The library chooses the provider for you:
# OpenAI model → OpenAI provider
Batch().add_to_batch(model="gpt-4o-mini", messages=[...])
# Any other model → Parasail provider
Batch().add_to_batch(model="meta-llama/Meta-Llama-3-8B-Instruct", messages=[...])
Override manually when you wish:
from openai_batch.providers import get_provider_by_name
provider = get_provider_by_name("parasail")
batch = Batch(provider=provider)
Practical recipes
1. Text embeddings at scale
Convert thousands of documents to vectors for search or clustering:
with Batch() as batch:
    for doc in ["The quick brown fox jumps over the lazy dog",
                "Machine learning models can process natural language"]:
        batch.add_to_batch(
            model="text-embedding-3-small",
            input=doc
        )
    _, output_path, _ = batch.submit_wait_download()
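The vectors end up in the output file, one response per line. A minimal sketch for collecting them, assuming the usual embeddings response body with a data list of embedding entries (field names may vary slightly by provider):

import json

vectors = []
with open(output_path) as f:  # output_path from the snippet above
    for line in f:
        body = json.loads(line)["response"]["body"]
        vectors.append(body["data"][0]["embedding"])
print(len(vectors), "vectors of dimension", len(vectors[0]))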
2. Resume an interrupted job
Example: you started a job on your laptop, closed the lid, and went home.
from openai_batch import Batch
import time

batch = Batch(batch_id="batch_def456")

while True:
    status = batch.status()
    print("Status:", status.status)
    if status.status == "completed":
        output_path, error_path = batch.download()
        print("Done! Saved to", output_path)
        break
    elif status.status in ["failed", "expired", "cancelled"]:
        print("Batch ended with:", status.status)
        break
    time.sleep(60)
Command-line tools (no Python needed)
The package ships with two utilities:
- Generate example prompts: python -m openai_batch.example_prompts prints JSONL to stdout.
- Run a batch from a file: python -m openai_batch.run input.jsonl

Handy flags:
- -c – create the batch but do not wait
- --resume <BATCH_ID> – pick up an existing job
- --dry-run – validate the file without uploading
- --help – shows the rest
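For instance, a cautious workflow might look like this (a sketch; run --help to confirm the exact usage of each flag):

# Validate the file first, without uploading anything
python -m openai_batch.run --dry-run input.jsonl
# Then create the batch and exit immediately instead of waiting
python -m openai_batch.run -c input.jsonl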
OpenAI walk-through
export OPENAI_API_KEY="sk-..."
# Create an example file
python -m openai_batch.example_prompts | \
python -m openai_batch.create_batch --model gpt-4o-mini > input.jsonl
# Submit and wait
python -m openai_batch.run input.jsonl
Parasail walk-through
export PARASAIL_API_KEY="psl_..."
python -m openai_batch.example_prompts | \
python -m openai_batch.create_batch --model meta-llama/Meta-Llama-3-8B-Instruct > input.jsonl
python -m openai_batch.run -p parasail input.jsonl
Frequently asked questions
Q1: What format must the input file be?
A: JSON Lines (.jsonl), one JSON object per line. The library builds this file for you; you never have to write it by hand.
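For the curious, a single chat request line in the OpenAI Batch format looks roughly like this (an illustrative example; the exact fields the library writes may differ slightly):

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Tell me a joke about a cat"}], "max_completion_tokens": 1000}}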
Q2: How big can a batch be?
A: Limits change, so check the current provider docs. OpenAI caps each batch at roughly 50,000 requests and a 200 MB input file, with additional per-model token quotas; Parasail has its own limits. In practice you can queue tens of thousands of short prompts per batch and split larger workloads across several batches (see the sketch below).
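If your workload is bigger than one batch allows, a simple sketch of splitting it into chunks with the library's own API (the chunk size and prompts here are illustrative placeholders):

from openai_batch import Batch

CHUNK = 10_000  # illustrative; stay under your provider's per-batch limits
prompts = [f"Summarize document {i}" for i in range(100_000)]  # stand-in workload

batch_ids = []
for start in range(0, len(prompts), CHUNK):
    batch = Batch()
    for prompt in prompts[start:start + CHUNK]:
        batch.add_to_batch(model="gpt-4o-mini",
                           messages=[{"role": "user", "content": prompt}])
    batch_ids.append(batch.submit())
print("Submitted", len(batch_ids), "batches:", batch_ids)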
Q3: Is batch cheaper?
A: Yes. OpenAI and Parasail both offer discounted pricing for batch jobs because they run at lower priority. Check the latest pricing page for exact numbers.
Q4: What if some requests fail?
A: Failed requests are written to *_errors.jsonl. Fix the lines and start a new batch with them; the rest of the original job is unaffected.
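One way to do that, sketched below under the assumption that both your input and error files carry a custom_id per line (check your files to confirm): collect the failed ids, copy the matching request lines into a fresh input file, and submit it as a new batch.

import json

# Collect the ids of the requests that failed
failed_ids = set()
with open("my_errors.jsonl") as f:
    for line in f:
        failed_ids.add(json.loads(line)["custom_id"])

# Copy the matching request lines from the original input into a retry file
with open("my_input.jsonl") as src, open("retry_input.jsonl", "w") as dst:
    for line in src:
        if json.loads(line)["custom_id"] in failed_ids:
            dst.write(line)

# Submit retry_input.jsonl as a new batch, e.g. python -m openai_batch.run retry_input.jsonl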
Key take-aways
- Batch processing turns a mountain of requests into a single file upload.
- openai-batch wraps the official API in a handful of Python calls.
- Works with OpenAI and Parasail out of the box.
- Resume jobs, inspect errors, or run entirely from the command line.
Next time you need to label a million images or summarize a decade of support tickets, skip the loops and let the cloud do the heavy lifting.
Happy batching!