Batch Inference for Everyone: A Friendly Guide to openai-batch

Imagine having to summarize 100,000 e-mails or classify 500,000 product reviews.
Calling an AI model one request at a time is slow, expensive, and quickly hits rate limits.
Batch processing changes the story: you bundle every request into a single file, send it to the cloud, and let the model work through the queue while you sleep.
In the next few minutes you will meet openai-batch, a tiny Python library that turns “upload → wait → download” into three short lines of code.
The examples work with both OpenAI (GPT-4o, GPT-3.5-turbo, embeddings, etc.) and Parasail (Llama-3, Qwen, Llava, and most other Hugging Face transformers).



Why bother with batch at all?

Traditional one-by-one calls          | Batch API
--------------------------------------|--------------------------------------
Hand-shake + auth for every request   | One auth token, millions of calls
Latency adds up                       | Runs in the background at full speed
Easy to hit rate limits               | Reserved throughput
You write retry loops                 | Retries handled by the provider
Pay full price                        | Up to 50 % cheaper

Three words: cheaper, faster, simpler.


What exactly is openai-batch?

openai-batch is a lightweight wrapper around the official OpenAI Batch API.
It removes the boiler-plate steps:

  1. Create a .jsonl input file
  2. Upload it with the Files API
  3. Start a batch job
  4. Poll the status
  5. Download the results

The library also knows how to talk to Parasail, so you can run the same code with open-source models hosted on Hugging Face.
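
For context, each line the library writes to that input file is one self-contained request in the OpenAI Batch API format. You never have to build these lines yourself, but knowing the shape helps when you inspect input or output files later. A minimal sketch (the custom_id and body values are purely illustrative):

import json

# One request per line: a custom_id to match results back to requests,
# the endpoint to call, and the usual chat-completion body.
request_line = {
    "custom_id": "request-1",          # illustrative id
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Tell me a joke about a cat"}],
    },
}

print(json.dumps(request_line))        # exactly one JSON object per line of the .jsonl file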


Quick start (five minutes)

1. Install

pip install openai-batch

Set one environment variable:

# For OpenAI
export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxx"

# For Parasail
export PARASAIL_API_KEY="psl_xxxxxxxxxxxxxxxx"

2. Run 100 prompts in one go

The snippet below:

  • builds 100 random “tell me a joke” prompts
  • uploads the file to Parasail
  • waits until the job finishes
  • downloads the answers to a local file

import random
from openai_batch import Batch

objects = ["cat", "robot", "coffee mug", "spaceship", "banana"]

with Batch() as batch:
    for _ in range(100):
        batch.add_to_batch(
            model="meta-llama/Meta-Llama-3-8B-Instruct",
            temperature=0.7,
            max_completion_tokens=1000,
            messages=[{"role": "user",
                       "content": f"Tell me a joke about a {random.choice(objects)}"}]
        )

    result, output_path, error_path = batch.submit_wait_download()

print("Status:", result.status)
print("Output file:", output_path)
print("Error file:", error_path or "none")

When the script ends you will see two new files:

  • batch_output.jsonl – every model reply, one JSON line per request (use each line's custom_id to match replies back to requests)
  • batch_errors.jsonl – any malformed or failed requests, easy to re-run later
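
If you want to post-process the replies in Python, read the output file line by line. Assuming it follows the standard OpenAI batch output schema (one JSON object per line, carrying a custom_id and the full chat-completion response), a minimal sketch:

import json

# Pull each reply out of the downloaded output file
with open("batch_output.jsonl") as f:
    for line in f:
        record = json.loads(line)
        reply = record["response"]["body"]["choices"][0]["message"]["content"]
        print(record["custom_id"], "->", reply[:60])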

Step-by-step control (when you want it)

Sometimes you need to:

  • submit during the day and download at night
  • show progress in your own dashboard
  • restart a job after a laptop reboot

Split the flow into five clear steps:

from openai_batch import Batch
import time

# 1. Decide where the files will live
batch = Batch(
    submission_input_file="my_input.jsonl",
    output_file="my_output.jsonl",
    error_file="my_errors.jsonl"
)

# 2. Add requests
for obj in ["cat", "robot", "coffee mug", "spaceship", "banana"]:
    batch.add_to_batch(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Introduce a {obj} in three sentences"}]
    )

# 3. Submit and remember the id
batch_id = batch.submit()
print("Batch id:", batch_id)

# 4. Poll until done
while True:
    status = batch.status()
    print("Current status:", status.status)
    if status.status in ["completed", "failed", "expired", "cancelled"]:
        break
    time.sleep(60)

# 5. Download
output_path, error_path = batch.download()
print("Results saved to:", output_path)

Store batch_id somewhere safe.
If you reboot, just reconnect:

batch = Batch(batch_id="batch_abc123")
# continue polling or download immediately
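
One simple way to survive a reboot is to write the id to a small file right after submitting and read it back later. The sketch below is a rough pattern, assuming only the Batch calls already shown (the file name is arbitrary):

from pathlib import Path
from openai_batch import Batch

ID_FILE = Path("batch_id.txt")          # arbitrary location for the saved id

if ID_FILE.exists():
    # After a reboot: reconnect to the job submitted earlier
    batch = Batch(batch_id=ID_FILE.read_text().strip())
else:
    batch = Batch()
    batch.add_to_batch(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Introduce a cat in three sentences"}],
    )
    ID_FILE.write_text(batch.submit())  # submit() returns the batch id, as shown above

# ...then poll with batch.status() and fetch results with batch.download()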

Two families of models, one library

The library chooses the provider for you:

# OpenAI model → OpenAI provider
Batch().add_to_batch(model="gpt-4o-mini", messages=[...])

# Any other model → Parasail provider
Batch().add_to_batch(model="meta-llama/Meta-Llama-3-8B-Instruct", messages=[...])

Override manually when you wish:

from openai_batch.providers import get_provider_by_name

provider = get_provider_by_name("parasail")
batch = Batch(provider=provider)

Practical recipes

1. Text embeddings at scale

Convert thousands of documents to vectors for search or clustering:

with Batch() as batch:
    for doc in ["The quick brown fox jumps over the lazy dog",
                "Machine learning models can process natural language"]:
        batch.add_to_batch(
            model="text-embedding-3-small",
            input=doc
        )
    _, output_path, _ = batch.submit_wait_download()
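
To actually use the vectors, read them back from the output file. Assuming each result line wraps a standard OpenAI embeddings response, a minimal sketch:

import json

vectors = []
with open(output_path) as f:            # output_path from the snippet above
    for line in f:
        record = json.loads(line)
        # A single-input embeddings request returns its vector under data[0].embedding
        vectors.append(record["response"]["body"]["data"][0]["embedding"])

print(len(vectors), "vectors of dimension", len(vectors[0]))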

2. Resume an interrupted job

Example: you started a job on your laptop, closed the lid, and went home.

from openai_batch import Batch
import time

batch = Batch(batch_id="batch_def456")

while True:
    status = batch.status()
    print("Status:", status.status)

    if status.status == "completed":
        output_path, error_path = batch.download()
        print("Done! Saved to", output_path)
        break
    elif status.status in ["failed", "expired", "cancelled"]:
        print("Batch ended with:", status.status)
        break

    time.sleep(60)

Command-line tools (no Python code needed)

The package ships with two utilities:

  1. Generate example prompts

    python -m openai_batch.example_prompts
    

    Prints JSONL to stdout.

  2. Run a batch from a file

    python -m openai_batch.run input.jsonl
    

Handy flags:

  • -c create the batch but do not wait
  • --resume <BATCH_ID> pick up an existing job
  • --dry-run validate the file without uploading
  • --help shows the rest

OpenAI walk-through

export OPENAI_API_KEY="sk-..."

# Create an example file
python -m openai_batch.example_prompts | \
  python -m openai_batch.create_batch --model gpt-4o-mini > input.jsonl

# Submit and wait
python -m openai_batch.run input.jsonl

Parasail walk-through

export PARASAIL_API_KEY="psl_..."

python -m openai_batch.example_prompts | \
  python -m openai_batch.create_batch --model meta-llama/Meta-Llama-3-8B-Instruct > input.jsonl

python -m openai_batch.run -p parasail input.jsonl

Frequently asked questions

Q1: What format must the input file be?
A: JSON Lines (.jsonl), one JSON object per line. The library builds this file for you; you never have to write it by hand.

Q2: How big can a batch be?
A: OpenAI currently caps a batch at 50,000 requests and a 200 MB input file; Parasail has its own limits. Check each provider's documentation for the current numbers, and split larger workloads across several batches (a sketch follows below).
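
Splitting a large workload is straightforward with the calls already shown. A rough sketch (the workload and chunk size are illustrative):

from openai_batch import Batch

prompts = [f"Summarize support ticket #{i}" for i in range(120_000)]   # illustrative workload
chunk_size = 50_000                                                    # stay under per-file limits

batch_ids = []
for start in range(0, len(prompts), chunk_size):
    batch = Batch()
    for prompt in prompts[start:start + chunk_size]:
        batch.add_to_batch(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
    batch_ids.append(batch.submit())    # poll and download each id later, as shown earlier

print("Submitted", len(batch_ids), "batches:", batch_ids)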

Q3: Is batch cheaper?
A: Yes. OpenAI and Parasail both offer discounted pricing for batch jobs because they run at lower priority. Check the latest pricing page for exact numbers.

Q4: What if some requests fail?
A: Failed requests are written to *_errors.jsonl. Fix the lines and start a new batch with them; the rest of the original job is unaffected.
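
Re-running failures can be scripted, too. The sketch below assumes the input and error files follow the OpenAI batch schemas described earlier (each line carries a custom_id) and reuses the file names from the step-by-step example; it resubmits only the requests that failed:

import json
from openai_batch import Batch

# Collect the custom_ids of the failed requests from the error file
with open("my_errors.jsonl") as f:
    failed_ids = {json.loads(line)["custom_id"] for line in f}

# Resubmit only the matching requests from the original input file
with Batch() as retry_batch:
    with open("my_input.jsonl") as f:
        for line in f:
            request = json.loads(line)
            if request["custom_id"] in failed_ids:
                # body holds the same keyword arguments add_to_batch expects (model, messages, ...)
                retry_batch.add_to_batch(**request["body"])
    retry_batch.submit_wait_download()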


Key take-aways

  • Batch processing turns a mountain of requests into a single file upload.
  • openai-batch wraps the official API in a handful of Python calls.
  • Works with OpenAI and Parasail out of the box.
  • Resume jobs, inspect errors, or run entirely from the command line.

Next time you need to label a million images or summarize a decade of support tickets, skip the loops and let the cloud do the heavy lifting.

Happy batching!
