Track Every Penny You Spend on AI — A Plain-English Guide to Fiorino.AI


Running a SaaS that uses large language models (LLMs) feels a bit like owning a sports car: the acceleration is thrilling, but the fuel bill can arrive as an unpleasant surprise. One month you burn $2,000, and nobody on the team can tell you exactly which customer or feature caused the jump.

Fiorino.AI is an open-source cost-tracking and billing helper designed for this exact headache. It sits quietly between your app and the LLM provider, counts every token, attaches it to an anonymous user ID, and shows the result in a single dashboard. When you are ready, it can also pipe the same numbers into Stripe and generate an invoice automatically.

Below you will find:

  • What the tool can (and cannot) do today
  • A ten-minute local setup that actually works
  • A no-fluff roadmap so you can decide whether to bet on it long-term

All details come straight from the official README and ROADMAP files—nothing added, nothing removed.


Who Needs an “AI Ledger”?

Imagine you offer an AI writing assistant. Three teams use your API:

  • Team A (free tier) runs 100 short blog posts per day
  • Team B (paid tier) uploads 5,000-word white papers every night
  • Team C (enterprise) just started feeding in entire e-books

Your single OpenAI key shows one total, but the cost drivers are wildly different. Without granular data you cannot:

  • Set fair usage limits
  • Detect abuse early
  • Price each tier profitably

Fiorino.AI turns that single scary number into individual rows you can read like a bank statement.


Core Features at a Glance

| Area | Today (v2024-11) | Next Releases |
|---|---|---|
| Cost Tracking | ✅ per-user, per-model, per-token | ✅ custom mark-ups |
| Billing | ❌ manual export | ✅ native Stripe (Q1 2025) |
| Limits & Alerts | ❌ | ✅ quotas + e-mail (Q1 2025) |
| Insights | ❌ | ✅ tagging + custom reports (Q2 2025) |
| AI Analyst Chat | ❌ | ✅ natural-language queries (Q2 2025) |
| Audit Trail | ❌ | ✅ full logs (Future) |

All green check-marks are already in the container you can download today.


Ten-Minute Quick-Start

Everything runs in Docker. No Python, Node, or database skills required.

Step 1 – Start a Database

If you already have PostgreSQL, skip this. Otherwise:

docker run -d --name fiorino-db \
  -e POSTGRES_USER=fiorino \
  -e POSTGRES_PASSWORD=fiorino \
  -e POSTGRES_DB=fiorino \
  -p 5432:5432 \
  postgres:15

Step 2 – Launch Fiorino.AI

docker run --rm -it \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://fiorino:fiorino@host.docker.internal:5432/fiorino" \
  ghcr.io/fiorino-ai/fiorino-ai:latest

Wait for the line Uvicorn running on http://0.0.0.0:8000.

Step 3 – Open the Dashboard

  • Dashboard: http://localhost:8000/app
  • Interactive docs: http://localhost:8000/docs

The first visit will show an empty page—time to feed in some data.

Step 4 – Log Your First Call

curl -X POST http://localhost:8000/api/v1/track \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_001",
    "model": "gpt-3.5-turbo",
    "prompt_tokens": 120,
    "completion_tokens": 80,
    "realm": "prod"
  }'
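
The same event can be fired from application code instead of curl. A minimal Python sketch using only the standard library (the endpoint and field names mirror the quick-start call above; the helper names are illustrative, not part of an official SDK):

```python
import json
import urllib.request

TRACK_URL = "http://localhost:8000/api/v1/track"  # assumes the local quick-start setup

def build_event(user_id, model, prompt_tokens, completion_tokens, realm="prod"):
    """Assemble the JSON payload shown in the quick-start curl example."""
    return {
        "user_id": user_id,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "realm": realm,
    }

def track(event):
    """POST the event; call this right after each LLM response."""
    req = urllib.request.Request(
        TRACK_URL,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `track(build_event("user_001", "gpt-3.5-turbo", 120, 80))` is the "one extra line of code" the FAQ below refers to.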

Refresh the dashboard and you will see:

  • One event from user_001
  • 200 tokens total
  • A cost computed at roughly $0.002 per 1k tokens

How the Pieces Fit Together

  1. Message Processing
    Your backend sends a JSON snippet after every LLM call. No SDK is strictly required—just HTTP POST.

  2. Cost Engine
    The tool looks up the price for the declared model, multiplies by your custom margin, and stores the final cost.

  3. Storage Layer
    All records live in PostgreSQL. You can query with plain SQL or export CSV anytime.

  4. Billing Bridge (coming)
    In Q1 2025 an optional worker will read the same table and create Stripe invoices or subscription items automatically.
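
The cost engine in step 2 can be sketched in a few lines. This is a simplification under assumed prices, not Fiorino.AI's actual schema; a real deployment would load the price rows from PostgreSQL:

```python
# Illustrative price table: USD per 1,000 tokens. Placeholder numbers only.
PRICE_PER_1K = {
    "gpt-3.5-turbo": 0.002,
    "gpt-4": 0.03,
}

def compute_cost(model, prompt_tokens, completion_tokens, margin=0.0):
    """Look up the declared model's price, apply the custom margin,
    and return the final cost to store alongside the event."""
    base = PRICE_PER_1K[model] * (prompt_tokens + completion_tokens) / 1000
    return base * (1 + margin)
```

With the quick-start event above (120 + 80 tokens of gpt-3.5-turbo), this yields $0.0004 before margin.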


Pricing Engine Deep Dive

You are free to set:

  • Base price – whatever the provider charges today
  • Overhead – e.g., add 30% to cover infra and support
  • Currency – USD by default, but the schema is currency-agnostic

Changes apply only to new calls; historical data remain immutable for accurate bookkeeping.
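
As a worked example (with assumed numbers, not official provider prices): a base price of $0.002 per 1k tokens plus a 30% overhead yields an effective rate of $0.0026 per 1k:

```python
base_per_1k = 0.002   # assumed provider price, USD per 1,000 tokens
overhead = 0.30       # 30% mark-up to cover infra and support

effective_per_1k = base_per_1k * (1 + overhead)
print(f"${effective_per_1k:.4f} per 1k tokens")  # prints $0.0026 per 1k tokens
```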


Privacy-First Design

The system never asks for e-mail, name, or any PII. You pass an opaque user_id (UUID, hash, or database surrogate key). Internally everything is keyed by that token, so GDPR/CCPA compliance is straightforward: delete the ID, delete the record.
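
One common way to derive such an opaque ID is to hash an internal key before it ever leaves your backend. A sketch (any UUID or database surrogate key works just as well):

```python
import hashlib

def opaque_id(internal_key: str) -> str:
    """Derive a stable, non-reversible user_id from an internal key,
    so no PII reaches the cost tracker."""
    return hashlib.sha256(internal_key.encode()).hexdigest()
```

The same internal key always maps to the same ID, so per-user aggregation still works; deleting the mapping on your side severs the link entirely.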


Scaling Out

  • Horizontal: stateless FastAPI containers; add more pods behind a load balancer.
  • Database: plain PostgreSQL; RDS, Cloud SQL, or self-hosted all work.
  • Multi-realm: run separate containers for staging vs. prod; each realm has its own schema.

Common Questions

Q: Do I have to modify my LLM calls?
A: No. Just fire a second HTTP request after the provider responds. One extra line of code.

Q: What if I use multiple providers?
A: Each call specifies the model string (gpt-4, claude-v1, etc.). You can add new rows to the price table on the fly.

Q: Is there a hosted version?
A: Not yet. The project is fully open-source, so you self-host today. Enterprise plans may appear later.


2025 Roadmap Snapshot

| Quarter | Focus | Features |
|---|---|---|
| Q4 2024 | SDKs | Python & TypeScript clients, async support, PyPI & npm |
| Q1 2025 | Billing | Stripe integration, custom tiers, usage-based pricing |
| Q1 2025 | Limits | Per-user quotas, e-mail alerts, grace periods |
| Q2 2025 | Insights | Tagging (feature, campaign, etc.), custom dashboards |
| Q2 2025 | AI Analyst | Chat interface—“Which customer spiked yesterday?” |
| Future | Compliance | Audit logs, RBAC, GDPR tooling |

Dates are targets, not guarantees. The list is frozen at November 2024.


Mini Case Study (Taken from README)

A resume-optimization SaaS saw its OpenAI bill jump to $2,000 in 30 days. After plugging in Fiorino.AI they discovered:

  • 5% of free-trial users consumed 40% of tokens
  • Two users uploaded entire novels for “proofreading”

They added a 5,000-token daily cap for free users; the next month’s bill dropped to $800 and paid conversions rose 12%. No further engineering work required.
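
Since Fiorino.AI's own quota feature is slated for Q1 2025, a cap like theirs currently lives in your own backend. A minimal in-memory sketch (a production version would persist counters in Redis or PostgreSQL):

```python
from collections import defaultdict
from datetime import date

DAILY_CAP = 5_000  # tokens per free user per day, as in the case study

_usage = defaultdict(int)  # (user_id, day) -> tokens consumed so far

def allow_request(user_id: str, tokens: int) -> bool:
    """Record usage and return True only if the user stays within today's cap."""
    key = (user_id, date.today().isoformat())
    if _usage[key] + tokens > DAILY_CAP:
        return False  # over budget: reject without recording
    _usage[key] += tokens
    return True
```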


Contributing (One Minute)

  1. Fork the repository.
  2. Check the CONTRIBUTING.md guide.
  3. Open an issue before large pull requests.
  4. Submit code, docs, or tests—every little bit helps.

Takeaway

If the only number you currently see is the provider’s monthly PDF, you are flying blind. Fiorino.AI gives you the granularity of a phone bill: who, when, how much, and why. It is lightweight enough to test on your laptop today and ambitious enough to grow into your full billing stack tomorrow.

Grab Docker, spin it up, and enjoy the calm feeling of knowing exactly where every AI penny went.
