Summary: Latitude is an open-source AI engineering platform that prioritizes observability and evaluations by capturing production LLM traffic—including prompts, inputs/outputs, tool calls, latency, token usage, and costs—to build an eval-driven reliability loop. It supports gradual adoption: start with telemetry and the prompt playground, then add datasets, evaluations, experiments, and optimization. The monorepo uses TypeScript, Drizzle ORM, BullMQ, Docker Compose, and PromptL chains, with clear patterns for services, repositories, migrations, and background jobs. This guide details contribution philosophy, build commands, database practices, ClickHouse migrations, prompt running architecture, and more.
Latitude LLM: Architecture and Development Practices for Open-Source AI Engineering Platform
Are you building LLM-powered features but struggling to maintain observability, run reliable evaluations, and turn production failures into repeatable improvements? Latitude, an open-source AI engineering platform, offers a structured path forward. It begins with capturing real traffic data and evolves into a full reliability loop using evaluations to continuously refine prompts. This article, drawn directly from the project’s guidelines and README, explains the platform’s philosophy, technical architecture, and practical implementation patterns to help you contribute effectively or deploy it successfully.
Why Latitude Stands Out as an Open-Source AI Engineering Platform
Latitude is designed for teams operating LLM applications in production. Unlike basic prompt tools, it emphasizes observability first—capturing prompts, inputs/outputs, tool calls, latency, token usage, and costs from live traffic—then builds an evaluation-driven reliability loop.
Key adoption stages include:

- Observability Layer: Instrument existing LLM calls to record detailed execution data.
- Prompt Playground: Reproduce runs with real inputs, iterate on versions, and publish via the AI Gateway.
- Datasets: Curate production examples for batch testing and regression suites.
- Evaluations: Use built-in rules, LLM-as-judge, and human scoring.
- Experiments: Compare models, providers, and prompt versions with measurable metrics.
The reliability loop extends this with annotations, issue clustering, automatic evaluations guarding releases, and the GEPA prompt optimizer that searches variations against your evaluation suite to reduce recurring failures.
Latitude Telemetry integrates with major providers and frameworks, extendable via OTLP. This staged approach minimizes disruption while delivering long-term value.
Quick Start Guide for Latitude Cloud and Self-Hosted Deployments
Getting started with Latitude is straightforward, whether using the managed cloud or self-hosted version.
For Latitude Cloud:
- Sign up at latitude.so and create a project.
- Add the telemetry SDK or export OTLP traces.
- Build datasets and evaluations to measure quality.
- Version prompts or agents and publish through the gateway.
- Optimize using evaluation results to fix failures.
For Self-Hosted:
Follow the production setup guide to install dependencies via Docker Compose, then follow the same steps as the cloud version.
After setup, the platform automatically supports foreground streaming and background queued execution for prompt runs.
Contribution Philosophy: Building for Longevity
This codebase will outlive you. Every shortcut creates future burdens, and hacks compound into technical debt. As a contributor, you shape patterns that others will copy.
Core principles:

- Write TypeScript, preferring types over interfaces.
- Favor functional patterns and early returns.
- Use descriptive names: isLoading, hasError.
- Prefix event handlers with "handle" (handleClick).
- Name directories in lowercase with dashes (auth-wizard).
- Avoid enums; use const maps or type unions.
- Add JSDoc only for exported functions and classes.
- Place exports at the top, internal methods at the bottom.
- Prefer instrumentation.captureException over console.error.
- Add comments only when explicitly required or for JSDoc.
These practices fight entropy and keep the codebase maintainable.
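As an illustration of the "avoid enums" guideline, an as-const map plus a derived union type gives enum-like ergonomics without generating runtime enum objects. The names below (RunStatus, isTerminal) are illustrative, not from the codebase:

```typescript
// Const map instead of an enum: `as const` narrows each value to a
// literal type, and the derived union behaves like an enum at the
// type level with plain strings at runtime.
const RunStatus = {
  Pending: 'pending',
  Running: 'running',
  Completed: 'completed',
  Failed: 'failed',
} as const

type RunStatus = (typeof RunStatus)[keyof typeof RunStatus]

// Descriptive predicate naming per the style guide (isLoading, hasError).
function isTerminal(status: RunStatus): boolean {
  return status === RunStatus.Completed || status === RunStatus.Failed
}
```

Because the values are plain string literals, they serialize cleanly to JSON and databases, which is one common motivation for this pattern.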
Build, Test, and Development Commands
The monorepo uses pnpm, Turborepo, and Vitest.
Essential commands:

- pnpm dev: Start development servers.
- pnpm lint: Lint all packages.
- pnpm tc: Type-check all packages.
- pnpm test: Run all tests.
- pnpm test:watch: Watch mode for specific packages.
- pnpm --filter @latitude-data/core db:migrate: Run database migrations.
- pnpm --filter @latitude-data/core db:generate: Generate migrations.
- pnpm prettier: Format code.
Avoid running vitest directly or invoking pnpm build unless specifically instructed.
Docker Production Builds and Local Testing
Production uses Docker Compose with three files:

- docker-compose.yml: GHCR pre-built images.
- docker-compose.local.yml: Source-based local builds.
- docker-compose.prod.yml: Traefik-enabled production.
To build and test the web app locally:
docker compose -f docker-compose.local.yml up db redis weaviate -d
docker compose -f docker-compose.local.yml build web
docker compose -f docker-compose.local.yml up web
Access at http://localhost:3000.
The web Dockerfile uses node:22-alpine, Turbopack (next build --turbopack), standalone output, and multi-stage builds (Pruner → Builder → Runner) for minimal size.
Key build args:

- NEXT_PUBLIC_* (client-side)
- AWS_REGION, S3_BUCKET, BUILD_ID (S3 uploads)
- DD_GIT_COMMIT_SHA (Datadog source maps)
Test the image:
docker run --rm --network llm_default --env-file .env -p 3000:8080 llm-web
Build other services with docker compose -f docker-compose.local.yml build gateway workers websockets.
Code Style and Overall Architecture
The project is a pnpm workspace + Turborepo monorepo.
- Core logic: packages/core
- UI components: packages/web-ui
- Services return Result objects for error handling.
- Database ops use the Transaction abstraction.
- Write services accept an optional db argument (defaults to database).
- Update/destroy services receive model instances.
Comprehensive Testing Patterns
Tests sit alongside source files (.test.ts). Use factories; minimize mocks in integration tests.
Run tests:

- pnpm test -- "path/to/file.test.ts"
- pnpm test -- "path/to/directory"
Unit test external dependencies with vi.spyOn:
import { beforeEach, vi } from 'vitest'

import * as cacheModule from '../../cache'
import * as diskModule from '../../lib/disk'

beforeEach(() => {
  vi.spyOn(diskModule, 'diskFactory').mockReturnValue(mockDisk as any)
  vi.spyOn(cacheModule, 'cache').mockResolvedValue(mockCache as any)
})
Create mocks:
const mockDisk = {
  exists: vi.fn(),
  get: vi.fn(),
  put: vi.fn(),
  delete: vi.fn(),
}
Clear mocks in beforeEach and test both success (result.ok, result.value) and error cases.
Follow this test structure:
import { beforeEach, describe, expect, it, vi } from 'vitest'

describe('moduleName', () => {
  const mockDependency = { method: vi.fn() }

  beforeEach(() => vi.clearAllMocks())

  describe('functionName', () => {
    it('describes expected behavior', async () => {
      mockDependency.method.mockResolvedValueOnce(value)

      const result = await functionUnderTest(args)

      expect(result).toEqual(expected)
      expect(mockDependency.method).toHaveBeenCalledWith(expectedArgs)
    })
  })
})
Cover edge cases: cache misses, expirations, service failures, silent errors.
CRUD Operations: Services, Actions, Stores, and UI
Service Layer (packages/core/src/services/):
Organize by entity (e.g., apiKeys/create.ts, destroy.ts, update.ts). Use named exports, Transaction, and Result.
Action Layer (apps/web/src/actions/):
Use authProcedure + Zod. Fetch models via repositories, then call services.
Admin actions go in actions/admin/.
Example:
export const updateApiKeyAction = authProcedure
  .inputSchema(z.object({ id: z.number(), name: z.string() }))
  .action(async ({ parsedInput, ctx }) => {
    const repo = new Repository(ctx.workspace.id)
    const model = await repo.find(parsedInput.id).then((r) => r.unwrap())
    return updateService(model, { name: parsedInput.name }).then((r) =>
      r.unwrap(),
    )
  })
Store Layer (apps/web/src/stores/):
SWR hooks with optimistic updates and toast notifications.
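The optimistic flow behind these stores can be sketched without SWR itself: show the new item immediately, reconcile with the server response, and roll back on failure. All names below are hypothetical stand-ins for the real hooks:

```typescript
// Illustrative optimistic-create flow: the caller renders the
// optimistic cache right away, then swaps in the server's record
// (or the original cache, if the request fails).
type ApiKey = { id: number; name: string }

async function createOptimistically(
  cache: ApiKey[],
  input: { name: string },
  serverCreate: (input: { name: string }) => Promise<ApiKey>,
): Promise<ApiKey[]> {
  // Optimistic entry with a temporary id, visible immediately.
  const optimistic: ApiKey = { id: -1, name: input.name }
  const optimisticCache = [...cache, optimistic]
  try {
    const created = await serverCreate(input)
    // Reconcile: replace the placeholder with the real record.
    return optimisticCache.map((key) => (key.id === -1 ? created : key))
  } catch {
    // Roll back to the previous cache; the store would also toast here.
    return cache
  }
}
```

In the actual stores this reconcile step happens inside SWR's mutate, with toast notifications reporting success or failure.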
UI Patterns:
Modal-based editing, table action buttons with tooltips, consistent icons (edit, trash).
Database Schema and Migrations with Drizzle ORM
Schema files (packages/core/src/schema/models/):
export const tableName = latitudeSchema.table('table_name', {
  id: bigserial('id', { mode: 'number' }).notNull().primaryKey(),
  name: varchar('name', { length: 256 }).notNull(),
  workspaceId: bigint('workspace_id', { mode: 'number' })
    .notNull()
    .references(() => workspaces.id, { onDelete: 'cascade' }),
  ...timestamps(),
})
Destructive migrations require two PRs:

1. Remove code references and schema definitions (deploy first).
2. Generate and run the migration to drop unused columns/tables (deploy after).
Repository pattern:
Extend RepositoryLegacy for workspace-scoped entities; implement scope getter.
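The point of the scope getter is that every query is pre-filtered by workspace, so callers cannot accidentally read another workspace's rows. A toy sketch of that idea, using an in-memory array in place of Drizzle queries; RepositoryLegacy's real interface differs:

```typescript
// Illustrative workspace-scoped repository: all reads go through
// `scope`, which subclasses filter by workspaceId.
type Row = { id: number; workspaceId: number }

abstract class ScopedRepository<T extends Row> {
  constructor(protected workspaceId: number) {}

  // Subclasses expose their table pre-filtered by workspace.
  abstract get scope(): T[]

  find(id: number): T | undefined {
    return this.scope.find((row) => row.id === id)
  }
}

class ApiKeysRepository extends ScopedRepository<Row> {
  constructor(
    workspaceId: number,
    private rows: Row[],
  ) {
    super(workspaceId)
  }

  get scope(): Row[] {
    return this.rows.filter((row) => row.workspaceId === this.workspaceId)
  }
}
```

With this shape, a lookup by id silently misses rows from other workspaces instead of leaking them, which is the isolation guarantee the pattern encodes.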
ClickHouse Migrations for Analytics
Use golang-migrate with these commands:

- pnpm --filter @latitude-data/core ch:connect
- pnpm --filter @latitude-data/core ch:status
- pnpm --filter @latitude-data/core ch:create <name>
- pnpm --filter @latitude-data/core ch:up
- pnpm --filter @latitude-data/core ch:down [N|all]
- pnpm --filter @latitude-data/core ch:reset
Maintain both unclustered/ and clustered/ folders:

- unclustered/: single-node (dev/self-hosted)
- clustered/: production HA
Unclustered example:
CREATE TABLE events (
  id String,
  workspace_id UInt64,
  timestamp DateTime64(3)
) ENGINE = ReplacingMergeTree()
ORDER BY (workspace_id, timestamp, id);
Clustered example:
CREATE TABLE events ON CLUSTER default (
  id String,
  workspace_id UInt64,
  timestamp DateTime64(3)
) ENGINE = ReplicatedReplacingMergeTree()
ORDER BY (workspace_id, timestamp, id);
Rules: Always sync both versions, provide reversible down.sql, use ReplacingMergeTree, include workspace_id for isolation.
API Routes, Jobs, Backoffice, and Feature Implementation
API Routes (apps/web/src/app/api/):
Wrap with errorHandler(authHandler(…)) for protected endpoints.
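As an illustration of that wrapping pattern (not the real middleware API), here is a minimal sketch where authHandler short-circuits unauthenticated requests and errorHandler converts thrown errors into responses. All types and names are hypothetical:

```typescript
// Composable handler wrappers: errorHandler(authHandler(handler)).
type Request = { authToken?: string }
type Response = { status: number; body: string }
type Handler = (req: Request) => Promise<Response>

function authHandler(inner: Handler): Handler {
  return async (req) => {
    // Reject before the inner handler ever runs.
    if (!req.authToken) return { status: 401, body: 'Unauthorized' }
    return inner(req)
  }
}

function errorHandler(inner: Handler): Handler {
  return async (req) => {
    try {
      return await inner(req)
    } catch (error) {
      // Thrown errors become a response instead of crashing the route.
      return { status: 500, body: String(error) }
    }
  }
}

// A protected endpoint: auth and errors are both handled by wrappers,
// so the handler body only deals with the happy path.
const getProjects = errorHandler(
  authHandler(async () => ({ status: 200, body: '[]' })),
)
```

The ordering matters: errorHandler sits outermost so it also catches anything authHandler itself might throw.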
Jobs (packages/core/src/jobs/):
Return undefined on success; throw for retryable errors; use captureException for non-retryable. Configure removeOnComplete: true, removeOnFail: false.
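That retry convention can be sketched as follows; captureException, the payload shape, and loadRun are illustrative stand-ins, not the real helpers:

```typescript
// Job error convention: throw to let BullMQ retry with backoff,
// capture and return normally when a retry cannot help.
const captured: Error[] = []

function captureException(error: Error): void {
  // Stand-in for instrumentation.captureException.
  captured.push(error)
}

async function exampleJob(
  payload: { runId: string },
  deps: { loadRun: (id: string) => Promise<{ ok: boolean } | undefined> },
): Promise<void> {
  const run = await deps.loadRun(payload.runId)
  if (!run) {
    // Non-retryable: the run no longer exists; retrying would fail
    // forever, so report it and complete the job normally.
    captureException(new Error(`Run ${payload.runId} not found`))
    return undefined
  }
  if (!run.ok) {
    // Retryable: throwing marks the job failed so BullMQ re-enqueues it.
    throw new Error(`Run ${payload.runId} not ready yet`)
  }
  return undefined // success
}
```

Combined with removeOnComplete: true and removeOnFail: false, successful jobs disappear while failed ones stay inspectable in the queue.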
Backoffice:
Add to BackofficeRoutes enum, use server actions + useLatitudeAction for writes, API routes + SWR for reads.
Feature Checklist:

- Database schema + migration
- Services
- Repositories
- Actions
- API routes
- Stores
- UI components
- Routes
SDK Release Process and Event System
For the TypeScript SDK:

1. Bump the version in package.json (semver).
2. Update CHANGELOG.md.
3. Push to main to trigger publish, GitHub release, and tagging.
Events: Declare in events.d.ts, publish via publisher, handle in handlers/index.ts.
In-Depth: Prompt Running Architecture
The prompt running system orchestrates foreground streaming and background execution.
High-level flow:
API → Gateway → runDocumentAtCommit → ChainStreamManager → Provider
Core components:

- runDocumentAtCommit: Builds the provider map, resolves content, validates the chain with PromptL, and creates the Chain.
- ChainStreamManager: Handles step execution and tool resolution (client, Latitude, MCP, and agent tools), streaming via the Vercel AI SDK.
- ai() service: Applies provider rules, configures models, and executes streamText.
Background: enqueueRun → BullMQ job → backgroundRunJob (starts run, forwards events to Redis stream).
Telemetry: OpenTelemetry spans (Prompt, Completion, Tool, Step) processed with metadata.
Live evaluations trigger on spanCreated for qualifying logs, running specifications (LLM-as-judge, rule-based) to create EvaluationResultV2.
Chain events: ChainStarted, StepStarted, ProviderStarted, ToolsRequested, etc.
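Events like these lend themselves to a discriminated union that consumers exhaust with a switch. A hedged sketch; the real event payloads in packages/core carry more fields than shown:

```typescript
// Illustrative chain event union: the `type` discriminant lets
// TypeScript narrow each branch and verify the switch is exhaustive.
type ChainEvent =
  | { type: 'ChainStarted' }
  | { type: 'StepStarted'; stepIndex: number }
  | { type: 'ProviderStarted'; provider: string }
  | { type: 'ToolsRequested'; toolNames: string[] }

function describeEvent(event: ChainEvent): string {
  switch (event.type) {
    case 'ChainStarted':
      return 'chain started'
    case 'StepStarted':
      return `step ${event.stepIndex} started`
    case 'ProviderStarted':
      return `calling ${event.provider}`
    case 'ToolsRequested':
      return `tools requested: ${event.toolNames.join(', ')}`
  }
}
```

Adding a new event variant then produces compile errors at every consumer that fails to handle it, which keeps stream consumers in sync with the chain.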
This architecture ensures traceable, scalable, production-grade prompt execution.
FAQ
How does Latitude handle destructive database changes safely?
Perform code removal and schema updates in PR 1 (deploy first), then generate and run the drop migration in PR 2 after the new code is live.
How are evaluations triggered automatically?
spanCreated events for Prompt spans trigger evaluateLiveLogJob, which enqueues runEvaluationV2Job for documents with evaluateLiveLogs: true.
What happens when a BullMQ job encounters an error?
Throw for retryable errors (BullMQ retries); use captureException for non-retryable errors while allowing the job to continue.
Why maintain both unclustered and clustered ClickHouse migrations?
Unclustered for development/single-node; clustered for production HA. They differ only in ON CLUSTER clause and Replicated* engine.
How are optimistic updates implemented in the UI?
In useLatitudeAction onSuccess handlers, call mutate to update SWR cache and show toast notifications.
How does the chain support tool calls across multiple steps?
ChainStreamManager uses lookupTools/resolveTools before each step, then processes ToolsRequested/Completed events during streamAIResponse.
Which ClickHouse folder is used in self-hosted setups?
Automatically selected based on the CLICKHOUSE_CLUSTER_ENABLED environment variable.
Latitude provides a robust, observable foundation for LLM engineering. Whether contributing patterns, implementing features, or running evaluations, the documented practices ensure clarity and reliability. Join the community on Slack to discuss implementations or share feedback.
