Building an AI Workflow Orchestrator in 4,500 Lines: The PaiAgent Story

Can a two-week, one-person sprint yield a production-ready visual pipeline that chains LLMs and text-to-speech, survives real browsers, and still fits in one Git repo?
Yes—if you treat the DAG engine like Lego bricks, not rocket science.


1. Why We Rolled Our Own DAG Engine Instead of Grabbing Activiti

Question answered: “Why bother writing another topological sort when battle-tested engines exist?”

  • Scope creep kills deadlines. Activiti, Camunda, Temporal bring history tables, event buses, cluster locks—overkill for “drag nodes, run in order, show logs”.
  • Educational leverage. Implementing Kahn’s algorithm once forces you to really understand node scheduling—handy when users draw cyclic graphs at 2 a.m.
  • Binary diet. Our backend.jar is 18 MB; the smallest Camunda Spring Boot starter is 85 MB. Fewer bytes, fewer CVEs.

Scenario – AI podcast in three hops
Input → OpenAI node → TTS node → Output
The engine computes the topological order in 1 ms, executes sequentially, and streams each node’s stdout to the browser. Memory stays under 30 MB on a 1-core VM.

Author reflection
I lost half a day wiring Camunda’s job executor before admitting defeat: if the user can’t pronounce “BPMN”, don’t ship it. Kahn + DFS in 200 lines restored sanity.
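The scheduler behind that "1 ms" number really is just Kahn's algorithm. A minimal sketch, assuming node IDs are strings and edges are from→to pairs (class and method names here are illustrative, not PaiAgent's actual API):

```java
import java.util.*;

// Minimal Kahn's-algorithm scheduler: returns nodes in execution order,
// or throws if the graph contains a cycle (the user drew a loop).
public class TopoSort {
    public static List<String> order(List<String> nodes, List<String[]> edges) {
        Map<String, List<String>> adj = new HashMap<>();
        Map<String, Integer> inDegree = new HashMap<>();
        for (String n : nodes) { adj.put(n, new ArrayList<>()); inDegree.put(n, 0); }
        for (String[] e : edges) {
            adj.get(e[0]).add(e[1]);                 // edge e[0] -> e[1]
            inDegree.merge(e[1], 1, Integer::sum);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String n : nodes) if (inDegree.get(n) == 0) ready.add(n);
        List<String> sorted = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            sorted.add(n);
            for (String next : adj.get(n))
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
        }
        // If any node never reached in-degree 0, a cycle kept it blocked.
        if (sorted.size() != nodes.size())
            throw new IllegalStateException("Workflow contains a cycle");
        return sorted;
    }
}
```

The same size check doubles as the cycle detector, which is why a cyclic graph can be rejected before any node runs.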


2. Three Tables, Zero JOINs: The Schema That Fits in Your Head

Question answered: “How minimal can storage be and still feel professional?”

Table               Job
workflow            Single JSON blob holding nodes & edges. Versioning = entire row.
node_definition     Catalogue of draggable types (icon, config schema).
execution_record    Immutable run snapshot: input, outputs, per-node logs, duration.

Operational example
User hits “Save” – frontend POSTs:

{"name":"Podcast-Flow","nodes":[{...}],"edges":[{...}]}

Backend stores the string verbatim; SELECT → straight back to ReactFlow. No ORM gymnastics, no attribute splitting.

Author reflection
I feared JSON-in-RDBMS would feel dirty. It actually removed an entire DAO layer, and MySQL 8’s JSON functions rescue us if we ever need SQL analytics.


3. NodeExecutor = Interface + Factory—Adding Claude Tomorrow Takes 40 Lines

Question answered: “How do you plug new models without touching old code?”

  1. Contract:

public interface NodeExecutor {
    Map<String,Object> execute(WorkflowNode n, Map<String,Object> input);
    String getNodeType();
}

  2. Factory discovers every @Component implementing the interface and caches them.

  3. Ship a new JAR with:

@Component
public class ClaudeNodeExecutor implements NodeExecutor {
    public String getNodeType(){ return "claude"; }
    public Map<String,Object> execute(WorkflowNode n, Map<String,Object> input){ ... }
}

No if/else chains, no recompile of core.
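The factory itself is only a map keyed by node type. A sketch, with a simplified contract so it stands alone (in the real app the factory is a @Component and Spring injects every NodeExecutor bean into the constructor; names here are illustrative):

```java
import java.util.*;

// Simplified contract (the real interface also receives a WorkflowNode).
interface NodeExecutor {
    String getNodeType();
    Map<String, Object> execute(Map<String, Object> input);
}

// Registry mapping node types to their executors. Spring supplies the list of
// beans at startup; here it is a plain constructor argument.
class NodeExecutorFactory {
    private final Map<String, NodeExecutor> registry = new HashMap<>();

    NodeExecutorFactory(List<NodeExecutor> executors) {
        for (NodeExecutor e : executors) registry.put(e.getNodeType(), e);
    }

    NodeExecutor forType(String type) {
        NodeExecutor e = registry.get(type);
        if (e == null) throw new IllegalArgumentException("Unknown node type: " + type);
        return e;
    }
}
```

Because the map is built from whatever beans exist at startup, dropping a new executor class on the classpath is the entire integration.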

Scenario – side-by-side LLM shoot-out
User drags OpenAI and Claude nodes, wires the same prompt to both, merges outputs with a custom “diff” node. The factory instantiates two different executors, the engine runs them sequentially, the UI shows token cost and latency for each. A/B testing without code.

Author reflection
Adapter pattern feels academic until you see the PR diff: +45 lines, −0 old lines. That’s when design patterns pay rent.


4. ReactFlow in 140 Lines: Drag, Drop, Connect, Done

Question answered: “What’s the shortest path to a Figma-like node editor on the web?”

Key snippets (TypeScript):

// NodePanel – start drag
const onDragStart = (e: DragEvent, type: string) => {
  e.dataTransfer.setData('nodeType', type);
};

// FlowCanvas – accept drop
const onDrop = (e: DragEvent) => {
  e.preventDefault();
  const type = e.dataTransfer.getData('nodeType');
  // Screen coordinates stand in here; convert to canvas coordinates in real code.
  const position = { x: e.clientX, y: e.clientY };
  const newNode = { id: `${type}-${Date.now()}`, position, data: { type } };
  setNodes(nds => nds.concat(newNode));
};

// ConfigPanel – dynamic Ant Design form
<Form schema={selectedNode.configSchema} onValuesChange={updateNodeData} />

Zustand keeps a single nodes array; ReactFlow’s onNodesChange pipes CRUD back to that store—bidirectional, no prop-drilling.

Scenario – marketing team building a newsletter voice-over
PM drags Input → DeepSeek → TTS → Output, changes DeepSeek temperature from 0.7 to 0.9 in the right panel, sees the slider value propagate to canvas instantly, clicks Save. Zero page refresh, zero API call until Save.

Author reflection
I tried redux-first, got lost in boilerplate. Zustand’s useWorkflowStore hook cut state-related lines by 60 %; sometimes the best library is the one you don’t fight.


5. Debug Drawer: Making Execution Visible, Not a Black Box

Question answered: “How do you show what happened, where, and how long it took—live?”

Backend streams List<ExecutionNodeResult>; frontend renders Ant Design Timeline:

  • Green = success, Red = error, Grey = pending.
  • Each card folds open to reveal input/output JSON.
  • TTS node additionally embeds <AudioPlayer src={output.audioUrl} />.
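The payload behind each timeline card is small. A sketch of the per-node result as a Java record (field names are guesses at the project's shape, not its actual DTO):

```java
import java.util.Map;

// One timeline card's worth of data. status drives the green/red/grey
// rendering; input and output are the JSON shown when the card folds open.
public record ExecutionNodeResult(
        String nodeId,
        String status,               // "SUCCESS", "ERROR", or "PENDING"
        Map<String, Object> input,
        Map<String, Object> output,
        long durationMs) {

    public boolean failed() { return "ERROR".equals(status); }
}
```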

Scenario – diagnosing a failed prompt
Second node turns red; user expands card: {"error":"Prompt too long: 8,547 tokens > 4,096 limit"}. Fix prompt, re-run, watch the timeline turn green step-by-step. No AWS CloudTrail needed.

Author reflection
We once returned only a success boolean; support tickets spiked. After adding the timeline, “why my flow fail?” questions dropped 80 %. Visibility > verbosity.


6. TTS Node: From Silent Text to Playable Audio in 5 Seconds

Question answered: “How to ship hearable output without waiting for Azure quotas?”

  • Simulation mode: copy a pre-generated neutral-voice MP3, rename with UUID, expose via Spring static handler.
  • Real mode: plug Azure key → call SDK → same UUID path; frontend code unchanged.
  • Player component: plain HTML5 <audio> + progress bar + download anchor, wrapped in React for consistency.
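Simulation mode reduces to a file copy under a fresh name. A minimal sketch, assuming the audio directory is served by Spring's static resource handler under /audio/ (paths and the URL prefix are illustrative):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.UUID;

// Simulation-mode TTS: instead of calling Azure, copy a bundled sample MP3
// into the static audio folder under a UUID name and return its public URL.
public class SimulatedTts {
    public static String synthesize(Path sampleMp3, Path audioDir) throws IOException {
        Files.createDirectories(audioDir);                 // survive a fresh container
        String fileName = UUID.randomUUID() + ".mp3";
        Files.copy(sampleMp3, audioDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
        return "/audio/" + fileName;                       // what the frontend player loads
    }
}
```

Because the real Azure path writes to the same UUID-named location, the frontend never learns which mode produced the file.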

Scenario – content creator batch-producing podcast intros
She runs 10 flows, each returns its own mp3 URL. Player lets her preview all, download zipped batch, publish to RSS. No external DAW opened.

Author reflection
Simulation felt like cheating until Azure throttled our free tier on demo day. The fallback saved the presentation—always have a dumb version that works.


7. 11 REST Endpoints—Enough to Feel RESTful, Not Overwhelming

Resource            Endpoints (total 11)       Notes
/api/auth           3 (login/logout/me)        Token in memory, no JWT for MVP.
/api/workflows      5 (CRUD + execute)         execute returns executionId immediately.
/api/node-types     1 (list)                   Front-end auto-populates the palette.
/api/executions     2 (poll result)            Optional polling for live logs.

All payloads wrapped in Result<T> with code, msg, data. Axios interceptor unwraps and throws on non-zero code—centralized error toast.
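The envelope itself is tiny. A sketch as a Java record (field and factory names are guesses at the project's shape):

```java
// Uniform envelope every endpoint returns; code 0 means success, anything
// else carries a message the Axios interceptor turns into an error toast.
public record Result<T>(int code, String msg, T data) {
    public static <T> Result<T> ok(T data) {
        return new Result<>(0, "ok", data);
    }
    public static <T> Result<T> error(int code, String msg) {
        return new Result<>(code, msg, null);
    }
}
```

Centralizing the unwrap in one Axios interceptor is what keeps every page's error handling down to a single toast call.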

Scenario – CI pipeline calling execute
curl -X POST localhost:8080/api/workflows/42/execute -d '{"input":"AI in 2025"}' returns {"data":{"executionId":107}}. Poll /executions/107 until status=COMPLETED, then grab output.audioUrl.

Author reflection
Starting with 11 endpoints felt stingy; after three demos no one asked for more. YAGNI vindicated again.


8. Production Gotchas: CORS, Timeouts, and the Missing Audio Folder

Checklist from real deploys:

  • CORS – Spring Boot must addCorsMappings() for localhost:5173 and any CDN origin.
  • Axios timeout – default 10 s kills LLM calls; raise to 60 s and surface in UI.
  • Static audio – container needs volume mount ./audio_output:/app/audio_output; else Docker restart wipes files.
  • Node labels – ReactFlow uses data.label; if you forget, nodes render “Unknown” and users panic.
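The CORS item above amounts to one small config class. A minimal version, assuming Spring MVC (the origin shown is illustrative; read real origins from configuration):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

// Lets the Vite dev server (and, in production, your CDN origin) call the API.
@Configuration
public class WebConfig implements WebMvcConfigurer {
    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/api/**")
                .allowedOrigins("http://localhost:5173")
                .allowedMethods("GET", "POST", "PUT", "DELETE");
    }
}
```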

Author reflection
The first live demo failed because audio_output/ was in .gitignore and not in Docker context. Now the CI step mkdir -p audio_output is non-negotiable.


9. Action Checklist / Implementation Steps

  1. Install JDK 21, Node 18+, MySQL 8.
  2. Create DB paiagent and source backend/src/main/resources/schema.sql.
  3. Clone repo; open application.yml; set DB password.
  4. ./mvnw spring-boot:run (port 8080).
  5. cd frontend && npm i && npm run dev (port 5173).
  6. Browse http://localhost:5173, login admin / 123.
  7. Drag Input → LLM → TTS → Output; wire them.
  8. Configure LLM prompt and TTS voice.
  9. Save; open Debug Drawer; input text; Execute.
  10. Preview audio, download mp3, tweet your 30-second AI podcast.

10. One-page Overview

  • Goal – Ship a visual LLM+TTS pipeline in two weeks.
  • Core engine – Kahn topological sort + DFS cycle check, 200 LOC.
  • Front-end – ReactFlow canvas, Zustand state, Ant Design forms.
  • Extensibility – NodeExecutor interface + Spring auto-scan; new model = one class.
  • Debug UX – Live timeline with per-node I/O and inline audio player.
  • Deliverables – 4,500 lines, 11 REST endpoints, 3 DB tables, 1 jar, 1 vite build.
  • Outcome – Non-coders drag-and-drop AI workflows, hit Execute, listen to results in under a minute.

11. FAQ

Q1: Can I add parallel node execution?
A: Not out of the box: the engine runs nodes one at a time. Annotating WorkflowEngine.execute() with @Async (backed by a ThreadPoolTaskExecutor) makes whole runs concurrent; true parallelism between independent nodes inside one workflow additionally requires grouping nodes by topological level and executing each level together.
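One approach to intra-workflow parallelism is level-by-level scheduling: nodes whose dependencies are all satisfied form a "level" that can run concurrently, and the next level starts when the current one finishes. A sketch of the grouping step, not the project's actual code (run each returned level via ExecutorService.invokeAll):

```java
import java.util.*;

// Groups a DAG into topological levels: level 0 holds all roots, level 1
// holds nodes whose inputs are all in level 0, and so on.
public class LevelScheduler {
    public static List<List<String>> levels(List<String> nodes, List<String[]> edges) {
        Map<String, Integer> inDegree = new HashMap<>();
        Map<String, List<String>> adj = new HashMap<>();
        for (String n : nodes) { inDegree.put(n, 0); adj.put(n, new ArrayList<>()); }
        for (String[] e : edges) {
            adj.get(e[0]).add(e[1]);
            inDegree.merge(e[1], 1, Integer::sum);
        }
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String n : nodes) if (inDegree.get(n) == 0) current.add(n);
        while (!current.isEmpty()) {
            result.add(current);
            List<String> next = new ArrayList<>();
            for (String n : current)
                for (String m : adj.get(n))
                    if (inDegree.merge(m, -1, Integer::sum) == 0) next.add(m);
            current = next;
        }
        return result;
    }
}
```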

Q2: Is the simulation audio royalty-free?
A: Yes, we bundle a 5-second neutral clip donated by a team member; replace if you need commercial distribution.

Q3: How do I switch from simulation to real Azure TTS?
A: Put your Azure key in application.yml, set TTS provider to azure, keep everything else identical.

Q4: Does it scale horizontally?
A: State is in DB; multiple Spring Boot instances can run behind a load balancer. Use sticky sessions if you poll execution logs.

Q5: Why token-based auth instead of JWT?
A: MVP scope; tokens live in a ConcurrentHashMap. Swap in JWT or OAuth when you need refresh tokens.

Q6: Can the front-end live on a different domain?
A: Absolutely—just update CORS allowed-origins in WebConfig.java.

Q7: What if the workflow is cyclic?
A: The DAG parser throws a 400 Bad Request before any node runs; the UI highlights the offending edge in red.

Q8: How big can a workflow be?
A: Tested with 200 nodes / 300 edges; topology sort still < 5 ms on a 2 GHz CPU.