AIRI banner

AIRI — A Practical Guide for Developers and Creators

AIRI is an open source project that aims to make “cyber life” — a digital companion that can chat, act, and even play games — available and practical for anyone to run, extend, and customize. This guide translates the original Chinese README into clear, approachable English and reorganizes the material so you can quickly understand what AIRI is, what it can do today, and how to start using and contributing to it. All content in this post is strictly drawn from the original project README.


Quick summary

AIRI is a browser-first, modular collection of tools and subprojects for building interactive digital agents — from simple chat companions to animated VTuber-style characters that can speak, listen, animate, and even interact with games such as Minecraft and Factorio. The project is designed to run in modern browsers (including mobile via PWA), supports multiple LLM backends and audio services, and provides separate packages for memory, embeddings, UI elements and integrations.


Who this guide is for

  • Developers who want a practical path to run or extend an interactive digital agent in the browser or desktop.
  • Artists and modelers who want to plug Live2D or VRM characters into a running agent.
  • Researchers and hobbyists who want to experiment with multi-modal agents that combine chat, voice, and animation.
  • Project contributors looking for a clear, minimal setup and an understanding of the project layout.

Why AIRI exists

AIRI is inspired by public examples of game-playing, chat-capable virtual performers. The goal is not to reproduce any single closed-source system, but to create an open, modular toolkit that lets individuals run their own “digital life” systems locally or in the cloud — with a strong emphasis on browser compatibility and developer ergonomics. AIRI positions itself as a set of interoperable packages and apps so that people with varied skills can contribute UI components, models, or runtime integrations.


What makes AIRI different

  • Browser-first design. AIRI is built to run in modern browsers and supports Web APIs such as WebGPU, WebAudio, Web Workers, WebAssembly, and WebSocket. This makes it possible to run rich, interactive agents without forcing users to install heavyweight native applications. PWA support is available so the same app can be used on mobile devices.

  • Multi-modal capabilities. The project integrates chat, voice input, speech recognition, text-to-speech, and animated characters (VRM and Live2D), enabling agents to speak, listen, show facial motion and simple idle animations.

  • Game and agent integrations. AIRI includes agent implementations and tooling that enable the agent to play games such as Minecraft and Factorio, and to interact in chat platforms like Discord and Telegram.

  • Modular ecosystem. An array of subprojects and packages focuses on memory layers, embedded databases, icon sets, and deployment helpers. This modular approach lets contributors focus on a single piece without changing the entire codebase.

These design choices emphasize running advanced features in the browser while giving extension points for local or cloud backends when required.
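
The browser-first claim is easy to sanity-check in code. The sketch below (plain TypeScript against standard Web APIs, nothing AIRI-specific) probes for the capabilities listed above before enabling heavier features such as local inference or audio processing:

// Probe the Web APIs AIRI relies on (WebGPU, WebAudio, Web Workers, WebAssembly, WebSocket).
// Standard browser APIs only; this is an illustrative check, not AIRI code.
async function detectCapabilities() {
  const gpu = (navigator as any).gpu            // typed via @webgpu/types in a real project
  const adapter = gpu ? await gpu.requestAdapter() : null
  return {
    webgpu: adapter !== null,
    webaudio: typeof AudioContext !== 'undefined',
    workers: typeof Worker !== 'undefined',
    wasm: typeof WebAssembly !== 'undefined',
    websocket: typeof WebSocket !== 'undefined',
  }
}

detectCapabilities().then(caps => console.log(caps))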


Current project status — features and roadmap

The README tracks progress by feature area. The list below reflects what the project has implemented and what’s still under construction. Each line is taken from the project status summary.

Cognition and agents

  • Agent gameplay and interaction: AIRI can control agents that play Minecraft and Factorio.
  • Chat agents: Running on Telegram and Discord is supported.
  • Memory: Browser-side database support exists using DuckDB WASM or embedded SQLite; a full-featured memory layer called Alaya is under active development.
  • Local browser inference (WebGPU-based) is noted as a future item.

Audio and speech

  • Browser microphone input is available.
  • Discord audio capture is supported.
  • Client-side speech recognition and speaking detection are implemented (a minimal sketch of the browser audio path follows this list).
  • ElevenLabs is listed as an integrated TTS provider.
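
To make the browser audio items concrete, here is a minimal sketch of microphone capture and naive speaking detection using only standard Web APIs (getUserMedia plus an AnalyserNode). It illustrates the general approach, not AIRI's actual implementation, and the 0.02 level threshold is an arbitrary assumption:

// Capture the microphone and flag "speaking" when the signal level crosses a threshold.
// Standard Web APIs only; the threshold is an arbitrary illustration, not AIRI's value.
async function watchMicrophone(onSpeakingChange: (speaking: boolean) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  const ctx = new AudioContext()
  const analyser = ctx.createAnalyser()
  analyser.fftSize = 2048
  ctx.createMediaStreamSource(stream).connect(analyser)

  const samples = new Float32Array(analyser.fftSize)
  let speaking = false

  const tick = () => {
    analyser.getFloatTimeDomainData(samples)
    // Root-mean-square level of the current audio frame
    const rms = Math.sqrt(samples.reduce((sum, s) => sum + s * s, 0) / samples.length)
    const nowSpeaking = rms > 0.02
    if (nowSpeaking !== speaking) {
      speaking = nowSpeaking
      onSpeakingChange(speaking)
    }
    requestAnimationFrame(tick)
  }
  tick()
}

watchMicrophone(speaking => console.log(speaking ? 'speaking' : 'silent'))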

Animation and presentation

  • VRM support with model control and basic animations such as automatic blinking, gaze control, and idle eye movement.
  • Live2D support with the same basic automated behaviors.

This status snapshot is useful when choosing where to begin: if your goal is to play with agents in the browser, the current features already support voice, basic animation and remote chat integration. If you need advanced memory or fully local inference, those areas are still being expanded.


Supported LLM and service backends

AIRI is designed to be backend-agnostic and lists numerous LLM providers and local runtimes that the project supports or aims to support through its xsAI integration. The README documents a long list of supported and work-in-progress backends. Below is a faithful transcription of that list from the source.

Currently listed as supported (checkmarks in the README indicate full support):

  • OpenRouter, vLLM, SGLang, Ollama, Google Gemini, OpenAI, Anthropic Claude, DeepSeek, Qwen, xAI, Groq, Mistral, Cloudflare Workers AI, Together.ai, Fireworks.ai, Novita, Zhipu (智谱), SiliconFlow (硅基流动), StepFun (阶跃星辰), Baichuan (百川), Minimax, Moonshot (月之暗面), Player2, Tencent Hunyuan (腾讯混元).

Listed as not fully integrated or work in progress:

  • Azure OpenAI, iFLYTEK Spark (讯飞星火), VolcEngine (火山引擎 豆包) and other platform-specific endpoints are noted as partial or WIP in the README.

This wide compatibility is intended to let developers choose a backend they prefer — whether a cloud API or a local runtime — and plug it into the AIRI stack. The README flags integration state per backend so you can plan expected work required for each one.
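
Because backends are reached through xsAI, switching providers mostly means changing the baseURL and model. The sketch below assumes the umbrella xsai package exports a generateText helper that accepts an OpenAI-compatible baseURL, apiKey, model, and messages; treat the import path and option names as assumptions and check the xsAI documentation for the authoritative API:

// Hedged sketch: call an OpenAI-compatible backend (here, a local Ollama server) through xsAI.
// The import path and option names are assumptions; consult the xsAI docs before relying on them.
import { generateText } from 'xsai'

const { text } = await generateText({
  baseURL: 'http://localhost:11434/v1/',   // any OpenAI-compatible endpoint
  apiKey: '',                               // local runtimes often need no key
  model: 'llama3.2',
  messages: [
    { role: 'user', content: 'Introduce yourself in one sentence.' },
  ],
})

console.log(text)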


Subprojects and components

AIRI acts as a meta-project that spawns and depends on smaller packages. These subprojects provide pieces you’ll likely use directly or as reference when building custom features. Here is a curated list taken directly from the project index.

  • unspeech — a proxy server for /audio/transcriptions and /audio/speech, intended as a universal gateway for ASR and TTS backends.
  • hfup — tooling to package and deploy the project to Hugging Face Spaces.
  • @proj-airi/duckdb-wasm — an easy wrapper around @duckdb/duckdb-wasm to enable browser-side, embedded database use (a usage sketch appears at the end of this section).
  • @proj-airi/drizzle-duckdb-wasm — a Drizzle ORM driver for DuckDB WASM, used to organize data access patterns in browser memory layers.
  • airi-factorio and related packages — a set of tools and libraries to allow AIRI agents to interact with Factorio and its servers, including autorio automation libraries.
  • UI and icon packages such as @proj-airi/lobe-icons provide visual resources to standardize appearance across AIRI apps.

These components reflect the project’s practical focus: rich UI, in-browser data handling, audio integration, and game interoperability. Because each component is its own package, you can adopt just the pieces relevant to your project.
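
To see what the in-browser data handling looks like in practice, here is a minimal sketch using the upstream @duckdb/duckdb-wasm package directly. The @proj-airi/duckdb-wasm wrapper exists to cut down exactly this boilerplate; since its own API is not shown in the README, the sketch sticks to the base library's documented setup:

// Spin up an embedded DuckDB instance in the browser and run a query.
// Uses the documented @duckdb/duckdb-wasm setup; @proj-airi/duckdb-wasm wraps this pattern.
import * as duckdb from '@duckdb/duckdb-wasm'

async function openBrowserDb() {
  const bundles = duckdb.getJsDelivrBundles()
  const bundle = await duckdb.selectBundle(bundles)

  // The worker script is loaded from the selected bundle via a Blob URL.
  const workerUrl = URL.createObjectURL(
    new Blob([`importScripts("${bundle.mainWorker!}");`], { type: 'text/javascript' }),
  )
  const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), new Worker(workerUrl))
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker)
  return db
}

const db = await openBrowserDb()
const conn = await db.connect()
const result = await conn.query(`SELECT 'hello from the browser' AS greeting`)
console.log(result.toArray())
await conn.close()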


Development quickstart — run it locally

If you want to get a working copy running locally, the README gives a minimal set of commands. Run them from the project root; they assume you have pnpm installed and a cloned repository:

pnpm i
pnpm dev

To run individual pieces:

  • Web version (the same build that powers airi.moeru.ai):

    pnpm dev:web
    
  • Desktop “tamagotchi” version (a lightweight desktop pet mode):

    pnpm dev:tamagotchi
    
  • Local documentation website:

    pnpm -F @proj-airi/docs dev
    

These instructions give you a straightforward path to a development server or the specific sub-app you want to test. The README refers contributors to CONTRIBUTING.md for more in-depth development guidance.


Architecture overview

The project README includes a component diagram (Mermaid) that shows how core pieces relate. The diagram contains the main runtime, memory drivers, UI components, stage and app layers, and agent integrations for games. A faithful transcription of the diagram is included below so you can paste it into a Mermaid renderer if you wish to visualize the architecture in your own notes.

flowchart TD
  Core("Core")
  Unspeech("unspeech")
  DBDriver("@proj-airi/drizzle-duckdb-wasm")
  MemoryDriver("[WIP] Memory Alaya")
  DB1("@proj-airi/duckdb-wasm")
  SVRT("@proj-airi/server-runtime")
  Memory("Memory")
  STT("STT")
  Stage("Stage")
  StageUI("@proj-airi/stage-ui")
  UI("@proj-airi/ui")

  subgraph AIRI
    DB1 --> DBDriver --> MemoryDriver --> Memory --> Core
    UI --> StageUI --> Stage --> Core
    Core --> STT
    Core --> SVRT
  end

  subgraph UI_Components
    UI --> StageUI
    UITransitions("@proj-airi/ui-transitions") --> StageUI
    UILoadingScreens("@proj-airi/ui-loading-screens") --> StageUI
    FontCJK("@proj-airi/font-cjkfonts-allseto") --> StageUI
    FontXiaolai("@proj-airi/font-xiaolai") --> StageUI
  end

  subgraph Apps
    Stage --> StageWeb("@proj-airi/stage-web")
    Stage --> StageTamagotchi("@proj-airi/stage-tamagotchi")
    Core --> RealtimeAudio("@proj-airi/realtime-audio")
    Core --> PromptEngineering("@proj-airi/playground-prompt-engineering")
  end

  subgraph Server_Components
    Core --> ServerSDK("@proj-airi/server-sdk")
    ServerShared("@proj-airi/server-shared") --> SVRT
    ServerShared --> ServerSDK
  end

  STT -->|Speaking| Unspeech
  SVRT -->|Playing Factorio| F_AGENT
  SVRT -->|Playing Minecraft| MC_AGENT

  subgraph Factorio_Agent
    F_AGENT("Factorio Agent")
    F_API("Factorio RCON API")
    factorio-server("factorio-server")
    F_MOD1("autorio")
    F_AGENT --> F_API -.-> factorio-server
    F_MOD1 -.-> factorio-server
  end

  subgraph Minecraft_Agent
    MC_AGENT("Minecraft Agent")
    Mineflayer("Mineflayer")
    minecraft-server("minecraft-server")
    MC_AGENT --> Mineflayer -.-> minecraft-server
  end

  XSAI("xsAI") --> Core
  XSAI --> F_AGENT
  XSAI --> MC_AGENT

  Core --> TauriMCP("@proj-airi/tauri-plugin-mcp")
  Memory_PGVector("@proj-airi/memory-pgvector") --> Memory

This diagram highlights a few core ideas:

  • A browser-embedded DB (DuckDB WASM) feeds into a memory layer.
  • The UI and stage layers are decoupled from core logic so different frontends can be developed independently.
  • Game agents and server runtime components are connected through a common core and the xsAI integration.
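
The Minecraft branch of the diagram runs through Mineflayer, a widely used Node.js bot library. The sketch below is not AIRI's agent code; it only shows how a Mineflayer bot connects to a server and exchanges chat, which is the layer an agent like AIRI's builds on (the host, port, and bot name are placeholder assumptions):

// Minimal Mineflayer bot: connect to a local server and echo chat.
// Illustrative only; AIRI's Minecraft agent adds its own reasoning on top of this layer.
import mineflayer from 'mineflayer'

const bot = mineflayer.createBot({
  host: 'localhost',   // the minecraft-server node from the diagram
  port: 25565,
  username: 'airi_dev_bot',
})

bot.once('spawn', () => {
  bot.chat('Hello! I just spawned.')
})

bot.on('chat', (username, message) => {
  if (username === bot.username) return
  // A real agent would route this message to the LLM core instead of echoing it.
  bot.chat(`You said: ${message}`)
})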

How to contribute — a minimal, practical path

If you want to contribute but don’t know where to begin, the README suggests these minimal steps. They are designed to lower the friction for first-time contributors. These suggestions are taken from the contribution guidance in the source.

  1. Browse the repository and pick one subproject you find interesting — for example @proj-airi/duckdb-wasm or airi-factorio.
  2. Read CONTRIBUTING.md for the project’s development norms and expectations.
  3. Open a discussion or issue to introduce yourself and explain what you plan to work on.
  4. If you’re an artist or modeler, consider contributing Live2D or VRM assets; if you’re a frontend developer, add UI components; if you prefer backend work, focus on integrations or memory drivers.

The project explicitly welcomes a wide range of skill sets — you do not need to be an expert in Vue, TypeScript, or WebGPU to make meaningful contributions. Creating a dedicated subdirectory for your integration (React, Svelte, or other) is encouraged.


Subproject highlights and related tools

AIRI has generated several useful side projects and companion tools that are worth knowing about if you are building or deploying agents from this stack. These are listed in the README and include: unspeech, hfup, demodel, inventory, MCP Launcher, and others. Many of these projects solve practical problems — from audio proxying to packaging for hosting platforms. The README lists these resources as part of the broader AIRI ecosystem.


Similar projects and ecosystem context

The README provides a short list of open source and commercial projects related to AIRI. These help to situate AIRI within a larger ecosystem of agent and VTuber projects. The list includes open-source recreations, agent frameworks, and community projects. This is a helpful reference if you want to compare approaches or find complementary tooling.


Practical tips for getting the most out of AIRI

These concise, practical recommendations are distilled from the README’s guidance and project structure. They aim to reduce wasted effort when you start experimenting.

  • Start with the docs site. If you want to understand the architecture and APIs first, run the documentation local server (pnpm -F @proj-airi/docs dev) before launching the full app.

  • Pick a familiar LLM backend. The project supports many backends. Choose one you already know to reduce integration friction, then expand to others once the core flow works. The README marks which backends are fully supported and which are WIP.

  • Work on UI or assets as a safe first step. The UI and animation layers are nicely decoupled from the core logic. Contributing model assets, UI components, or icons is an effective way to get involved without changing core runtime code.

  • Use the modular packages. Many features are implemented as independent packages; adopt just what you need (e.g., duckdb-wasm for local memory, unspeech for audio proxying).


Frequently asked questions (FAQ)

Is AIRI open source?
Yes. The project and many of its packages are hosted on GitHub, and a dedicated organization (@proj-airi) manages subprojects. The README includes links to the main repository and subpackages.

How can I try a live demo?
The README references a web demo hosted at airi.moeru.ai and provides a Discord invite for community interaction. Use the demo and community channels to see the project in action.

What do I need to run the project locally?
The README recommends pnpm as the package manager. The essential commands are pnpm i and pnpm dev, with additional subcommands for web, tamagotchi (desktop pet), and docs.

Can AIRI run on mobile devices?
AIRI has a browser-first design and includes PWA support, which allows usage on modern mobile browsers. The README highlights browser compatibility as a primary design goal.

Does AIRI support local inference?
The README lists local browser inference on WebGPU as a target but marks pure browser-based local inference as not yet complete. The project does, however, support many local and remote runtime options through xsai integrations.


Acknowledgements and lineage

The README credits a number of projects and design inspirations that helped shape AIRI’s approach to UI, rendering and tooling. It also notes that the project was inspired by publicly visible game-playing VTuber efforts and points to multiple community projects and libraries that guided the design choices. The acknowledgements section lists contributors and related projects referenced by the maintainers.


Closing notes

AIRI is presented in the README as an evolving, modular toolkit for building browser-first, multimodal digital agents. The project’s core strengths are its browser compatibility, modular subprojects, and a clear path for contributors across design, front-end, backend and AI-focused disciplines. If your interest is in practical experimentation — running a demo locally, integrating a favorite LLM backend, contributing a Live2D or VRM model, or extending the system to play a specific game — the README provides the commands, component references, and an active ecosystem of subprojects to get started.


Structured data snippet (FAQ + HowTo)

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "What is AIRI?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "AIRI is an open source, browser-first collection of tools for creating interactive digital agents that can chat, speak, listen, animate and interact with games."
          }
        },
        {
          "@type": "Question",
          "name": "How do I run AIRI locally?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Install dependencies with pnpm i, then run pnpm dev. Use pnpm dev:web for the web app or pnpm dev:tamagotchi for the desktop pet mode."
          }
        }
      ]
    },
    {
      "@type": "HowTo",
      "name": "Quick local start",
      "step": [
        {"@type":"HowToStep","text":"Clone the repository"},
        {"@type":"HowToStep","text":"pnpm i"},
        {"@type":"HowToStep","text":"pnpm dev (or pnpm dev:web / pnpm dev:tamagotchi)"}
      ]
    }
  ]
}