IntraScribe: A Local-First Voice Transcription & Collaboration Platform
For companies, schools, and government offices that can’t — or won’t — send data to the cloud.
1. What Is IntraScribe?
Imagine finishing a two-hour meeting and having a clean, editable transcript—complete with speaker names and a concise AI summary—before you’ve even left the room.
IntraScribe makes that possible without ever sending audio outside your building.
In plain language:
- Real-time speech-to-text that runs on your own server
- Automatic speaker diarization ("Who said what?")
- AI-generated summaries in Markdown
- Full data sovereignty: no cloud, no external APIs
2. Why Local-First Matters
| Scenario | Risk with Cloud Services | IntraScribe Approach |
|---|---|---|
| Sensitive R&D meeting | IP could leak | Everything stays on-prem |
| Student counseling session | FERPA/GDPR violations | Data never leaves campus |
| Hospital case review | HIPAA non-compliance | Air-gapped install |
| Command-and-control center | 2-second cloud latency | Sub-500 ms local latency |
3. Core Features in Everyday Terms
| Feature | What You See | What Happens Under the Hood |
|---|---|---|
| Real-time transcription | Words appear as you speak | Browser → WebRTC → FunASR model → SSE stream back |
| Speaker labels | "Alice: …", "Bob: …" | Pyannote model slices audio by voice |
| Batch re-transcription | Higher accuracy after the call | Original + cached audio re-processed on GPU |
| Editable transcript | Double-click to fix typos | Postgres update → real-time refresh for all viewers |
| AI summary & title | One-click Markdown report | LiteLLM picks the best model from config.yaml |
| Template library | Company-branded formats | Save per-user or system-wide templates |
| Session management | Start, pause, re-transcribe, delete | REST endpoints + Supabase Realtime |
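The template library above fills company-branded formats with session data. As a minimal sketch of how `{{...}}` placeholders might be substituted (the placeholder names match those listed in the Customization section, but IntraScribe's actual rendering logic may differ):

```python
import re

def render_template(template: str, values: dict) -> str:
    """Replace {{name}} placeholders with values; unknown names are left intact."""
    def sub(match: re.Match) -> str:
        key = match.group(1)
        return str(values.get(key, match.group(0)))
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

template = "# Meeting Notes\n\nSpeakers: {{speakers}}\nDuration: {{duration}}\n\n{{transcript}}"
print(render_template(template, {
    "speakers": "Alice, Bob",
    "duration": "42 min",
    "transcript": "Alice: Let's ship it.",
}))
```

Leaving unknown placeholders intact (rather than erasing them) makes it obvious when a template references a field the session did not provide.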
4. Who Should Use It?
- Enterprise IT: locked-down VLANs, strict infosec reviews, zero-trust architecture.
- Universities & Research Labs: lecture capture, thesis defenses, multi-language seminars.
- Government & Defense: classified briefings, inter-agency coordination.
- Healthcare & Legal: patient consults, depositions, contract negotiations.
5. End-to-End Workflow (3-Minute Overview)
1. Create Session: click "Start Recording" → browser asks for mic permission → backend returns a `session_id`.
2. Live Transcription: audio flows via WebRTC; text chunks arrive via Server-Sent Events (SSE) in under 500 ms.
3. Stop & Finalize: click "Stop" → browser closes the WebRTC connection → server uploads the full audio to Supabase Storage → a GPU batch job starts.
4. Auto Enhance: the batch job runs noise reduction → speaker diarization → high-accuracy re-transcription → Postgres update.
5. AI Summary: click "Summarize"; LiteLLM uses your template to produce a Markdown report and a one-line title.
6. Edit & Share: double-click any segment or speaker label; changes sync to all teammates in real time.
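The live-transcription step above delivers text chunks as SSE events. The exact payload format is not documented here, so the sketch below assumes each event carries a JSON body with `speaker` and `text` fields; it shows a minimal parser for raw `data:` lines from a `text/event-stream` response:

```python
import json

def parse_sse(stream: str) -> list[dict]:
    """Parse raw SSE text into a list of JSON event payloads.

    Events are separated by a blank line; each carries one `data:` line.
    The field names used below (speaker, text) are assumptions, not
    taken from IntraScribe's documented API.
    """
    events = []
    for block in stream.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                events.append(json.loads(line[len("data:"):].strip()))
    return events

raw = 'data: {"speaker": "Alice", "text": "Hello"}\n\ndata: {"speaker": "Bob", "text": "Hi"}\n\n'
for ev in parse_sse(raw):
    print(f'{ev["speaker"]}: {ev["text"]}')
```

In the browser the same stream is consumed with the built-in `EventSource` API; a hand-rolled parser like this is only needed outside the browser.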
6. Tech Stack (High-Level)
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js (App Router) + React + TypeScript + Tailwind CSS | Fast, modern UI |
| Backend | FastAPI (Python, uv-managed) | REST, SSE, WebRTC endpoints |
| ASR | FunASR (local) | Chinese + English speech recognition |
| Diarization | pyannote.audio | Voice fingerprinting |
| AI Generation | LiteLLM (ollama, OpenAI, Azure fallbacks) | Summaries & titles |
| Storage | Supabase (Postgres + Auth + Storage + Realtime) | ACID data, file blobs, row-level security |
| Media Processing | FFmpeg | Transcode, slice, metadata extraction |
7. Folder Layout at a Glance
    intrascribe/
    ├─ backend/
    │  ├─ app/
    │  │  ├─ api.py                       # REST & SSE routes
    │  │  ├─ services.py                  # Business logic
    │  │  ├─ stt_adapter.py               # FunASR wrapper
    │  │  ├─ speaker_diarization.py
    │  │  ├─ batch_transcription.py
    │  │  ├─ audio_processing_service.py
    │  │  ├─ audio_converter.py           # FFmpeg wrapper
    │  │  ├─ schemas.py, models.py        # DTOs & domain models
    │  │  └─ clients.py, repositories.py
    │  ├─ main_v1.py                      # WebRTC entry point
    │  ├─ config.yaml                     # AI & ASR settings
    │  └─ pyproject.toml
    ├─ web/                               # Next.js frontend
    └─ supabase/
       ├─ database_schema.sql
       └─ migrations/
8. Installation Guide (Copy-Paste Ready)
8.1 Prerequisites
- Node.js 18+
- Python 3.10+ with `uv` (Python package and project manager)
- FFmpeg
- (Optional) ollama with `qwen3:8b` for local AI summaries
    # Ubuntu / Debian
    sudo apt update && sudo apt install ffmpeg

    # macOS (Homebrew)
    brew install ffmpeg

    # Install Supabase CLI
    curl -fsSL https://raw.githubusercontent.com/supabase/cli/main/install.sh | bash
8.2 Clone & Start Database
    git clone https://github.com/your-org/intrascribe.git
    cd intrascribe/supabase
    supabase start
    # If you hit a 502, skip edge-runtime:
    # supabase start -x edge-runtime

Copy the printed URLs/keys; you'll need them next.

    supabase db reset   # Seed tables, RLS, functions
8.3 Environment Files
`web/.env.local`:

    NEXT_PUBLIC_SUPABASE_URL=http://127.0.0.1:54321
    NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJhbGc...
    BACKEND_URL=http://localhost:8000

`backend/.env`:

    SUPABASE_URL=http://127.0.0.1:54321
    SUPABASE_ANON_KEY=eyJhbGc...
    SUPABASE_SERVICE_ROLE_KEY=eyJhbGc...
    HUGGINGFACE_TOKEN=hf_...
    PYANNOTE_MODEL=pyannote/speaker-diarization-3.1
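Besides the env files, `backend/config.yaml` holds the AI and ASR settings (see the folder layout above). The fragment below is purely illustrative: the key names are guesses at a LiteLLM-style model list, not IntraScribe's actual schema, so copy from the repository's shipped `config.yaml` rather than from here.

```yaml
# Illustrative only: these key names are assumptions, not the real schema.
llm:
  models:
    - name: ollama/qwen3:8b      # local-first default
    - name: gpt-4o-mini          # optional cloud fallback, if allowed
asr:
  language: auto                 # FunASR model/language selection
```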
8.4 Launch the Stack
    # Terminal 1 – Backend
    cd backend
    uv sync
    uv run main_v1.py
    # Listening on http://localhost:8000

    # Terminal 2 – Frontend
    cd web
    npm install
    npm run dev
    # Open http://localhost:3000
Register an account and you’re in.
9. API Quick Reference
All endpoints live under `/api/v1`.

| Purpose | Method & Path | Notes |
|---|---|---|
| Health check | `GET /health` | Returns 200 OK |
| Create session | `POST /sessions` | Returns `{session_id}` |
| Finalize session | `POST /sessions/{id}/finalize` | Triggers batch job |
| Re-transcribe | `POST /sessions/{id}/retranscribe` | Uses latest model |
| Delete session | `DELETE /sessions/{id}` | Also deletes audio files |
| Generate summary | `POST /sessions/{id}/summarize` | Optional `template_id` |
| Rename speaker | `POST /sessions/{id}/rename-speaker` | Updates all segments |
| Live SSE stream | `GET /transcript?webrtc_id=...` | Requires Bearer token |

Authentication: attach the Supabase JWT as `Authorization: Bearer <token>`.
10. Common Troubleshooting
| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| No real-time text | Mic permission denied | Browser settings → allow mic |
| Diarization fails | Missing HF token | Add `HUGGINGFACE_TOKEN` to `.env` |
| Transcode error | FFmpeg missing | Re-install FFmpeg, ensure it is in PATH |
| Summary empty | Model mis-configured | Check `config.yaml` for correct LiteLLM keys |
| Frontend 404 on API | Proxy mis-configured | Verify `BACKEND_URL` in `.env.local` |
11. Customization & Extensibility
- Swap Speech Engine: implement `BaseASR` in `stt_adapter.py`, then update `config.yaml`.
- Custom Summarization Prompts: add new Markdown templates in the UI; use the `{{transcript}}`, `{{speakers}}`, and `{{duration}}` placeholders.
- Hardware Front-End: replace WebRTC with WebSocket, gRPC, or raw UDP; the backend remains unchanged.
- LDAP / SSO Integration: Supabase Auth supports SAML 2.0 and generic OAuth providers.
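To illustrate the engine-swap point: `BaseASR` lives in `stt_adapter.py` per the text, but its actual method names are not documented here, so the `transcribe` signature below is a hypothetical shape, not the real interface.

```python
from abc import ABC, abstractmethod

class BaseASR(ABC):
    """Hypothetical adapter interface; the real one is in stt_adapter.py."""

    @abstractmethod
    def transcribe(self, audio_bytes: bytes, sample_rate: int) -> str:
        """Return the transcript for a chunk of PCM audio."""

class EchoASR(BaseASR):
    """Toy engine that reports what it received, standing in for a real model."""

    def transcribe(self, audio_bytes: bytes, sample_rate: int) -> str:
        return f"[{len(audio_bytes)} bytes @ {sample_rate} Hz]"

# The rest of the pipeline only sees the abstract type, so engines swap freely.
engine: BaseASR = EchoASR()
print(engine.transcribe(b"\x00" * 1600, sample_rate=16000))
```

A new engine (Whisper, Vosk, a cloud-free commercial SDK) then only needs one subclass plus a `config.yaml` entry pointing at it.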
12. Roadmap: What’s Next?
| Status | Feature |
|---|---|
| ✅ Available | Real-time transcription, speaker labels, AI summary |
| 🚧 In Dev | Edge-device capture (Raspberry Pi, microphone arrays) |
| 🔮 Planned | Conversational AI: "What did Alice conclude in the last meeting?" |
13. FAQ: One-Line Answers
- Does it need internet? Only for the first model download; then fully offline.
- GPU required? No; CPU works, just slower.
- Multiple concurrent sessions? Yes; one browser tab = one capture stream.
- Mobile support? Works in any modern mobile browser; add to home screen as a PWA.
- Commercial license? MIT: use, modify, and resell freely.
14. Final Word
IntraScribe isn’t magic.
It’s a carefully glued stack of open-source pieces—FunASR for ears, pyannote for memory, LiteLLM for a brain—running on your hardware, your network, your terms.
If you’ve read this far, the quickest way to feel the difference is to spin it up locally.
Five minutes from now, you could be watching your own words appear on screen—and staying there.