

IntraScribe: A Local-First Voice Transcription & Collaboration Platform

For companies, schools, and government offices that can’t — or won’t — send data to the cloud.


1. What Is IntraScribe?

Imagine finishing a two-hour meeting and having a clean, editable transcript—complete with speaker names and a concise AI summary—before you’ve even left the room.
IntraScribe makes that possible without ever sending audio outside your building.

In plain language:

  • Real-time speech-to-text that runs on your own server
  • Automatic speaker diarization (“Who said what?”)
  • AI-generated summaries in Markdown
  • Full data sovereignty — no cloud, no external APIs

2. Why Local-First Matters

| Scenario | Risk with Cloud Services | IntraScribe Approach |
|---|---|---|
| Sensitive R&D meeting | IP could leak | Everything stays on-prem |
| Student counseling session | FERPA/GDPR violations | Data never leaves campus |
| Hospital case review | HIPAA non-compliance | Air-gapped install |
| Command-and-control center | 2-second cloud latency | Sub-500 ms local latency |

3. Core Features in Everyday Terms

| Feature | What You See | What Happens Under the Hood |
|---|---|---|
| Real-time transcription | Words appear as you speak | Browser → WebRTC → FunASR model → SSE stream back |
| Speaker labels | "Alice: …", "Bob: …" | Pyannote model slices audio by voice |
| Batch re-transcription | Higher accuracy after the call | Original + cached audio re-processed on GPU |
| Editable transcript | Double-click to fix typos | Postgres update → real-time refresh for all viewers |
| AI summary & title | One-click Markdown report | LiteLLM picks best model from config.yaml |
| Template library | Company-branded formats | Save per-user or system-wide templates |
| Session management | Start, pause, re-transcribe, delete | REST endpoints + Supabase Realtime |
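The real-time row above relies on Server-Sent Events, which are just plain text frames: an optional event line, one or more data lines, then a blank line. As a rough illustration (a hypothetical helper, not the project's actual code), a transcript chunk might be framed like this:

```python
import json

def format_sse_event(segment: dict, event: str = "transcript") -> str:
    """Frame a transcript segment as a Server-Sent Events message.

    SSE messages are plain text: an optional 'event:' line, one or more
    'data:' lines, terminated by a blank line.
    """
    payload = json.dumps(segment, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"

# A hypothetical segment as the ASR engine might emit it
chunk = {"speaker": "Alice", "text": "Let's begin.", "start": 0.0, "end": 1.2}
message = format_sse_event(chunk)
```

The browser's `EventSource` API parses frames of exactly this shape, which is why text can appear segment by segment instead of waiting for the full recording.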

4. Who Should Use It?

  1. Enterprise IT
    Locked-down VLANs, strict infosec reviews, zero-trust architecture.

  2. Universities & Research Labs
    Lecture capture, thesis defenses, multi-language seminars.

  3. Government & Defense
    Classified briefings, inter-agency coordination.

  4. Healthcare & Legal
    Patient consults, depositions, contract negotiations.


5. End-to-End Workflow (3-Minute Overview)

  1. Create Session
    Click “Start Recording” → Browser asks for mic permission → Backend returns session_id.

  2. Live Transcription
    Audio flows via WebRTC; text chunks arrive via Server-Sent Events (SSE) in <500 ms.

  3. Stop & Finalize
    Click “Stop” → Browser closes WebRTC → Server uploads full audio to Supabase Storage → GPU batch job starts.

  4. Auto Enhance
    Batch job: noise reduction → speaker diarization → high-accuracy re-transcription → Postgres update.

  5. AI Summary
    Push “Summarize”; LiteLLM uses your template to spit out a Markdown file and a one-line title.

  6. Edit & Share
    Double-click any segment or speaker label → changes sync to all teammates in real time.
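The six steps above amount to a small state machine on the server: a session records, finalizes, gets enhanced by the batch job, then gets summarized. A schematic sketch of that lifecycle (class and state names are illustrative, not IntraScribe's actual code):

```python
from enum import Enum

class SessionState(Enum):
    RECORDING = "recording"
    FINALIZING = "finalizing"
    ENHANCED = "enhanced"
    SUMMARIZED = "summarized"

# Allowed transitions mirroring the workflow:
# record -> stop/finalize -> batch enhance -> AI summary
TRANSITIONS = {
    SessionState.RECORDING: {SessionState.FINALIZING},
    SessionState.FINALIZING: {SessionState.ENHANCED},
    SessionState.ENHANCED: {SessionState.SUMMARIZED},
    SessionState.SUMMARIZED: set(),
}

class Session:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.state = SessionState.RECORDING

    def advance(self, target: "SessionState") -> None:
        """Move to the next lifecycle stage, rejecting illegal jumps."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go {self.state.value} -> {target.value}")
        self.state = target

s = Session("demo-123")
s.advance(SessionState.FINALIZING)   # user clicks "Stop"
s.advance(SessionState.ENHANCED)     # GPU batch job completes
s.advance(SessionState.SUMMARIZED)   # user clicks "Summarize"
```

Modeling the lifecycle explicitly is what makes "pause, re-transcribe, delete" safe: the server can refuse, say, a summary request while the batch job is still running.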


6. Tech Stack (High-Level)

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js (App Router) + React + TypeScript + Tailwind CSS | Fast, modern UI |
| Backend | FastAPI (Python, uv-managed) | REST, SSE, WebRTC endpoints |
| ASR | FunASR (local) | Chinese + English speech recognition |
| Diarization | pyannote.audio | Voice fingerprinting |
| AI Generation | LiteLLM (ollama, OpenAI, Azure fallbacks) | Summaries & titles |
| Storage | Supabase (Postgres + Auth + Storage + Realtime) | ACID data, file blobs, row-level security |
| Media Processing | FFmpeg | Transcode, slice, metadata extraction |
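The LiteLLM fallback chain is driven by config.yaml. The exact schema is project-specific, so treat this fragment as a purely hypothetical shape rather than the file's real keys:

```yaml
# Hypothetical shape only — check the repository's config.yaml for the real keys
ai:
  models:
    - name: ollama/qwen3:8b    # local model, tried first
    - name: gpt-4o-mini        # cloud fallback, only if your policy allows it
  temperature: 0.3
```

The point of the pattern is that air-gapped deployments can list only local ollama models and the summarization code path never changes.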

7. Folder Layout at a Glance

```
intrascribe/
├─ backend/
│  ├─ app/
│  │  ├─ api.py                # REST & SSE routes
│  │  ├─ services.py           # Business logic
│  │  ├─ stt_adapter.py        # FunASR wrapper
│  │  ├─ speaker_diarization.py
│  │  ├─ batch_transcription.py
│  │  ├─ audio_processing_service.py
│  │  ├─ audio_converter.py    # FFmpeg wrapper
│  │  ├─ schemas.py, models.py # DTOs & domain models
│  │  ├─ clients.py, repositories.py
│  ├─ main_v1.py               # WebRTC entry point
│  ├─ config.yaml              # AI & ASR settings
│  └─ pyproject.toml
├─ web/                        # Next.js frontend
├─ supabase/
│  ├─ database_schema.sql
│  └─ migrations/
```
8. Installation Guide (Copy-Paste Ready)

8.1 Prerequisites

  • Node.js 18+
  • Python 3.10+ with uv (Python package and project manager)
  • FFmpeg
  • (Optional) ollama with qwen3:8b for local AI summaries

```bash
# Ubuntu / Debian
sudo apt update && sudo apt install ffmpeg

# macOS (Homebrew)
brew install ffmpeg

# Install Supabase CLI
curl -fsSL https://raw.githubusercontent.com/supabase/cli/main/install.sh | bash
```

8.2 Clone & Start Database

```bash
git clone https://github.com/your-org/intrascribe.git
cd intrascribe/supabase

supabase start
# If you hit 502, skip edge-runtime:
# supabase start -x edge-runtime
```

Copy the printed URLs/keys; you’ll need them next.

```bash
supabase db reset   # Seed tables, RLS, functions
```

8.3 Environment Files

web/.env.local

```
NEXT_PUBLIC_SUPABASE_URL=http://127.0.0.1:54321
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJhbGc...
BACKEND_URL=http://localhost:8000
```

backend/.env

```
SUPABASE_URL=http://127.0.0.1:54321
SUPABASE_ANON_KEY=eyJhbGc...
SUPABASE_SERVICE_ROLE_KEY=eyJhbGc...
HUGGINGFACE_TOKEN=hf_...
PYANNOTE_MODEL=pyannote/speaker-diarization-3.1
```

8.4 Launch the Stack

```bash
# Terminal 1 – Backend
cd backend
uv sync
uv run main_v1.py
# Listening on http://localhost:8000
```

```bash
# Terminal 2 – Frontend
cd web
npm install
npm run dev
# Open http://localhost:3000
```

Register an account and you’re in.


9. API Quick Reference

All endpoints live under /api/v1.

| Purpose | Method & Path | Notes |
|---|---|---|
| Health check | GET /health | Returns 200 OK |
| Create session | POST /sessions | Returns {session_id} |
| Finalize session | POST /sessions/{id}/finalize | Triggers batch job |
| Re-transcribe | POST /sessions/{id}/retranscribe | Uses latest model |
| Delete session | DELETE /sessions/{id} | Also deletes audio files |
| Generate summary | POST /sessions/{id}/summarize | Optional template_id |
| Rename speaker | POST /sessions/{id}/rename-speaker | Updates all segments |
| Live SSE stream | GET /transcript?webrtc_id=... | Requires Bearer token |

Authentication: Attach the Supabase JWT as Authorization: Bearer <token>.
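Attaching the JWT from your own scripts is straightforward. A stdlib-only sketch of building an authenticated request (the token value is a placeholder, and the host assumes the default local setup):

```python
import urllib.request

def build_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request to the IntraScribe backend with a Supabase JWT attached."""
    req = urllib.request.Request(
        url=f"http://localhost:8000/api/v1{path}",
        method=method,
    )
    req.add_header("Authorization", f"Bearer {token}")
    return req

req = build_request("/sessions", token="eyJhbGc...", method="POST")
# urllib.request.urlopen(req) would send it; omitted here since no server is running
```

Every endpoint in the table accepts the same header, so one small helper like this covers the whole API surface.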


10. Common Troubleshooting

| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| No real-time text | Mic permission denied | Browser settings → allow mic |
| Diarization fails | Missing HF token | Add HUGGINGFACE_TOKEN to .env |
| Transcode error | FFmpeg missing | Re-install FFmpeg, ensure it is in PATH |
| Summary empty | Model mis-configured | Check config.yaml for correct LiteLLM keys |
| Frontend 404 on API | Proxy mis-configured | Verify BACKEND_URL in .env.local |
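Several of these failures come down to missing environment variables. A small preflight helper (hypothetical, not shipped with the project) can catch them before startup:

```python
import os

# Variables from backend/.env that the backend needs at runtime
REQUIRED_VARS = [
    "SUPABASE_URL",
    "SUPABASE_ANON_KEY",
    "SUPABASE_SERVICE_ROLE_KEY",
    "HUGGINGFACE_TOKEN",   # needed for pyannote diarization
]

def missing_env(env=None) -> list:
    """Return the required variables that are absent or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Simulated .env missing the Hugging Face token — diarization would fail
example = {
    "SUPABASE_URL": "http://127.0.0.1:54321",
    "SUPABASE_ANON_KEY": "eyJ...",
    "SUPABASE_SERVICE_ROLE_KEY": "eyJ...",
}
problems = missing_env(example)
```

Running a check like this at boot turns a cryptic mid-session diarization error into a clear message on startup.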

11. Customization & Extensibility

  • Swap Speech Engine
    Implement BaseASR in stt_adapter.py, update config.yaml.

  • Custom Summarization Prompts
    Add new Markdown templates in the UI; use {{transcript}}, {{speakers}}, {{duration}} placeholders.

  • Hardware Front-End
    Replace WebRTC with WebSocket, gRPC, or raw UDP; the backend remains unchanged.

  • LDAP / SSO Integration
    Supabase Auth supports SAML 2.0 and generic OAuth providers.
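To make the first point concrete, the adapter pattern might look like the sketch below. BaseASR is named in the source tree, but its real interface may differ — treat the signatures here as assumptions:

```python
from abc import ABC, abstractmethod

class BaseASR(ABC):
    """Speech-to-text adapter interface (signatures are assumptions)."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> list:
        """Return a list of {'text', 'start', 'end'} segments."""

class WhisperASR(BaseASR):
    """Hypothetical alternative engine dropped in place of FunASR."""

    def transcribe(self, audio_path: str) -> list:
        # Real code would load a model and run inference on audio_path;
        # here we return a canned segment to show the contract.
        return [{"text": "hello world", "start": 0.0, "end": 1.5}]

engine: BaseASR = WhisperASR()
segments = engine.transcribe("meeting.wav")
```

Because the rest of the backend only sees the abstract interface, pointing config.yaml at the new engine is the only other change required.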


12. Roadmap: What’s Next?

| Status | Feature |
|---|---|
| ✅ Available | Real-time transcription, speaker labels, AI summary |
| 🚧 In Dev | Edge-device capture (Raspberry Pi, microphone arrays) |
| 🔮 Planned | Conversational AI: “What did Alice conclude in the last meeting?” |

13. FAQ: One-Line Answers

  • Does it need internet?
    Only for the first model download; then fully offline.

  • GPU required?
    No—CPU works, just slower.

  • Multiple concurrent sessions?
    Yes; one browser tab = one capture stream.

  • Mobile support?
    Works in any modern mobile browser; add to home screen as PWA.

  • Commercial license?
    MIT—use, modify, resell freely.


14. Final Word

IntraScribe isn’t magic.
It’s a carefully glued stack of open-source pieces—FunASR for ears, pyannote for memory, LiteLLM for a brain—running on your hardware, your network, your terms.

If you’ve read this far, the quickest way to feel the difference is to spin it up locally.
Five minutes from now, you could be watching your own words appear on screen—and staying there.
