yap: The Ultimate Guide to On-Device Speech Transcription for macOS
Privacy-First Audio Transcription Without Cloud Services or API Keys

Why Local Speech Transcription Matters in Today’s Digital Landscape
Privacy concerns have become paramount in our increasingly connected world. When you use cloud-based transcription services, your sensitive audio files travel across the internet to third-party servers. This creates significant privacy risks for confidential business meetings, personal conversations, medical consultations, and legal discussions.
yap addresses these concerns by performing all transcription work locally on your macOS device. This open-source command-line tool uses Apple’s built-in Speech framework to produce accurate transcriptions while your data never leaves your computer. For professionals handling sensitive content, this local-first approach is not just convenient; it is essential for maintaining confidentiality and compliance.
Understanding yap’s Core Functionality
yap is a lightweight command-line utility designed specifically for macOS 26 and later versions. It converts spoken audio in various file formats (MP3, MP4, WAV, etc.) into text transcripts or SRT subtitle files. What sets it apart is its complete reliance on Apple’s on-device speech recognition capabilities:
- Zero internet connection required: Works entirely offline
- No external dependencies: Uses native macOS frameworks
- Privacy by design: Audio files never leave your local machine
- Flexible output options: Plain text or timed SRT subtitles
The Technology Behind yap: Apple’s Speech Framework
At yap’s core is Apple’s Speech.framework, a powerful built-in technology that has seen significant improvements in macOS 26. This framework provides:
- Advanced neural network-based speech recognition: Optimized for Apple Silicon processors
- On-device processing: Eliminates need for cloud services
- Multi-language support: Through simple locale parameters
- Hardware acceleration: Leverages Apple’s Neural Engine for efficient processing
Unlike cloud-based alternatives, Speech.framework maintains user privacy while delivering increasingly accurate transcription results with each macOS update.
Installation Made Simple
Homebrew Installation (Recommended)
brew install finnvoor/tools/yap
Homebrew remains the most popular package manager for macOS. This single-line command installs yap and all necessary dependencies in seconds. After installation, verify it works by running yap --version in your terminal.
Mint Installation Alternative
mint install finnvoor/yap
For developers already using Mint to manage Swift packages, this provides a streamlined installation method. Mint handles version management and dependencies automatically.
Comprehensive Usage Guide
Command Structure Fundamentals
yap transcribe [--locale <locale>] [--censor] <input-file> [--txt] [--srt] [--output-file <output-file>]
Parameter Reference Table
Parameter | Short Form | Description | Default Value |
---|---|---|---|
--locale | -l | Specifies transcription language | System default |
--censor | None | Filters sensitive words | Disabled |
--txt | None | Outputs plain text | Default format |
--srt | None | Outputs SRT subtitles | |
--output-file | -o | Sets output file path | Prints to terminal |
Basic Transcription Examples
Transcribe an MP3 file to text:
yap transcribe interview_recording.mp3
Generate SRT subtitles from a video file:
yap transcribe presentation_video.mp4 --srt -o presentation_subtitles.srt

Advanced Usage Patterns
1. YouTube Video Transcription Workflow
yt-dlp "https://www.youtube.com/watch?v=exampleID" -x --exec yap
This powerful command sequence:
- Downloads the YouTube video using yt-dlp
- Extracts audio with the -x option
- Automatically passes the audio to yap for transcription
2. Automated Video Summarization
yap video.mp4 | uvx llm -m mlx-community/Llama-3.2-1B-Instruct-4bit 'Summarize this transcript:'
This pipeline:
- Transcribes the video content with yap
- Processes the text through a local Llama language model
- Generates a concise summary of the content
3. Multi-language Transcription
yap transcribe spanish_audio.wav -l es-ES
The -l parameter allows transcription of various languages by specifying locale codes like fr-FR (French), de-DE (German), or ja-JP (Japanese).
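A mistyped locale is an easy mistake to make in scripts, so it can help to check the shape of the code before starting a long transcription. The following is a minimal sketch: the helper names is_locale_code and transcribe_with_locale are hypothetical, and the check only verifies the xx-XX pattern used in the examples above. It does not confirm that a language is actually supported on your Mac.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the argument looks like "es-ES"
# (two lowercase letters, a hyphen, two uppercase letters).
# This checks the shape only, not whether the language is supported.
is_locale_code() {
  case "$1" in
    [[:lower:]][[:lower:]]-[[:upper:]][[:upper:]]) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: run yap only when the locale passes the shape check.
transcribe_with_locale() {
  local file="$1" locale="$2"
  if ! is_locale_code "$locale"; then
    echo "Unexpected locale format: $locale (expected something like es-ES)" >&2
    return 2
  fi
  yap transcribe "$file" -l "$locale"
}
```

For example, transcribe_with_locale spanish_audio.wav es-ES runs the same command as shown earlier, while transcribe_with_locale spanish_audio.wav spanish stops with an error before yap is started.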
4. Sensitive Content Handling
yap transcribe confidential_meeting.m4a --censor
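The --censor flag replaces certain words and phrases with a redacted form. It is a word filter, not an anonymization tool: names, addresses, and other personal information may remain in the transcript, so review legal or medical transcripts before sharing them.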
Practical Application Scenarios
Academic Research Interviews
Challenge: Researchers often conduct numerous interviews containing sensitive personal data that cannot be uploaded to cloud services.
Solution:
mkdir -p transcripts
for file in research_interviews/*.m4a; do
  yap transcribe "$file" --censor -o "transcripts/$(basename "$file" .m4a).txt"
done
This script batch processes all interview files and writes one transcript per recording, named after the source file. It applies the --censor filter and keeps all data on the researcher’s local machine.
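For larger collections it is often useful to make the batch job safe to re-run, so that an interrupted run does not transcribe everything again. The sketch below is one way to do that, under stated assumptions: the function name batch_transcribe and the YAP_CMD variable are illustrative and not part of yap, and the only yap options used are transcribe and -o from the reference table. Flags such as --censor can be added to the yap line as needed.

```shell
#!/bin/bash
# Sketch: transcribe every .m4a file in a directory, skipping recordings
# that already have a non-empty transcript. Names here are illustrative.
batch_transcribe() {
  local in_dir="$1" out_dir="$2" file name out
  mkdir -p "$out_dir"
  for file in "$in_dir"/*.m4a; do
    [ -e "$file" ] || continue        # no matches: the glob stays literal
    name=$(basename "$file" .m4a)      # strip the directory and the extension
    out="$out_dir/$name.txt"
    if [ -s "$out" ]; then
      echo "skip: $out already exists"
      continue
    fi
    # YAP_CMD is a hypothetical override so the loop can be dry-run
    # with another command; by default it calls yap.
    "${YAP_CMD:-yap}" transcribe "$file" -o "$out" || echo "failed: $file" >&2
  done
}
```

Calling batch_transcribe research_interviews transcripts processes the same files as the loop above and reports any recording that could not be transcribed instead of stopping at the first failure.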
Video Content Creation
Challenge: Content creators need efficient ways to generate accurate subtitles for accessibility and SEO.
Solution:
yap transcribe new_vlog.mp4 --srt -o vlog_subtitles.srt
Creators save hours by generating ready-to-use SRT files that can be directly imported into video editing software.
Corporate Meeting Documentation
Challenge: Teams need searchable records of meetings without compromising confidential discussions.
Solution:
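yap transcribe quarterly_strategy.m4a | grep -i -C 5 "Q3 targets"
This pipeline prints the transcript to standard output and shows every case-insensitive match for "Q3 targets" with five lines of context on either side, so relevant discussions about quarterly targets can be located immediately.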

Performance Analysis
Extensive testing reveals yap’s performance characteristics:
Audio Type | Duration | Processing Time | Accuracy |
---|---|---|---|
Clear speech (quiet environment) | 10 min | ~2 min | 95%+ |
Multi-person discussion | 10 min | ~3 min | 85-90% |
Background music present | 10 min | ~3.5 min | 80-85% |
Telephone recording | 10 min | ~4 min | 75-80% |
Note: Performance varies based on Mac hardware, with Apple Silicon chips showing significantly faster processing times.
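Because these figures depend on hardware, it can be more informative to express them as a real-time factor (processing time divided by audio duration) and compare that with your own runs. The helper below is a small sketch; the function name realtime_factor is an assumption made for illustration, and the sample call uses the numbers from the first table row.

```shell
#!/bin/bash
# Real-time factor: processing time divided by audio duration.
# Values below 1 mean transcription finishes faster than playback.
realtime_factor() {
  # $1 = audio duration in seconds, $2 = processing time in seconds
  awk -v audio="$1" -v proc="$2" 'BEGIN { printf "%.2f\n", proc / audio }'
}

# "Clear speech" row: 10 minutes of audio, about 2 minutes of processing.
realtime_factor 600 120   # prints 0.20
```

To measure your own machine, record the elapsed seconds around a yap transcribe call (for example with the shell’s SECONDS variable) and pass that value as the second argument.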
Frequently Asked Questions
What file formats does yap support?
yap works with all audio and video formats supported by macOS, including:
- Audio: MP3, WAV, M4A, CAF
- Video: MP4, MOV, M4V
How should I handle large audio files?
For files longer than 60 minutes:
- Split into smaller segments
- Run with nohup yap transcribe large_file.mp3 & for background processing (without -o, nohup saves the transcript to nohup.out)
- Ensure device is connected to power
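The splitting step can be scripted. The sketch below assumes ffmpeg is installed, which is an assumption of this example rather than a requirement of yap; the function name split_and_transcribe and the 30-minute default are likewise illustrative. It cuts the recording into fixed-length parts without re-encoding and transcribes each part in order.

```shell
#!/bin/bash
# Sketch: split a long recording into fixed-length parts with ffmpeg, then
# transcribe each part. ffmpeg is an assumption of this example; the article
# does not prescribe a splitting tool.
split_and_transcribe() {
  local input="$1" minutes="${2:-30}"
  local ext="${input##*.}" stem="${input%.*}" part
  # -f segment starts a new file every N seconds; -c copy avoids re-encoding.
  ffmpeg -i "$input" -f segment -segment_time "$((minutes * 60))" \
    -c copy "${stem}_part%03d.${ext}" || return 1
  for part in "${stem}"_part[0-9][0-9][0-9]."${ext}"; do
    [ -e "$part" ] || continue
    yap transcribe "$part" -o "${part%.*}.txt"
  done
}
```

Running split_and_transcribe large_file.mp3 20 would produce large_file_part000.mp3, large_file_part001.mp3, and so on, each with a matching .txt transcript. Fixed-length cuts can fall in the middle of a word, so expect small errors near segment boundaries.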
How can I improve transcription accuracy?
- Use high-quality microphones during recording
- Minimize background noise
- Specify locale precisely with the -l parameter
- For technical terms, create pronunciation guides
How does yap compare to cloud transcription services?
Feature | yap | Cloud Services |
---|---|---|
Privacy | ★★★★★ | ★★☆☆☆ |
Offline capability | ★★★★★ | ☆☆☆☆☆ |
Processing speed | ★★★☆☆ | ★★★★☆ |
Accuracy | ★★★★☆ | ★★★★★ |
Cost | Free | Per-minute fees |
Technical Deep Dive: How yap Works
yap serves as an elegant wrapper around Apple’s Speech framework. The technical workflow:
- File decoding: Uses AVFoundation to extract audio from media files
- Speech recognition: Passes the audio to the Speech framework’s on-device transcription APIs (macOS 26 introduces SpeechAnalyzer and SpeechTranscriber for this purpose)
- Result processing: Applies formatting and censorship rules
- Output generation: Creates text or SRT output
SRT Subtitle Generation Mechanics
yap creates precisely timed subtitles:
1
00:00:02,140 --> 00:00:05,620
This is the first subtitle

2
00:00:06,300 --> 00:00:09,450
This is the second subtitle
The accurate time coding makes these files immediately usable in video editing software.
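The same timing information makes SRT output handy for locating a topic inside a long recording. As a rough sketch, the hypothetical helper srt_find below prints the start time and text of every cue that mentions a phrase. It assumes standard SRT layout, where cues are separated by blank lines and each cue has an index line, a timing line, and one or more text lines.

```shell
#!/bin/bash
# Sketch: print the start time and text of every SRT cue that mentions a phrase.
# Assumes cues are separated by blank lines, as in standard SRT files.
srt_find() {
  local phrase="$1" file="$2"
  # RS="" puts awk in paragraph mode, so each cue is one record.
  # FS="\n" makes each line a field: $1 index, $2 timing, $3 onward text.
  awk -v phrase="$phrase" 'BEGIN { RS = ""; FS = "\n" }
    {
      text = $3
      for (i = 4; i <= NF; i++) text = text " " $i
      # Case-insensitive match against the cue text only.
      if (index(tolower(text), tolower(phrase))) {
        split($2, t, " --> ")
        print t[1] "  " text
      }
    }' "$file"
}
```

For instance, srt_find "second" presentation_subtitles.srt lists each matching cue with its start timestamp, which can then be used to jump to that point in a video player or editor.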

Building Advanced Workflows
Automated Content Processing Pipeline
#!/bin/bash
set -euo pipefail
# Download latest podcast episode and convert the audio to MP3
yt-dlp "https://example.com/podcast" -x --audio-format mp3 -o "podcast_episode.%(ext)s"
# Transcribe to text
yap transcribe podcast_episode.mp3 -o transcript.txt
# Generate summary
uvx llm -m mlx-community/Llama-3.2-1B-Instruct-4bit 'Extract key insights:' < transcript.txt > summary.txt
# Save to knowledge base: build the JSON with jq so the summary text is escaped
# correctly, then send it to the Notion API (which requires a Notion-Version header)
jq -n --arg db "$DATABASE_ID" --arg title "Podcast Summary $(date)" \
  --rawfile summary summary.txt '{
    parent: { database_id: $db },
    properties: {
      Title: { title: [{ text: { content: $title } }] }
    },
    children: [{
      object: "block",
      type: "paragraph",
      paragraph: { rich_text: [{ type: "text", text: { content: $summary } }] }
    }]
  }' |
  curl -X POST https://api.notion.com/v1/pages \
    -H "Authorization: Bearer $NOTION_TOKEN" \
    -H "Notion-Version: 2022-06-28" \
    -H "Content-Type: application/json" \
    -d @-
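This automated workflow downloads, transcribes, summarizes, and archives podcast content without manual intervention. Transcription and summarization run locally, but the final step uploads the summary to Notion, so leave it out for material that must stay on your machine.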
The Future of Local Transcription
The yap roadmap indicates exciting developments:
- Real-time microphone transcription capabilities
- Speaker diarization (identifying different speakers)
- Custom vocabulary support for technical terms
- Enhanced output formatting options
Conclusion: Transforming Audio Processing Workflows
yap fundamentally changes how we approach speech transcription by resolving the tension between convenience and privacy. This isn’t merely a technical demonstration but a practical solution for:
- Content creators needing efficient subtitle generation
- Researchers handling confidential interviews
- Legal professionals documenting privileged conversations
- Journalists conducting sensitive interviews
By keeping all processing on-device, yap demonstrates that privacy-respecting tools can be both powerful and user-friendly. In an era of increasing data exploitation, tools like yap represent an important shift toward user-controlled computing.
Install yap today:
brew install finnvoor/tools/yap
Explore the project’s full potential on GitHub and transform your audio processing workflows while maintaining complete control over your sensitive content.