Abogen: Convert eBooks to Audiobooks with Perfectly Synced Subtitles
Transform PDFs, ePubs, and text files into narrated audiobooks with chapter markers – no technical expertise needed
Have you ever wanted to turn your eBook collection into narrated audiobooks, or generate voiceovers with timed subtitles for your content? Abogen does both with AI-powered text-to-speech. Built on the Kokoro-82M speech engine, it converts text to natural-sounding audio and generates synchronized subtitles as it goes, typically within seconds for short texts. This guide covers installation, your first conversion, and the advanced features.
What Makes Abogen Special?
Abogen stands out with these key capabilities:
- Multi-format support: Directly processes ePub, PDF, and text files
- Subtitle synchronization: Generates frame-accurate subtitles in SRT/ASS formats
- GPU acceleration: Leverages your graphics card for rapid processing (RTX 2060 processes 3,000 characters in 11 seconds)
- Chapter preservation: Maintains eBook chapter structure in audio outputs
- Voice customization: Create unique voice profiles by blending different speech models
- Batch processing: Queue multiple files with individual settings
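The RTX 2060 figure quoted above gives a rough way to estimate conversion time for longer texts. The sketch below assumes throughput scales linearly with text length, which the benchmark does not guarantee, so treat the result as a ballpark. The function name and the 500,000-character book are illustrative.

```python
# Rough conversion-time estimate from the quoted RTX 2060 benchmark
# (3,000 characters in 11 seconds). Linear scaling is an assumption.
BENCH_CHARS = 3_000
BENCH_SECONDS = 11


def estimate_seconds(num_chars: int) -> float:
    """Estimate processing time for num_chars characters of text."""
    chars_per_second = BENCH_CHARS / BENCH_SECONDS  # about 273 chars/s
    return num_chars / chars_per_second


book_chars = 500_000  # hypothetical novel-length text
minutes = estimate_seconds(book_chars) / 60
print(f"Estimated time: {minutes:.1f} minutes")
```

Slower GPUs, CPU-only runs, and subtitle generation will shift the real number.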
See it in action: 5-second demo generating 1-minute audiobook
Installation Guide (All Platforms)
Windows Users
Option 1: One-Click Installer (Recommended)
- Download the repository
- Extract the files and double-click WINDOWS_INSTALL.bat
- Install espeak-ng when prompted
The installer automatically configures a self-contained Python environment
Option 2: Manual Installation (Advanced)
python -m venv venv
venv\Scripts\activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install abogen
macOS Installation
brew install espeak-ng
pip3 install abogen
Linux Installation
sudo apt install espeak-ng # Ubuntu/Debian
pip3 install abogen
# For AMD GPUs:
pip3 uninstall torch
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4
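After installing on any platform, it can help to confirm that PyTorch sees your GPU. The snippet below is a generic PyTorch check, not part of Abogen, and the function name is illustrative. It only covers CUDA and ROCm builds; whether Abogen actually uses the GPU also depends on its own settings.

```python
# Report which compute device PyTorch would use after installation.
def describe_device() -> str:
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    # ROCm builds of PyTorch also expose AMD GPUs through torch.cuda
    if torch.cuda.is_available():
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No GPU detected; running on CPU"


print(describe_device())
```

Run it inside the same virtual environment where Abogen is installed.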
Creating Your First Audiobook
Follow these three steps:
- Input your content
  Drag ePub/PDF/TXT files into the window or use the built-in text editor
- Configure settings:
  - Speech speed: Adjust from 0.1x to 2.0x
  - Voice selection: Choose from preset voices (e.g., “am” for American male)
  - Subtitle generation: Select granularity (sentence, word, or multi-word)
  - Output format: Audio (WAV/FLAC/MP3/OPUS/M4B) + Subtitles (SRT/ASS)
  - Save location: Original folder, desktop, or custom directory
- Start conversion
  Monitor progress via the status bar and log window
Pro Tip: Add `<<CHAPTER_MARKER:Chapter Name>>` in text files to create chapter breaks
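To see how a marked-up text file breaks into chapters, here is a minimal sketch that splits on the marker syntax. This guide does not describe how Abogen parses markers internally, so this only illustrates the file layout; `split_chapters` and the sample text are hypothetical.

```python
import re

# Matches markers of the form <<CHAPTER_MARKER:Chapter Name>>
MARKER = re.compile(r"<<CHAPTER_MARKER:(.*?)>>")


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Return (title, body) pairs; text before the first marker is untitled."""
    # With one capture group, split yields [pre, title1, body1, title2, body2, ...]
    parts = MARKER.split(text)
    chapters = []
    if parts[0].strip():
        chapters.append(("", parts[0].strip()))
    for title, body in zip(parts[1::2], parts[2::2]):
        chapters.append((title.strip(), body.strip()))
    return chapters


sample = (
    "<<CHAPTER_MARKER:Introduction>>\nWelcome.\n"
    "<<CHAPTER_MARKER:Chapter 1>>\nIt began at dawn.\n"
)
for title, body in split_chapters(sample):
    print(f"{title!r}: {body!r}")
```

Placing each marker on its own line keeps the source text readable in an editor.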
Advanced Features Explained
1. Voice Mixer: Create Custom Voices
Blend different voice profiles to create unique narrations:
- Adjust sliders to mix voice characteristics
- Preview adjustments in real-time
- Save custom profiles for future projects
Example: Blending 60% American male with 40% British female can produce a neutral, academic-sounding tone
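One way to picture the 60/40 example is as a weighted average of two voice vectors. That is an assumption for illustration; this guide does not state how the mixer combines voices internally. The voice names and three-element vectors below are toy values.

```python
# Weighted blend of voice vectors; the vectors here are toy values.
def blend(voices: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Return the weighted average of the named voice vectors."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    length = len(next(iter(voices.values())))
    mixed = [0.0] * length
    for name, weight in weights.items():
        for i, value in enumerate(voices[name]):
            # Dividing by the total means slider values need not sum to 100
            mixed[i] += value * weight / total
    return mixed


# Hypothetical 3-dimensional "voices"; real voice data is much larger.
voices = {"american_male": [1.0, 0.0, 2.0], "british_female": [0.0, 1.0, 4.0]}
print(blend(voices, {"american_male": 60, "british_female": 40}))
```

Normalizing by the total weight keeps the blend on the same scale as the inputs, whatever the slider positions are.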
2. Queue Mode: Batch Processing
Process multiple files efficiently:
- Add files via main interface or queue manager
- Configure individual settings per file
- Process entire queue in sequence
- Outputs save to specified locations
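Conceptually, queue mode is a list of jobs that each carry their own settings and run one after another. The sketch below models that idea only; `Job`, `run_queue`, and `fake_convert` are hypothetical names, and the conversion step is a stand-in, not Abogen's API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    path: str
    voice: str = "am"          # preset voice prefix used as an example earlier
    speed: float = 1.0         # must stay within the 0.1x to 2.0x range
    audio_format: str = "mp3"


def run_queue(jobs, convert):
    """Process jobs in order and return each job's output path."""
    pending = deque(jobs)
    outputs = []
    while pending:
        job = pending.popleft()
        if not 0.1 <= job.speed <= 2.0:
            raise ValueError(f"speed out of range for {job.path}")
        outputs.append(convert(job))
    return outputs


def fake_convert(job: Job) -> str:
    # Stand-in for the real conversion; it only derives an output name.
    stem = job.path.rsplit(".", 1)[0]
    return f"{stem}.{job.audio_format}"


jobs = [Job("book.epub", speed=1.2, audio_format="m4b"), Job("notes.txt")]
print(run_queue(jobs, fake_convert))  # ['book.m4b', 'notes.mp3']
```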
3. Professional Metadata Tagging
Add these tags at the beginning of text files for enhanced audiobook metadata:
<<METADATA_TITLE:Your Book Title>>
<<METADATA_ARTIST:Author Name>>
<<METADATA_YEAR:2025>>
<<METADATA_GENRE:Fiction>>
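If you prepare many text files, a small helper can write these tags for you. This is a sketch that only reproduces the four tags shown above; the function names are illustrative and not part of Abogen.

```python
def metadata_header(title: str, artist: str, year: int, genre: str) -> str:
    """Build the metadata tag block, one tag per line."""
    fields = {"TITLE": title, "ARTIST": artist, "YEAR": year, "GENRE": genre}
    return "\n".join(f"<<METADATA_{key}:{value}>>" for key, value in fields.items())


def add_metadata(text: str, **fields) -> str:
    """Return text with the metadata tags placed at the beginning."""
    return metadata_header(**fields) + "\n" + text


tagged = add_metadata(
    "Once upon a time...",
    title="Your Book Title",
    artist="Author Name",
    year=2025,
    genre="Fiction",
)
print(tagged)
```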
Configuration Reference
Core Settings
Option | Function |
---|---|
Speech Rate | 0.1x to 2.0x speed adjustment |
Subtitle Mode | Disabled, Sentence-level, Word-level (1-3 words) |
Audio Format | WAV, FLAC, MP3, OPUS, M4B (chapterized) |
Subtitle Format | SRT, ASS (narrow/wide/centered) |
Newline Handling | Replace single line breaks with spaces |
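The newline handling option matters most for hard-wrapped text, where every line ends in a break that would otherwise interrupt narration. A minimal sketch of the idea follows. It assumes blank lines should be kept as paragraph breaks, which the table does not specify, and it does not handle Windows-style `\r\n` line endings.

```python
import re


def join_single_newlines(text: str) -> str:
    """Replace lone line breaks with spaces; keep blank-line paragraph breaks."""
    # A newline is "single" when it has no newline directly before or after it
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


wrapped = "The quick brown\nfox jumps.\n\nA new paragraph\nstarts here."
print(join_single_newlines(wrapped))
```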
eBook Processing
Feature | Purpose |
---|---|
Chapter Selection | Choose specific chapters from ePubs/PDFs |
Per-Chapter Audio | Export each chapter as separate file |
Merged Output | Combine all chapters into single audio file |
Metadata Preservation | Embed author/title information |
Frequently Asked Questions
What languages are supported?
🇺🇸 English: a (American), b (British)
🇯🇵 Japanese: j (requires `pip install misaki[ja]`)
🇨🇳 Chinese: z (requires `pip install misaki[zh]`)
🇪🇸 Spanish: e
🇫🇷 French: f
See full list: [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md)
Does subtitle generation work for non-English texts?
Currently, only English supports timestamped subtitles due to Kokoro engine limitations. For technical details, see the pipeline implementation.
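Timestamped subtitles depend on per-word timing from the speech engine. To illustrate why, the sketch below groups word timings into SRT cues of a chosen size, similar in spirit to the word-level subtitle modes. The timing tuples are made-up sample data and the functions are hypothetical, not Abogen's implementation.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, group_size=2):
    """Group (word, start, end) tuples into cues of group_size words."""
    cues = []
    for i in range(0, len(words), group_size):
        chunk = words[i:i + group_size]
        text = " ".join(word for word, _, _ in chunk)
        # A cue spans from its first word's start to its last word's end
        start, end = chunk[0][1], chunk[-1][2]
        index = i // group_size + 1
        cues.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


timings = [("Hello", 0.0, 0.4), ("there,", 0.4, 0.8), ("reader.", 0.9, 1.5)]
print(words_to_srt(timings, group_size=2))
```

Without word timings for a language, there is nothing to fill in the start and end fields, so cues cannot be aligned to the audio.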
Can I use AMD graphics cards?
✅ Full support on Linux
⚠️ Limited support on Windows (ROCm driver constraints)
What’s the best player for Abogen outputs?
We recommend MPV player with this configuration:
save-position-on-quit
sub-ass-override=no
sub-margin-y=50
audio-samplerate=48000
Technical Implementation
Abogen leverages these core technologies:
- Speech Engine: Kokoro-82M (Apache 2.0 licensed)
- eBook Processing: EbookLib Python library
- GUI Framework: PyQt
- Dependency Management: Embedded Python (Windows)
License: MIT (allows commercial use)
Icons provided by Icons8
Practical Use Cases
- Audiobook Creation: Convert novels/textbooks to narrated audio
- Accessibility Tools: Generate audio versions of documents
- Content Creation: Produce subtitled voiceovers for videos
- Language Learning: Create listening materials with transcriptions
Get started: GitHub Repository
Report issues: Issue Tracker
> Platform Support: Windows 10+, macOS 12+, Linux (Ubuntu/Debian/Fedora/Arch)