Abogen: Convert eBooks to Audiobooks with Perfectly Synced Subtitles

Transform PDFs, ePubs, and text files into narrated audiobooks with chapter markers – no technical expertise needed

Have you ever wanted to turn your eBook collection into narrated audiobooks, or produce voiceovers with accurately timed subtitles? Abogen does both with AI text-to-speech. It is built on the Kokoro-82M speech engine and converts text into natural-sounding audio while generating synchronized subtitles. With GPU acceleration, short texts are ready in seconds. This guide covers installation, everyday use, and the advanced features.


What Makes Abogen Special?

Abogen stands out with these key capabilities:

  • Multi-format support: Directly processes ePub, PDF, and text files
  • Subtitle synchronization: Generates timestamped subtitles in SRT and ASS formats
  • GPU acceleration: Uses your graphics card for faster processing (an RTX 2060 converts about 3,000 characters in 11 seconds)
  • Chapter preservation: Maintains eBook chapter structure in audio outputs
  • Voice customization: Create unique voice profiles by blending existing voices
  • Batch processing: Queue multiple files with individual settings



Installation Guide (All Platforms)

Windows Installation

Option 1: One-Click Installer (Recommended)

  1. Download or clone the repository
  2. Extract the files and double-click WINDOWS_INSTALL.bat
  3. Install espeak-ng when prompted

The installer automatically sets up a self-contained Python environment.

Option 2: Manual Installation (Advanced)

The first pip command installs the CUDA 12.8 build of PyTorch for NVIDIA GPUs:

python -m venv venv
venv\Scripts\activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install abogen

macOS Installation

brew install espeak-ng
pip3 install abogen

Linux Installation

sudo apt install espeak-ng     # Ubuntu/Debian
sudo dnf install espeak-ng     # Fedora
sudo pacman -S espeak-ng       # Arch
pip3 install abogen

# For AMD GPUs:
pip3 uninstall torch
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4
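After installing, you can check which accelerator PyTorch can see. The script below is a small sketch and not part of Abogen. It only reads standard PyTorch attributes (`torch.cuda.is_available()`, `torch.version.hip`, `torch.backends.mps`) and reports "missing" if PyTorch is not installed. Whether Abogen actually uses a detected GPU still depends on Abogen's own settings.

```python
def detect_torch_backend() -> str:
    """Return 'rocm', 'cuda', 'mps', 'cpu', or 'missing' (PyTorch not installed)."""
    try:
        import torch
    except ImportError:
        return "missing"

    if torch.cuda.is_available():
        # ROCm builds of PyTorch reuse the torch.cuda API; torch.version.hip
        # is a version string on ROCm builds and None on CUDA builds.
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"

    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU via Metal
    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch backend: {detect_torch_backend()}")
```

If this prints "cpu" on a machine with an NVIDIA or AMD card, the CPU-only PyTorch wheel is most likely installed. In that case, rerun the matching GPU install command above.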

Creating Your First Audiobook

Follow these three steps:

  1. Input your content
    Drag ePub, PDF, or TXT files into the window, or use the built-in text editor.

  2. Configure settings:

    • Speech speed: Adjust from 0.1x to 2.0x
    • Voice selection: Choose from preset voices (e.g., names starting with “am_”, such as am_adam, are American male voices)
    • Subtitle generation: Select granularity (sentence, single word, or multi-word)
    • Output format: Audio (WAV/FLAC/MP3/OPUS/M4B) + Subtitles (SRT/ASS)
    • Save location: Original folder, desktop, or custom directory
  3. Start conversion
    Monitor progress via the status bar and log window.
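Multi-word subtitles come down to grouping word timestamps into cues. The minimal sketch below assumes the speech engine returns `(word, start_seconds, end_seconds)` tuples. That data shape and the helper names are hypothetical and do not come from Abogen's internals. The code puts N words in each cue and formats SRT timestamps as `HH:MM:SS,mmm`.

```python
from typing import List, Tuple

# Assumed shape of one timed word: (text, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], words_per_cue: int = 3) -> str:
    """Group consecutive words into numbered SRT cues of at most words_per_cue words."""
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start, end = chunk[0][1], chunk[-1][2]  # cue spans first start to last end
        text = " ".join(w[0] for w in chunk)
        index = i // words_per_cue + 1
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)  # blank line between cues, as SRT requires
```

Setting `words_per_cue=1` gives word-level subtitles. Sentence-level subtitles would instead split on sentence punctuation.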

Pro Tip: In text files, add <<CHAPTER_MARKER:Chapter Name>> wherever a new chapter should begin.
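To show how the marker divides a text, here is a hypothetical splitter. `split_chapters` is illustrative and is not Abogen's parser. It assumes every marker has the form `<<CHAPTER_MARKER:Title>>` and that any text before the first marker belongs to an untitled opening section.

```python
import re
from typing import List, Tuple

# Matches <<CHAPTER_MARKER:Some Title>> and captures the title.
CHAPTER_RE = re.compile(r"<<CHAPTER_MARKER:(.*?)>>")


def split_chapters(text: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs; text before the first marker gets the title ''."""
    chapters = []
    pos = 0
    title = ""
    for match in CHAPTER_RE.finditer(text):
        body = text[pos:match.start()].strip()
        if body or title:  # skip an empty untitled section at the very start
            chapters.append((title, body))
        title = match.group(1).strip()
        pos = match.end()
    chapters.append((title, text[pos:].strip()))  # last (or only) section
    return chapters
```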


Advanced Features Explained

1. Voice Mixer: Create Custom Voices

Blend different voice profiles to create unique narrations:

  • Adjust sliders to mix voice characteristics
  • Preview adjustments in real-time
  • Save custom profiles for future projects

Example: Mixing 60% American male with 40% British female can produce a more neutral, academic-sounding tone
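Conceptually, voice blending can be a weighted average of voice embeddings. Treat this as an assumption about what the mixer does internally. The sketch represents each voice as a plain list of floats, while Kokoro's real voice packs are much larger PyTorch tensors. `blend_voices` and the two-element toy vectors are hypothetical. The weights are normalized, so slider values like 60 and 40 do not need to sum to 1.

```python
from typing import Dict, List


def blend_voices(voices: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of equally sized voice vectors; weights are normalized to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    sizes = {len(voices[name]) for name in weights}
    if len(sizes) != 1:
        raise ValueError("all voice vectors must have the same length")
    blended = [0.0] * sizes.pop()
    for name, weight in weights.items():
        share = weight / total  # this voice's fraction of the mix
        for i, value in enumerate(voices[name]):
            blended[i] += share * value
    return blended


# Toy example mirroring the 60/40 mix above (hypothetical vectors).
toy = {"american_male": [1.0, 0.0], "british_female": [0.0, 1.0]}
print(blend_voices(toy, {"american_male": 60, "british_female": 40}))
```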

2. Queue Mode: Batch Processing

Process multiple files efficiently:

  1. Add files via main interface or queue manager
  2. Configure individual settings per file
  3. Process entire queue in sequence
  4. Each output is saved to the location configured for that file
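The queue behaves like a first-in, first-out list in which each entry carries its own settings. The minimal sketch below uses hypothetical `QueueItem` fields and defaults and a stand-in `convert` callback. It is not Abogen's API; it only illustrates processing in sequence with settings for each file.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueueItem:
    # Hypothetical per-file settings mirroring options described in this guide.
    path: str
    voice: str = "af_heart"
    speed: float = 1.0
    audio_format: str = "mp3"
    output_dir: str = "."


def process_queue(items: List[QueueItem], convert: Callable[[QueueItem], str]) -> List[str]:
    """Convert items strictly in order and return their output paths in the same order."""
    outputs = []
    for item in items:
        outputs.append(convert(item))  # each file is converted with its own settings
    return outputs
```

In real use, `convert` would run the text-to-speech job. In a quick test, any function that maps an item to an output path will do.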

3. Professional Metadata Tagging

Add these tags at the beginning of a text file to embed audiobook metadata:

<<METADATA_TITLE:Your Book Title>>
<<METADATA_ARTIST:Author Name>>
<<METADATA_YEAR:2025>>
<<METADATA_GENRE:Fiction>>
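The tags follow a simple `<<METADATA_KEY:value>>` pattern, so they are easy to read programmatically, for example to check a file before queuing it. The parser below is a hypothetical sketch and not Abogen's implementation. It assumes tags appear only at the start of the file, skips blank lines between them, and stops at the first line that is not a tag.

```python
import re
from typing import Dict, Tuple

# Matches a whole line such as <<METADATA_TITLE:Your Book Title>>.
TAG_RE = re.compile(r"^<<METADATA_([A-Z_]+):(.*)>>$")


def read_metadata(text: str) -> Tuple[Dict[str, str], str]:
    """Split leading metadata tag lines from the body text."""
    metadata: Dict[str, str] = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if line:  # blank lines between tags are skipped
            match = TAG_RE.match(line)
            if not match:
                break  # first non-tag line starts the body
            metadata[match.group(1).lower()] = match.group(2).strip()
        i += 1
    return metadata, "\n".join(lines[i:]).strip()
```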

Configuration Reference

Core Settings

| Option | Values or behavior |
|---|---|
| Speech Rate | 0.1x to 2.0x speed adjustment |
| Subtitle Mode | Disabled, sentence-level, or word-level (1 to 3 words) |
| Audio Format | WAV, FLAC, MP3, OPUS, M4B (chapterized) |
| Subtitle Format | SRT, ASS (narrow, wide, or centered) |
| Newline Handling | Replace single line breaks with spaces |
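Newline handling helps with hard-wrapped text, which is common in files extracted from PDFs, where a single line break can fall in the middle of a sentence. One way to implement it uses a regular expression. This is an assumption for illustration and is not taken from Abogen's source. The expression joins lines separated by exactly one newline and keeps blank-line paragraph breaks.

```python
import re

# A newline with no newline directly before or after it is treated as a soft wrap.
SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def join_wrapped_lines(text: str) -> str:
    """Replace single line breaks with spaces while keeping blank-line paragraph breaks."""
    text = text.replace("\r\n", "\n")  # normalize Windows line endings first
    return SINGLE_NEWLINE.sub(" ", text)
```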

eBook Processing

| Feature | Purpose |
|---|---|
| Chapter Selection | Choose specific chapters from ePubs/PDFs |
| Per-Chapter Audio | Export each chapter as a separate file |
| Merged Output | Combine all chapters into a single audio file |
| Metadata Preservation | Embed author/title information |
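When chapters are merged into one file, each chapter's start and end time has to be recorded so that players such as M4B readers can jump between chapters. One common approach, assumed here rather than confirmed as Abogen's, has two steps. First, accumulate chapter durations. Then write the results in FFmpeg's FFMETADATA text format, which FFmpeg can attach when remuxing. The `chapters_to_ffmetadata` helper is hypothetical.

```python
from typing import List, Tuple


def _escape(value: str) -> str:
    # FFMETADATA requires '\', '=', ';', '#' and newlines to be backslash-escaped.
    # The backslash is escaped first so later escapes are not doubled.
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value


def chapters_to_ffmetadata(chapters: List[Tuple[str, float]]) -> str:
    """Build FFMETADATA text from (title, duration_seconds) pairs in playback order."""
    lines = [";FFMETADATA1"]
    elapsed = 0.0
    start_ms = 0
    for title, duration in chapters:
        elapsed += duration
        end_ms = round(elapsed * 1000)  # round cumulative time to avoid drift
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START/END are in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={_escape(title)}",
        ]
        start_ms = end_ms  # next chapter starts where this one ends
    return "\n".join(lines) + "\n"
```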

Frequently Asked Questions

What languages are supported?

🇺🇸 English: a (American), b (British)
🇯🇵 Japanese: j (requires `pip install "misaki[ja]"`)
🇨🇳 Chinese: z (requires `pip install "misaki[zh]"`)
🇪🇸 Spanish: e
🇫🇷 French: f
See full list: [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md)
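Kokoro voice names encode the language code and gender in their first two letters. For example, `am_adam` is an American English male voice and `bf_emma` is a British English female voice. The illustrative helper below decodes that prefix. Its lookup table covers only the languages listed above.

```python
# Language codes from the list above; second letter of the prefix is gender.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Chinese",
    "e": "Spanish",
    "f": "French",
}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(name: str) -> str:
    """Decode a Kokoro-style voice name such as 'am_adam'."""
    prefix, _, speaker = name.partition("_")
    if len(prefix) != 2 or not speaker:
        raise ValueError(f"unexpected voice name: {name!r}")
    lang = LANGUAGES.get(prefix[0], f"unknown language '{prefix[0]}'")
    gender = GENDERS.get(prefix[1], "unknown gender")
    return f"{speaker} ({lang}, {gender})"
```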

Does subtitle generation work for non-English texts?

Currently, timestamped subtitles are available only for English, because Kokoro's pipeline produces word-level timestamps only for English text. For technical details, see the pipeline implementation.

Can I use AMD graphics cards?

✅ Full support on Linux
⚠️ Limited support on Windows (ROCm driver constraints)

What’s the best player for Abogen outputs?

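We recommend the MPV player with the following settings in its mpv.conf file:

# Resume playback where you left off
save-position-on-quit
# Render ASS subtitles with their own styling, without user overrides
sub-ass-override=no
# Add vertical margin (in scaled pixels) around subtitles
sub-margin-y=50
# Resample audio output to 48 kHz
audio-samplerate=48000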

Technical Implementation

Abogen leverages these core technologies:

  • Speech Engine: Kokoro-82M (Apache 2.0 licensed)
  • eBook Processing: EbookLib Python library
  • GUI Framework: PyQt
  • Dependency Management: Embedded Python (Windows)

License: MIT (allows commercial use)
Icons provided by Icons8


Practical Use Cases

  • Audiobook Creation: Convert novels/textbooks to narrated audio
  • Accessibility Tools: Generate audio versions of documents
  • Content Creation: Produce subtitled voiceovers for videos
  • Language Learning: Create listening materials with transcriptions

Get started: GitHub Repository
Report issues: Issue Tracker


> Platform Support: Windows 10+, macOS 12+, Linux (Ubuntu/Debian/Fedora/Arch)