This is a work in progress. You can consider this a pre-launch repo at the moment, but if you find bugs, please put them in the issues area. Thank you.

Transform your text into high-quality audiobooks with advanced TTS models, voice cloning, and professional volume normalization.

./install-audiobook.bat
./launch_audiobook.bat

If you encounter CUDA assertion errors during generation, install the patched version:
# Activate your virtual environment first
venv\Scripts\activate.bat
# Install the CUDA-fixed version
pip install --force-reinstall --no-cache-dir "chatterbox-tts @ git+https://github.com/fakerybakery/better-chatterbox@fix-cuda-issue"

The web interface will open automatically in your browser at http://localhost:7860
- Single Voice: Generate entire audiobooks with one consistent voice
- Multi-Voice: Create dynamic audiobooks with multiple characters
- Custom Voices: Clone voices from audio samples for personalized narration
- Professional Volume Normalization: Ensure consistent audio levels across all voices
- Text Queuing System (NEW): Upload books in chapters of any size and generate continuously
- Chunk-Based Processing (NEW): Improved reliability for longer text generations
- Smart Cleanup: Remove unwanted silence and audio artifacts
- Volume Normalization: Professional-grade volume balancing for all voices
- Real-time Audio Analysis: Live volume level monitoring and feedback
- Preview System: Test settings before applying to entire projects
- Batch Processing: Process multiple projects efficiently
- Quality Control: Advanced audio optimization tools
- Enhanced Audio Quality (NEW): Improved top-p and min-p sampling parameters for better voice generation
- Voice Library: Organize and manage your voice collection
- Voice Cloning: Create custom voices from audio samples
- Volume Settings: Configure target volume levels for each voice
- Professional Presets: Industry-standard volume levels (audiobook, podcast, broadcast)
- Character Assignment: Map specific voices to story characters
- Professional Standards: Audiobook (-18 dB), Podcast (-16 dB), Broadcast (-23 dB) presets
- Consistent Character Voices: All characters maintain the same volume level
- Real-time Analysis: Color-coded volume status with RMS and peak level display
- Retroactive Normalization: Apply volume settings to existing voice projects
- Multi-Voice Support: Batch normalize all voices in multi-character audiobooks
- Soft Limiting: Intelligent audio limiting to prevent distortion
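As a rough illustration of what soft limiting means, the sketch below passes quiet samples through unchanged and squeezes anything above a threshold into the remaining headroom with a tanh curve. The function name `soft_limit`, the 0.8 threshold, and the tanh curve are assumptions made for this example; the project's actual limiter may work differently.

```python
import math
from typing import List


def soft_limit(samples: List[float], threshold: float = 0.8) -> List[float]:
    """Compress samples above `threshold` smoothly so output stays within [-1, 1].

    Hypothetical helper for illustration; samples are floats where 1.0 is full scale.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be between 0 and 1 (exclusive)")
    headroom = 1.0 - threshold
    limited = []
    for s in samples:
        magnitude = abs(s)
        if magnitude <= threshold:
            # Below the knee: pass through unchanged.
            limited.append(s)
        else:
            # tanh maps the overshoot into the remaining headroom.
            squeezed = threshold + headroom * math.tanh((magnitude - threshold) / headroom)
            limited.append(math.copysign(squeezed, s))
    return limited
```

Compared with hard clipping, a curve like this avoids the abrupt flattening of peaks that is heard as distortion.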
- Chapter Support: Automatic chapter detection and organization
- Multi-Voice Parsing: Parse character dialogue automatically
- Text Validation: Ensure proper formatting before generation
- Queue Management (NEW): Batch process multiple text files sequentially
- Return Pause System (NEW): Automatic pause insertion based on line breaks for natural speech flow
Our advanced text processing pipeline transforms your written content into natural-sounding audiobooks with intelligent pause placement and character flow management.
Pauses are inserted automatically based on your text formatting: every line break (\n) in your text adds a 0.1-second pause to the generated audio, creating natural speech rhythms without manual intervention.
- Line Break Detection: System automatically counts all line breaks in your text
- Pause Calculation: Each return adds exactly 0.1 seconds of silence
- Accumulative Pauses: Multiple consecutive line breaks create longer pauses
- Universal Support: Works with single-voice, multi-voice, and batch processing
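The pause arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming each "\n" contributes 0.1 seconds as stated; the name `pause_seconds` and the handling of Windows line endings are assumptions, not the project's actual code.

```python
PAUSE_PER_BREAK_S = 0.1  # from the description above: each line break adds 0.1 s


def pause_seconds(text: str, per_break: float = PAUSE_PER_BREAK_S) -> float:
    """Return the total pause time implied by the line breaks in `text`.

    Hypothetical helper for illustration only.
    """
    # Normalize Windows line endings so "\r\n" counts as a single break.
    normalized = text.replace("\r\n", "\n")
    # Round to hide floating-point noise such as 0.30000000000000004.
    return round(normalized.count("\n") * per_break, 3)


if __name__ == "__main__":
    sample = "First line.\n\n\nAfter a dramatic pause."
    print(pause_seconds(sample))  # three consecutive breaks accumulate
```

Because the breaks are simply counted, consecutive returns accumulate: three in a row yield three times the single-break pause.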
[Narrator] The sun was setting over the hills.
[Character1] "We need to find shelter soon."
[Character2] "I see a cave up ahead.
Let's hurry before it gets dark."
[Narrator] They rushed toward the cave, hearts pounding.
Result: Natural pauses between dialogue, emphasis pauses for dramatic effect, and smooth character transitions.
[Character Name] Dialogue content here.
[Another Character] Response content here.
Multiple lines can be used for the same character.
[Narrator] Descriptive text and scene setting.
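To make the tag format concrete, here is a hedged sketch of how such a script could be split into speaker segments. The regular expression, the `parse_script` name, and the choice to fall back to a "Narrator" speaker for untagged opening lines are all assumptions for illustration; the project's own parser may behave differently.

```python
import re
from typing import List, Tuple

# Assumed pattern: a speaker tag is "[Name]" at the start of a line.
TAG_RE = re.compile(r"^\[([^\]]+)\]\s*(.*)$")


def parse_script(text: str, default_speaker: str = "Narrator") -> List[Tuple[str, str]]:
    """Split a tagged script into (speaker, text) segments."""
    segments: List[Tuple[str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no dialogue in this sketch
        match = TAG_RE.match(line)
        if match:
            # A new tag starts a new segment.
            segments.append((match.group(1).strip(), match.group(2)))
        elif segments:
            # Untagged line: continue the current speaker's segment.
            speaker, body = segments[-1]
            segments[-1] = (speaker, f"{body}\n{line}".strip())
        else:
            # Untagged text before any tag goes to the default speaker.
            segments.append((default_speaker, line))
    return segments
```

Keeping the "\n" inside a continued segment means a later stage could still count line breaks per speaker.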
- Paragraph Breaks: Use double line breaks for scene transitions
- Emphasis Pauses: Add extra returns before important revelations
- Character Separation: Single returns between different speakers
- Breathing Room: Natural pauses for complex concepts or emotional moments
Chapter content flows naturally here.
New paragraphs create natural pauses.
Extended pauses can emphasize dramatic moments.
Regular text continues with normal pacing.
- Line Break Preservation: Maintains your formatting intentions throughout processing
- Character Assignment: Automatically maps voice tags to selected voice profiles
- Chunk Optimization: Breaks long texts into optimal segments while preserving pause timing
- Error Recovery: Validates text and provides helpful formatting suggestions
- Live Feedback: Console output shows exactly how many pauses are being added
- Debug Information: Detailed logging of pause detection and application
- Progress Tracking: Monitor pause processing alongside audio generation
- Quality Assurance: Automatic validation of pause placement
- Seamless Integration: Pauses blend naturally with generated speech
- Volume Consistency: Silence segments match the audio output specifications
- Format Compatibility: Works with all supported audio formats and quality settings
- Project Preservation: Pause information saved in project metadata for regeneration
- Character Consistency: Always use the same character name format, [Name]
- Natural Breaks: Place returns where a human reader would naturally pause
- Scene Transitions: Use multiple returns (2-3) for major scene changes
- Emotional Beats: Add single returns before/after emotional dialogue
Chapter 1: The Beginning
Opening paragraph with scene setting.
"Character dialogue with natural flow."
Descriptive narrative continues.
Major scene transition with extended pause.
New section begins here.
- Cliffhangers: Use extended pauses before revealing crucial information
- Action Sequences: Shorter, punchy sentences with minimal pauses for intensity
- Contemplative Moments: Longer pauses for reflection and character development
- Comedic Timing: Strategic pauses before punchlines or comedic reveals
When generating your audiobook, watch for these helpful console messages:
Detected 15 line breaks → 1.5s total pause time
Line breaks detected in [Character1]: +0.3s pause (from 3 returns)
Chunk 2 (Narrator): Added 0.2s pause after speech
This real-time feedback helps you understand exactly how your formatting translates to audio timing.
We've significantly improved audio generation quality by optimizing the underlying TTS parameters:
- Enhanced Top-p and Min-p Settings: Fine-tuned sampling probability parameters for more natural speech patterns
- Reduced Audio Artifacts: Better handling of pronunciation and intonation
- Improved Voice Consistency: More stable voice characteristics across long generations
- Better Pronunciation: Enhanced handling of complex words and names
Note for Existing Users:
- Older voice profiles will continue to work as before
- To take advantage of the new audio quality improvements, consider re-creating voice profiles
- Existing projects remain fully compatible
Perfect for processing large books or multiple chapters:
- Batch Upload: Upload multiple text files of any size
- Sequential Processing: Automatically processes files one after another
- Progress Tracking: Monitor generation progress across all queued items
- Flexible Chapter Sizes: No restrictions on individual file length
- Unattended Generation: Set up large projects and let them run automatically
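Sequential, unattended processing of a queue can be pictured with the sketch below. It is only an illustration: `process_queue` and the `generate` callback are hypothetical names, and recording a failure while continuing with the next file is an assumed policy rather than documented behavior.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def process_queue(files: Iterable[str], generate: Callable[[str], str]) -> Dict[str, str]:
    """Process files one at a time, recording a result or an error per file."""
    pending = deque(files)
    total = len(pending)
    results: Dict[str, str] = {}
    done = 0
    while pending:
        name = pending.popleft()
        try:
            results[name] = generate(name)
        except Exception as exc:
            # Keep the queue running if one file fails.
            results[name] = f"failed: {exc}"
        done += 1
        print(f"[{done}/{total}] {name}")
    return results
```

In use, `generate` would wrap whatever turns one text file into audio, so a long batch can run without supervision and still report which files need attention.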
Enhanced the core text-to-speech engine for better reliability:
- Background Chunking: Automatically splits long texts into optimal chunks
- Memory Management: Better handling of large text inputs
- Error Recovery: Improved resilience during long generation sessions
- Consistent Quality: Maintains voice quality across chunk boundaries
- Progress Feedback: Real-time updates on generation progress
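One common way to chunk long text is to group whole sentences up to a size limit, sketched below. The `chunk_text` name, the 300-character default, and the punctuation-based sentence split are assumptions for this example. The sketch also makes no attempt to preserve line-break pauses, which the real pipeline is described as doing.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 300) -> List[str]:
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole rather than cut.
    """
    # Split after ., !, or ? followed by whitespace: a rough sentence heuristic.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting only at sentence boundaries keeps each chunk a natural unit of speech, which tends to matter for consistent intonation across chunk boundaries.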
- Go to Voice Library tab
- Upload your voice sample and configure settings
- Set target volume level (default: -18 dB for audiobooks)
- Choose from professional presets or use custom levels
- Save voice profile with volume settings
- Navigate to Multi-Voice Audiobook Creation tab
- Enable volume normalization for all voices
- Set target level for consistent character voices
- All characters will be automatically normalized during generation
- Go to Production Studio tab
- Select "Batch Processing" mode
- Upload multiple text files (chapters, sections, etc.)
- Choose your voice and settings
- Start batch processing - files will generate sequentially
- Monitor progress and download completed audiobooks
- Audiobook Standard: -18 dB RMS (recommended for most audiobooks)
- Podcast Standard: -16 dB RMS (for podcast-style content)
- Quiet/Comfortable: -20 dB RMS (for quiet listening environments)
- Loud/Energetic: -14 dB RMS (for dynamic, energetic content)
- Broadcast Standard: -23 dB RMS (for broadcast television standards)
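To show how an RMS target translates into a gain, the following sketch measures a signal's RMS level and scales it to one of the presets listed above. It assumes samples are floats where 1.0 is full scale and that the dB figures are relative to full scale; `PRESETS_DB`, `rms_db`, and `normalize_to` are hypothetical names, not the project's API.

```python
import math
from typing import Dict, List

# Preset targets copied from the list above (dB RMS).
PRESETS_DB: Dict[str, float] = {
    "audiobook": -18.0,
    "podcast": -16.0,
    "quiet": -20.0,
    "loud": -14.0,
    "broadcast": -23.0,
}


def rms_db(samples: List[float]) -> float:
    """RMS level in dB relative to full scale (assumed to be 1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def normalize_to(samples: List[float], preset: str = "audiobook") -> List[float]:
    """Scale samples so their RMS level matches the preset target."""
    current = rms_db(samples)
    if current == float("-inf"):
        return list(samples)  # silence: nothing to scale
    # A level difference in dB converts to a linear amplitude gain.
    gain = 10.0 ** ((PRESETS_DB[preset] - current) / 20.0)
    return [s * gain for s in samples]
```

Plain gain like this can push peaks past full scale when the source is quiet but spiky, which is where a limiting stage would come in.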
Your Audiobook Projects
├── speakers/ # Voice library and samples
├── audiobook_projects/ # Generated audiobooks
├── src/audiobook/ # Core processing modules
└── Generated files... # Audio chunks and final outputs
- Prepare Text: Format your story with proper chapter breaks and strategic line breaks for natural pauses
- Select Voices: Choose or clone voices for your characters
- Configure Volume: Set professional volume levels and normalization
- Configure Settings: Adjust quality, speed, and processing options
- Generate Audio: Create your audiobook with advanced TTS and automatic pause insertion
- Clean & Optimize: Use smart cleanup tools for perfect audio
- Export: Get your finished audiobook ready for distribution
- Format Dialogue: Use [Character] tags and strategic line breaks for natural flow
- Add Return Pauses: Place line breaks where you want natural speech pauses (0.1s each)
- Assign Voices: Map each character to their voice profile
- Process with Intelligence: Watch console output for pause detection feedback
- Review & Adjust: Listen to generated audio and refine formatting if needed
- Organize Chapters: Split your book into individual text files
- Queue Setup: Upload all files to the batch processing system
- Voice Selection: Choose voice and configure settings once
- Automated Generation: Let the system process all files sequentially
- Monitor Progress: Track completion status in real-time
- Collect Results: Download all generated audiobook chapters
- Python 3.8+
- CUDA GPU (recommended for faster processing)
- 8GB+ RAM (16GB recommended for large projects)
- Modern web browser for the interface
- CUDA compatibility issues have been resolved with updated dependencies
- GPU acceleration is now stable for extended generation sessions
- Fallback to CPU processing available if CUDA issues occur
- If you encounter CUDA assertion errors: Use the patched version from the installation instructions above
- The fix addresses PyTorch indexing issues that could cause crashes during audio generation
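A CPU fallback at start-up can be as simple as the sketch below. It only covers choosing a device before generation begins, not recovering from a CUDA assertion mid-run; `pick_device` is a hypothetical helper, and the optional import is there so the example also runs on machines without PyTorch.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```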
- Short sentences or sections may occasionally cause issues during multi-voice generation
- This is a limitation of the underlying TTS models rather than the implementation
- Workaround: Use longer, more detailed sentences for better stability
- Single-voice generation is not affected by this issue
- Existing Voices: All older voice profiles remain fully functional
- New Features: To benefit from improved audio quality, consider re-creating voice profiles
- Project Compatibility: Existing audiobook projects work without modification
- Regeneration: Individual chunks can be regenerated with improved quality settings
- Large batch jobs may take significant time depending on text length and hardware
- Monitor system resources during extended batch processing sessions
- Consider processing very large books in smaller batches for better control
- Text: .txt, .md, formatted stories and scripts
- Audio Samples: .wav, .mp3, .flac for voice cloning
- Batch Files: Multiple text files for queue processing
- Audio: High-quality .wav files with professional volume levels
- Projects: Organized folder structure with chapters
- Exports: Ready-to-use audiobook files
- Batch Results: Multiple completed audiobooks from queue processing
- Features Guide: See AUDIOBOOK_FEATURES.md for detailed capabilities
- Development Notes: Check development/ folder for technical details
- Issues: Report problems via GitHub issues
This project is licensed under the terms specified in LICENSE.
Ready to create amazing audiobooks with professional volume levels and enhanced audio quality? Run ./launch_audiobook.bat and start generating!