Skip to content

psdwizzard/chatterbox-Audiobook

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

67 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🎧 Chatterbox Audiobook Generator

This is a work in progress. You can consider this a pre-launch repo at the moment, but if you find bugs, please put them in the issues area. Thank you. Transform your text into high-quality audiobooks with advanced TTS models, voice cloning, and professional volume normalization.

πŸš€ Quick Start

1. Install Dependencies

./install-audiobook.bat

2. Launch the Application

./launch_audiobook.bat

3. CUDA Issue Fix (If Needed)

If you encounter CUDA assertion errors during generation, install the patched version:

# Activate your virtual environment first
venv\Scripts\activate.bat

# Install the CUDA-fixed version
pip install --force-reinstall --no-cache-dir "chatterbox-tts @ git+https://github.com/fakerybakery/better-chatterbox@fix-cuda-issue"

The web interface will open automatically in your browser at http://localhost:7860


✨ Features

πŸ“š Audiobook Creation

  • Single Voice: Generate entire audiobooks with one consistent voice
  • Multi-Voice: Create dynamic audiobooks with multiple characters
  • Custom Voices: Clone voices from audio samples for personalized narration
  • Professional Volume Normalization: Ensure consistent audio levels across all voices
  • πŸ“‹ Text Queuing System ⭐ NEW: Upload books in any size chapters and generate continuously
  • πŸ”„ Chunk-Based Processing ⭐ NEW: Improved reliability for longer text generations

🎡 Audio Processing

  • Smart Cleanup: Remove unwanted silence and audio artifacts
  • Volume Normalization: Professional-grade volume balancing for all voices
  • Real-time Audio Analysis: Live volume level monitoring and feedback
  • Preview System: Test settings before applying to entire projects
  • Batch Processing: Process multiple projects efficiently
  • Quality Control: Advanced audio optimization tools
  • 🎯 Enhanced Audio Quality ⭐ NEW: Improved P-top and minimum P parameters for better voice generation

🎭 Voice Management

  • Voice Library: Organize and manage your voice collection
  • Voice Cloning: Create custom voices from audio samples
  • Volume Settings: Configure target volume levels for each voice
  • Professional Presets: Industry-standard volume levels (audiobook, podcast, broadcast)
  • Character Assignment: Map specific voices to story characters

πŸ“Š Volume Normalization System ⭐ NEW

  • Professional Standards: Audiobook (-18 dB), Podcast (-16 dB), Broadcast (-23 dB) presets
  • Consistent Character Voices: All characters maintain the same volume level
  • Real-time Analysis: Color-coded volume status with RMS and peak level display
  • Retroactive Normalization: Apply volume settings to existing voice projects
  • Multi-Voice Support: Batch normalize all voices in multi-character audiobooks
  • Soft Limiting: Intelligent audio limiting to prevent distortion

πŸ“– Text Processing

  • Chapter Support: Automatic chapter detection and organization
  • Multi-Voice Parsing: Parse character dialogue automatically
  • Text Validation: Ensure proper formatting before generation
  • πŸ“‹ Queue Management ⭐ NEW: Batch process multiple text files sequentially
  • πŸ”‡ Return Pause System ⭐ NEW: Automatic pause insertion based on line breaks for natural speech flow

🎭 Custom Audiobook Processing Pipeline ⭐ NEW

Our advanced text processing pipeline transforms your written content into natural-sounding audiobooks with intelligent pause placement and character flow management.

πŸ”‡ Return Pause System

Automatic pause insertion based on your text formatting - Every line break (\n) in your text automatically adds a 0.1-second pause to the generated audio, creating natural speech rhythms without manual intervention.

How It Works

  • Line Break Detection: System automatically counts all line breaks in your text
  • Pause Calculation: Each return adds exactly 0.1 seconds of silence
  • Accumulative Pauses: Multiple consecutive line breaks create longer pauses
  • Universal Support: Works with single-voice, multi-voice, and batch processing

Example Text Formatting

[Narrator] The sun was setting over the hills.

[Character1] "We need to find shelter soon."

[Character2] "I see a cave up ahead.
Let's hurry before it gets dark."


[Narrator] They rushed toward the cave, hearts pounding.

Result: Natural pauses between dialogue, emphasis pauses for dramatic effect, and smooth character transitions.

πŸ“ Text Formatting Best Practices

🎭 Multi-Voice Dialogue Structure

[Character Name] Dialogue content here.

[Another Character] Response content here.
Multiple lines can be used for the same character.

[Narrator] Descriptive text and scene setting.

πŸŽͺ Natural Flow Techniques

  • Paragraph Breaks: Use double line breaks for scene transitions
  • Emphasis Pauses: Add extra returns before important revelations
  • Character Separation: Single returns between different speakers
  • Breathing Room: Natural pauses for complex concepts or emotional moments

πŸ“– Single Voice Formatting

Chapter content flows naturally here.

New paragraphs create natural pauses.


Extended pauses can emphasize dramatic moments.

Regular text continues with normal pacing.

πŸ”„ Processing Pipeline Features

🧠 Intelligent Text Analysis

  • Line Break Preservation: Maintains your formatting intentions throughout processing
  • Character Assignment: Automatically maps voice tags to selected voice profiles
  • Chunk Optimization: Breaks long texts into optimal segments while preserving pause timing
  • Error Recovery: Validates text and provides helpful formatting suggestions

⚑ Real-Time Processing

  • Live Feedback: Console output shows exactly how many pauses are being added
  • Debug Information: Detailed logging of pause detection and application
  • Progress Tracking: Monitor pause processing alongside audio generation
  • Quality Assurance: Automatic validation of pause placement

🎚️ Professional Output

  • Seamless Integration: Pauses blend naturally with generated speech
  • Volume Consistency: Silence segments match the audio output specifications
  • Format Compatibility: Works with all supported audio formats and quality settings
  • Project Preservation: Pause information saved in project metadata for regeneration

πŸ’‘ Pro Tips for Better Audiobooks

🎯 Dialogue Formatting

  • Character Consistency: Always use the same character name format [Name]
  • Natural Breaks: Place returns where a human reader would naturally pause
  • Scene Transitions: Use multiple returns (2-3) for major scene changes
  • Emotional Beats: Add single returns before/after emotional dialogue

πŸ“š Chapter Structure

Chapter 1: The Beginning

Opening paragraph with scene setting.

"Character dialogue with natural flow."

Descriptive narrative continues.


Major scene transition with extended pause.

New section begins here.

πŸŽͺ Advanced Techniques

  • Cliffhangers: Use extended pauses before revealing crucial information
  • Action Sequences: Shorter, punchy sentences with minimal pauses for intensity
  • Contemplative Moments: Longer pauses for reflection and character development
  • Comedic Timing: Strategic pauses before punchlines or comedic reveals

πŸ” Debug Output Examples

When generating your audiobook, watch for these helpful console messages:

πŸ”‡ Detected 15 line breaks β†’ 1.5s total pause time
πŸ”‡ Line breaks detected in [Character1]: +0.3s pause (from 3 returns)
πŸ”‡ Chunk 2 (Narrator): Added 0.2s pause after speech

This real-time feedback helps you understand exactly how your formatting translates to audio timing.


πŸ†• Recent Improvements

🎯 Audio Quality Enhancements

We've significantly improved audio generation quality by optimizing the underlying TTS parameters:

  • Enhanced P-top and Minimum P Settings: Fine-tuned probability parameters for more natural speech patterns
  • Reduced Audio Artifacts: Better handling of pronunciation and intonation
  • Improved Voice Consistency: More stable voice characteristics across long generations
  • Better Pronunciation: Enhanced handling of complex words and names

πŸ“ Note for Existing Users:

  • Older voice profiles will continue to work as before
  • To take advantage of the new audio quality improvements, consider re-creating voice profiles
  • Existing projects remain fully compatible

πŸ“‹ Text Queuing System

Perfect for processing large books or multiple chapters:

  • Batch Upload: Upload multiple text files of any size
  • Sequential Processing: Automatically processes files one after another
  • Progress Tracking: Monitor generation progress across all queued items
  • Flexible Chapter Sizes: No restrictions on individual file length
  • Unattended Generation: Set up large projects and let them run automatically

πŸ”„ Chunk-Based TTS System

Enhanced the core text-to-speech engine for better reliability:

  • Background Chunking: Automatically splits long texts into optimal chunks
  • Memory Management: Better handling of large text inputs
  • Error Recovery: Improved resilience during long generation sessions
  • Consistent Quality: Maintains voice quality across chunk boundaries
  • Progress Feedback: Real-time updates on generation progress

🎚️ Volume Normalization Guide

Individual Voice Setup

  1. Go to Voice Library tab
  2. Upload your voice sample and configure settings
  3. Set target volume level (default: -18 dB for audiobooks)
  4. Choose from professional presets or use custom levels
  5. Save voice profile with volume settings

Multi-Voice Projects

  1. Navigate to Multi-Voice Audiobook Creation tab
  2. Enable volume normalization for all voices
  3. Set target level for consistent character voices
  4. All characters will be automatically normalized during generation

Text Queuing Workflow ⭐ NEW

  1. Go to Production Studio tab
  2. Select "Batch Processing" mode
  3. Upload multiple text files (chapters, sections, etc.)
  4. Choose your voice and settings
  5. Start batch processing - files will generate sequentially
  6. Monitor progress and download completed audiobooks

Professional Standards

  • πŸ“– Audiobook Standard: -18 dB RMS (recommended for most audiobooks)
  • πŸŽ™οΈ Podcast Standard: -16 dB RMS (for podcast-style content)
  • πŸ”‡ Quiet/Comfortable: -20 dB RMS (for quiet listening environments)
  • πŸ”Š Loud/Energetic: -14 dB RMS (for dynamic, energetic content)
  • πŸ“Ί Broadcast Standard: -23 dB RMS (for broadcast television standards)

πŸ“ Project Structure

πŸ“¦ Your Audiobook Projects
β”œβ”€β”€ 🎀 speakers/           # Voice library and samples
β”œβ”€β”€ πŸ“š audiobook_projects/ # Generated audiobooks
β”œβ”€β”€ πŸ”§ src/audiobook/      # Core processing modules
└── πŸ“„ Generated files...  # Audio chunks and final outputs

🎯 Workflow

  1. πŸ“ Prepare Text: Format your story with proper chapter breaks and strategic line breaks for natural pauses
  2. 🎀 Select Voices: Choose or clone voices for your characters
  3. 🎚️ Configure Volume: Set professional volume levels and normalization
  4. βš™οΈ Configure Settings: Adjust quality, speed, and processing options
  5. 🎧 Generate Audio: Create your audiobook with advanced TTS and automatic pause insertion
  6. 🧹 Clean & Optimize: Use smart cleanup tools for perfect audio
  7. πŸ“¦ Export: Get your finished audiobook ready for distribution

🎭 Enhanced Multi-Voice Workflow

  1. πŸ“ Format Dialogue: Use [Character] tags and strategic line breaks for natural flow
  2. πŸ”‡ Add Return Pauses: Place line breaks where you want natural speech pauses (0.1s each)
  3. 🎀 Assign Voices: Map each character to their voice profile
  4. ⚑ Process with Intelligence: Watch console output for pause detection feedback
  5. 🎧 Review & Adjust: Listen to generated audio and refine formatting if needed

πŸ“‹ Batch Processing Workflow ⭐ NEW

  1. πŸ“š Organize Chapters: Split your book into individual text files
  2. πŸ“‹ Queue Setup: Upload all files to the batch processing system
  3. 🎀 Voice Selection: Choose voice and configure settings once
  4. πŸ”„ Automated Generation: Let the system process all files sequentially
  5. πŸ“Š Monitor Progress: Track completion status in real-time
  6. πŸ“¦ Collect Results: Download all generated audiobook chapters

πŸ› οΈ Technical Requirements

  • Python 3.8+
  • CUDA GPU (recommended for faster processing)
  • 8GB+ RAM (16GB recommended for large projects)
  • Modern web browser for the interface

πŸ”§ CUDA Support

  • CUDA compatibility issues have been resolved with updated dependencies
  • GPU acceleration is now stable for extended generation sessions
  • Fallback to CPU processing available if CUDA issues occur
  • If you encounter CUDA assertion errors: Use the patched version from the installation instructions above
  • The fix addresses PyTorch indexing issues that could cause crashes during audio generation

⚠️ Known Issues & Compatibility

Multi-Voice Generation

  • Short sentences or sections may occasionally cause issues during multi-voice generation
  • This is a limitation of the underlying TTS models rather than the implementation
  • Workaround: Use longer, more detailed sentences for better stability
  • Single-voice generation is not affected by this issue

Voice Profile Compatibility

  • Existing Voices: All older voice profiles remain fully functional
  • New Features: To benefit from improved audio quality, consider re-creating voice profiles
  • Project Compatibility: Existing audiobook projects work without modification
  • Regeneration: Individual chunks can be regenerated with improved quality settings

Batch Processing Considerations

  • Large batch jobs may take significant time depending on text length and hardware
  • Monitor system resources during extended batch processing sessions
  • Consider processing very large books in smaller batches for better control

πŸ“‹ Supported Formats

Input

  • Text: .txt, .md, formatted stories and scripts
  • Audio Samples: .wav, .mp3, .flac for voice cloning
  • Batch Files: Multiple text files for queue processing

Output

  • Audio: High-quality .wav files with professional volume levels
  • Projects: Organized folder structure with chapters
  • Exports: Ready-to-use audiobook files
  • Batch Results: Multiple completed audiobooks from queue processing

πŸ†˜ Support

  • Features Guide: See AUDIOBOOK_FEATURES.md for detailed capabilities
  • Development Notes: Check development/ folder for technical details
  • Issues: Report problems via GitHub issues

πŸ“„ License

This project is licensed under the terms specified in LICENSE.


πŸŽ‰ Ready to create amazing audiobooks with professional volume levels and enhanced audio quality? Run ./launch_audiobook.bat and start generating!

About

SoTA open-source TTS for Audiobook and Podcast Generation

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 98.6%
  • Batchfile 1.4%