A single-binary CLI for local text-to-speech.
This project wraps Resemble AI's Chatterbox ONNX models in a standalone executable. The neural network inference is handled by ONNX Runtime; the Rust code provides a CLI interface, model management, and voice caching.
Based on Resemble AI's Chatterbox, a state-of-the-art open-source TTS model.
cbx offers a zero-dependency way to use Chatterbox:
- Single binary - Download one file, run it. No Python, no virtual environments, no pip.
- Built-in model management - Commands to download, list, and clean up model files.
- Voice profile caching - Encode a reference voice once, reuse it without re-processing.
- Cross-platform - Same CLI on macOS, Linux, and Windows.
The official Chatterbox repository has more features (multilingual, GPU acceleration, fine-tuning). Use cbx when you want a simple, portable tool for basic TTS.
Listen to what Chatterbox can produce:
- Official Chatterbox Turbo demos - samples from Resemble AI
- Original Chatterbox demos - includes emotion exaggeration examples
cbx uses the same underlying model, so output quality is identical.
macOS / Linux:

```bash
curl -fsSL https://raw.githubusercontent.com/srv1n/cbx/main/packaging/scripts/install.sh | bash
```

Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/srv1n/cbx/main/packaging/scripts/install.ps1 | iex
```

The installer downloads the appropriate binary for your platform from GitHub Releases. Models are downloaded separately on first use.
- Homebrew: See Homebrew installation
- Manual: Download from GitHub Releases and place on your PATH
- From source:
```bash
cargo build --release
```
```bash
cbx speak --text "Hello from cbx." --voice-wav ./your-voice.wav --out-wav ./output.wav
```

This will:

- Download the required model files (first run only, ~1-2 GB depending on variant)
- Encode your reference voice
- Generate speech and save it to `output.wav`
If you don't have a reference WAV file, install the pre-packaged default voice:
macOS / Linux:

```bash
curl -fsSL https://raw.githubusercontent.com/srv1n/cbx/main/packaging/scripts/install_default_voice.sh | bash
```

Windows:

```powershell
irm https://raw.githubusercontent.com/srv1n/cbx/main/packaging/scripts/install_default_voice.ps1 | iex
```

Then generate speech without specifying a voice:

```bash
cbx speak --text "Hello from cbx." --out-wav ./output.wav
```

Voice encoding takes a few seconds. Cache it once to skip this step on future runs:
```bash
cbx voice add --name myvoice --voice-wav ./your-voice.wav
cbx speak --voice myvoice --text "Much faster now." --out-wav ./output.wav
```

| Command | Description |
|---|---|
| `cbx speak` | Generate speech from text |
| `cbx download` | Pre-download model files |
| `cbx sizes` | Show download sizes for each model variant |
| `cbx models` | List cached models |
| `cbx voice add` | Create a cached voice profile |
| `cbx voice list` | List cached voice profiles |
| `cbx voice remove` | Delete a voice profile |
| `cbx clean` | Remove cached models |
Run `cbx --help` or `cbx <command> --help` for details.
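Because all output goes through ordinary flags, `cbx speak` composes well with shell scripting. The sketch below batch-synthesizes one WAV per line of a text file; `lines.txt` and the `myvoice` profile are assumptions for illustration (the profile would come from an earlier `cbx voice add`), not part of cbx itself.

```shell
#!/bin/sh
# Batch synthesis sketch: one WAV per line of input text.
# "myvoice" is an assumed cached profile; lines.txt is a hypothetical input.
printf 'Hello from cbx.\nSecond line of speech.\n' > lines.txt
mkdir -p out

n=0
while IFS= read -r line; do
  n=$((n + 1))
  out=$(printf 'out/line-%03d.wav' "$n")
  # Only invoke cbx when it is actually on PATH, so dry runs still work.
  if command -v cbx >/dev/null 2>&1; then
    cbx speak --voice myvoice --text "$line" --out-wav "$out"
  fi
  echo "$out"
done < lines.txt
```

Using a cached profile here matters: with `--voice-wav` instead, the reference voice would be re-encoded once per line.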
Chatterbox is published in several ONNX variants. Use `--dtype` to select one:

| Variant | Notes |
|---|---|
| `fp16` (default) | Good balance of size and speed |
| `fp32` | Largest, most compatible |
| `quantized`, `q4`, `q4f16` | Smaller downloads, speed varies by platform |
Check available sizes without downloading:

```bash
cbx sizes
```

Download a specific variant:

```bash
cbx download --dtype fp16
```

cbx supports two ways to use reference voices:
Direct path (slower): pass `--voice-wav` each time. The voice is re-encoded on every run.

```bash
cbx speak --voice-wav ./voice.wav --text "Hello" --out-wav ./out.wav
```

Cached profile (faster): encode once, reuse many times.

```bash
cbx voice add --name myvoice --voice-wav ./voice.wav
cbx speak --voice myvoice --text "Hello" --out-wav ./out.wav
```

Voice profiles are tied to the model variant (`--dtype`). If you switch variants, create a new profile for that variant.
- Format: WAV
- Channels: Mono or stereo (converted internally)
- Duration: 5-20 seconds of clear speech works well
- Quality: Clean recording without background noise
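If your source recording is in another format or too long, a general-purpose tool such as ffmpeg can produce a WAV meeting the guidelines above. This is a sketch, not part of cbx: it assumes ffmpeg is installed, and `input.m4a` is a hypothetical source file.

```shell
#!/bin/sh
# Sketch: prepare a reference WAV from an arbitrary recording (assumes ffmpeg).
src="input.m4a"  # hypothetical source recording
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$src" ]; then
  # Mono, first 15 seconds. cbx converts channels internally, so -ac 1 just
  # keeps the file small; pick a quiet, clearly spoken passage to trim.
  ffmpeg -y -i "$src" -ac 1 -t 15 voice.wav
fi
```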
cbx includes platform-specific acceleration (CoreML on macOS, DirectML on Windows, CUDA on Linux), but for this particular model, CPU execution is often faster due to graph partitioning overhead in the neural accelerators.
The default is `--ep cpu`, which should work well on most systems.
cbx automatically uses all available CPU cores. Override with:

```bash
cbx --intra-threads 4 speak --text "Hello" --out-wav out.wav
```

On Apple Silicon, 4 threads often outperform higher counts due to contention. Experiment to find what works best for your hardware.
| Configuration | Average Time |
|---|---|
| CPU, 4 threads | 22.7s |
| CPU, 2 threads | 29.0s |
| CPU, 8 threads | 33.1s |
| CoreML, 4 threads | 49.7s |
Text: "The quick brown fox jumps over the lazy dog. This is a benchmark run for cbx."
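Results like these vary by machine, so it is worth repeating the sweep locally. A rough sketch using the same benchmark sentence (coarse wall-clock timing via `date`; `myvoice` is an assumed cached profile):

```shell
#!/bin/sh
# Sketch: time a few --intra-threads settings on your own hardware.
TEXT="The quick brown fox jumps over the lazy dog. This is a benchmark run for cbx."
for t in 2 4 8; do
  # Guarded so the loop is a no-op when cbx is not installed.
  if command -v cbx >/dev/null 2>&1; then
    start=$(date +%s)
    cbx --intra-threads "$t" speak --voice myvoice --text "$TEXT" --out-wav "bench-$t.wav"
    end=$(date +%s)
    echo "threads=$t elapsed=$((end - start))s"
  fi
done
```

Second-granularity timing is fine here since runs take tens of seconds; average a few runs per setting before drawing conclusions.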
```bash
cbx models            # Show cached model files
cbx models --long     # Show detailed info including commit hashes
cbx downloads         # Show all downloaded variants
cbx voice list        # Show cached voice profiles
```

Preview what would be deleted:

```bash
cbx clean --dry-run
```

Delete specific variants:

```bash
cbx clean --dtype fp32 --yes
```

Delete everything (models and voices):

```bash
cbx clean --all --voices --yes
```

macOS / Linux:

```bash
curl -fsSL https://raw.githubusercontent.com/srv1n/cbx/main/packaging/scripts/uninstall.sh | bash
```

Windows:

```powershell
irm https://raw.githubusercontent.com/srv1n/cbx/main/packaging/scripts/uninstall.ps1 | iex
```

This removes the binary. To also remove cached models and voices:

```bash
cbx clean --all --voices --yes
```

A Homebrew formula template is available at `packaging/homebrew/cbx.rb.template`. Once configured in a tap:
```bash
brew tap srv1n/tap
brew install cbx
```

Build from source:

```bash
cargo build --release
./target/release/cbx --help
```

Enable platform-specific acceleration:

```bash
cargo build --release --features coreml   # macOS
cargo build --release --features directml # Windows
cargo build --release --features cuda     # Linux with NVIDIA GPU
```

This project would not exist without:
- Resemble AI for creating and open-sourcing Chatterbox
- The Chatterbox Turbo ONNX export on Hugging Face
- The `ort` crate for Rust ONNX Runtime bindings
Dual-licensed under Apache License 2.0 or MIT, at your option.
See LICENSE-APACHE and LICENSE-MIT.