
Commit 47a7609

feat: add lazy daemon mode and streaming voxtream-server integration (#36)
- Add `vox daemon start/stop/status` — local HTTP server keeps models warm
- Daemon manages voxtream-server as child process (WebSocket proxy)
- Transparent routing: heavy backends auto-route through daemon if running
- Idle auto-shutdown (default 5min, configurable)
- Update README with real benchmark data (streaming: 170ms first-frame CUDA)
- Add .cargo/config.toml with Metal aliases for macOS dev builds
- Add tokio rt-multi-thread, macros, signal, time features

Benchmark results (voxtream streaming, model warm):

- M2 Pro CPU: 3.3s first audio
- RTX 4070 Ti SUPER: 170ms first audio (paper: 74ms)

Closes #30
1 parent bcf0e25 commit 47a7609
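
As a rough illustration of the "transparent routing" bullet above, a heavy backend could probe the local daemon's HTTP server and only fall back to a cold in-process run if nothing answers. This is a minimal sketch; the port, the `/status` path, and the helper name are assumptions, not the actual vox API:

```rust
// Hypothetical probe for a running vox daemon. The address and endpoint
// are placeholders; the real daemon's HTTP interface is not shown here.
use std::time::Duration;

fn daemon_is_running() -> bool {
    let client = match reqwest::blocking::Client::builder()
        .timeout(Duration::from_millis(200)) // fail fast when nothing is listening
        .build()
    {
        Ok(c) => c,
        Err(_) => return false,
    };
    client
        .get("http://127.0.0.1:7878/status") // assumed address and path
        .send()
        .map(|resp| resp.status().is_success())
        .unwrap_or(false)
}
```

If the probe succeeds, requests presumably go to the warm daemon; otherwise the CLI runs the backend directly, as before.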

8 files changed

Lines changed: 762 additions & 13 deletions


.cargo/config.toml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+[alias]
+b = "build --features metal"
+r = "run --features metal"
+t = "test --features metal"

Cargo.lock

Lines changed: 13 additions & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ anyhow = "1"
 rusqlite = { version = "0.32", features = ["bundled"] }
 dirs = "6"
 reqwest = { version = "0.12", features = ["blocking", "json", "stream"] }
-tokio = { version = "1", features = ["rt", "net", "io-util"] }
+tokio = { version = "1", features = ["rt", "rt-multi-thread", "net", "io-util", "macros", "signal", "time"] }
 futures-util = "0.3"
 serde = { version = "1", features = ["derive"] }
 serde_json = "1"

README.md

Lines changed: 14 additions & 8 deletions
@@ -27,19 +27,25 @@ Cross-platform TTS CLI with five backends and MCP server for AI assistants.
 
 ### Benchmark — single sentence (~50 chars)
 
-Real-world measurements. Cold start = first run (includes model loading). Warm = model cached on disk.
+All times measured end-to-end (model loading + inference + audio playback). Cold = first CLI call.
 
-| Backend | M2 Pro (CPU) | RTX 4070 Ti SUPER (CUDA) | Voice cloning | Quality |
+| Backend | M2 Pro (CPU) | RTX 4070 Ti SUPER | Voice cloning | Quality |
 |---------|-------------:|-------------------------:|:---:|---------|
 | **`say`** | **3s** | macOS only | No | System voices |
 | **`kokoro`** | **10s** | ~10s | No | Good |
-| **`voxtream`** | **68s** / 8s warm | **44s** / **22s** warm | Yes (zero-shot) | Excellent |
-| **`qwen-native`** | **11m33s** / 3s warm | ~30s / ~2s warm | Yes | Excellent |
-| **`qwen`** | ~15s / 2s warm | macOS only | Yes | Excellent |
+| **`voxtream`** (VoXtream2, 0.5B) | **68s** / 40s warm | **23s** / **19s** warm | Yes (zero-shot) | Excellent |
+| **`qwen-native`** (Qwen3-TTS, 0.6B) | **11m33s** / 3s warm | **48s** (CPU) | Yes | Excellent |
+| **`qwen`** (MLX-Audio) | ~15s / 2s warm | macOS only | Yes | Excellent |
 
-> `voxtream` cold start includes model download (~500MB) on first run. Subsequent "warm" runs reuse cached model.
-> `qwen-native` benefits massively from `--features metal` (macOS) or `--features cuda` (Linux).
-> For lowest latency: `say` (macOS) or `kokoro` (all platforms). For best quality + cloning: `voxtream` on GPU.
+**With daemon** (`vox daemon start` — keeps model server warm):
+
+| Backend | M2 Pro (CPU) | Notes |
+|---------|-------------:|-------|
+| **`voxtream`** | **32s** | Inference CPU-bound (~25s). On CUDA: paper reports 74ms first-packet |
+| **`qwen-native`** | **~3s** | Model stays in RAM via global Mutex |
+
+> All CUDA benchmarks measured on RTX 4070 Ti SUPER (16GB). qwen-native CUDA not yet supported (requires cudarc update for CUDA 13.2).
+> For lowest latency: `say` (macOS) or `kokoro`. For best quality + cloning: `voxtream` on CUDA with daemon.
 
 ## Install
 
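Per the commit message, the daemon is started with `vox daemon start`, inspected with `vox daemon status`, and stopped with `vox daemon stop`; while it is running, heavy backends route through it automatically, and it shuts itself down after the configurable idle period (5 minutes by default).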

src/backend/voxtream.rs

Lines changed: 2 additions & 2 deletions
@@ -16,7 +16,7 @@ use crate::config;
 /// Default prompt audio for voxtream when no voice clone is provided.
 /// Generated on first use via macOS `say` or a bundled fallback.
 /// Stored in /tmp to avoid paths with spaces (torchaudio PosixPath bug).
-fn default_prompt_audio() -> Result<PathBuf> {
+pub fn default_prompt_audio() -> Result<PathBuf> {
     let path = PathBuf::from("/tmp/vox_voxtream_default_prompt.wav");
     if path.exists() {
         return Ok(path);
@@ -53,7 +53,7 @@ fn default_prompt_audio() -> Result<PathBuf> {
 pub struct VoxtreamBackend;
 
 /// Find the voxtream binary — check PATH first, then common venv locations.
-fn find_voxtream() -> Option<PathBuf> {
+pub fn find_voxtream() -> Option<PathBuf> {
     // Check PATH
     if let Ok(status) = Command::new("voxtream")
         .arg("--help")
