
feat: lazy daemon mode with voxtream-server streaming #36

Merged
pszymkowiak merged 1 commit into main from feat/daemon-voxtream-tui on Mar 18, 2026
Conversation

@pszymkowiak
Contributor

Summary

  • Add vox daemon start/stop/status — keeps heavy TTS models warm in memory
  • Daemon manages voxtream-server (FastAPI) as child process, proxies via WebSocket
  • Transparent routing: vox -b voxtream "text" auto-routes through daemon if running
  • Auto-shutdown after idle timeout (5min default)
  • Real streaming benchmark: 170ms first-frame on RTX 4070 Ti SUPER (paper: 74ms)

Benchmark (streaming, model warm)

| Platform                 | First audio frame |
| ------------------------ | ----------------- |
| M2 Pro (CPU)             | 3.3s              |
| RTX 4070 Ti SUPER (CUDA) | 170ms             |

Closes #30

Test plan

  • cargo build --features metal passes
  • `vox daemon start` → `vox daemon status` → `vox daemon stop` lifecycle
  • voxtream via daemon generates and plays audio
  • Idle auto-shutdown after timeout
  • Streaming benchmark on Mac and CUDA

- Add `vox daemon start/stop/status` — local HTTP server keeps models warm
- Daemon manages voxtream-server as child process (WebSocket proxy)
- Transparent routing: heavy backends auto-route through daemon if running
- Idle auto-shutdown (default 5min, configurable)
- Update README with real benchmark data (streaming: 170ms first-frame CUDA)
- Add .cargo/config.toml with Metal aliases for macOS dev builds
- Add tokio rt-multi-thread, macros, signal, time features

Benchmark results (voxtream streaming, model warm):
  M2 Pro CPU:           3.3s first audio
  RTX 4070 Ti SUPER:    170ms first audio (paper: 74ms)

Closes #30
@pszymkowiak pszymkowiak merged commit 47a7609 into main Mar 18, 2026
0 of 3 checks passed
@pszymkowiak
Contributor Author

pszymkowiak commented Mar 18, 2026

🧞 wshm · Automated triage by AI

📊 Automated PR Analysis

Type: feature
Risk: 🔴 high

Summary

Adds a lazy daemon mode (vox daemon start/stop/status) that keeps heavy TTS models warm in memory, manages voxtream-server (FastAPI) as a child process with WebSocket proxying, and transparently routes vox -b voxtream calls through the daemon when running. Includes auto-shutdown after an idle timeout (5min default) and achieves 170ms first-frame latency on RTX 4070 Ti SUPER.

Review Checklist

  • Tests present
  • Breaking change
  • Docs updated

Linked issues: #30


🤖 Analyzed automatically by wshm · This is an automated analysis, not a human review.



Development

Successfully merging this pull request may close these issues.

feat: lazy daemon for warm model inference (~1-2s instead of 20-60s)
