feat(voice): add base_url config for OpenAI TTS and Whisper STT#499
…ders Enables pointing the OpenAI-compatible TTS/STT providers at local servers like Chatterbox and faster-whisper-server without needing an API key.
Greptile Summary
This PR adds an optional Key changes:
Issues found:
Confidence Score: 4/5
Safe to merge after fixing two missing base_url.is_some() checks in the Whisper STT provider-detection and services config-reporting paths.
Two P1 defects exist where the base_url-awareness added to OpenAI TTS was not consistently applied to Whisper STT in the UI/RPC reporting layer (methods/voice.rs and methods/services.rs). A user configuring only whisper.base_url gets a working transcription path but sees the provider flagged as not configured. The core provider logic and gateway wiring are correct.
crates/gateway/src/methods/voice.rs and crates/gateway/src/methods/services.rs: both are missing base_url.is_some() in the Whisper STT availability checks.
Important Files Changed
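The availability check the review asks for can be sketched as below. This is a minimal illustration, not the code in methods/voice.rs or methods/services.rs: `WhisperConfig` and `whisper_is_configured` are hypothetical names, and the real config struct may carry other fields.

```rust
/// Hypothetical stand-in for the Whisper STT config section; the real
/// struct in the codebase may have different fields.
#[derive(Default)]
pub struct WhisperConfig {
    pub api_key: Option<String>,
    pub base_url: Option<String>,
}

/// Reports the provider as configured when either an API key or a custom
/// base_url is present, mirroring the check described for OpenAI TTS.
pub fn whisper_is_configured(cfg: &WhisperConfig) -> bool {
    cfg.api_key.is_some() || cfg.base_url.is_some()
}
```

With a predicate of this shape, a config that sets only whisper.base_url would be reported as configured, which matches what the transcription path already does.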
Sequence Diagram
sequenceDiagram
participant User
participant Gateway
participant LiveTtsService
participant OpenAiTts
participant LocalTTS as Local TTS Server (e.g. Chatterbox)
participant OpenAI as OpenAI API
User->>Gateway: TTS request
Gateway->>LiveTtsService: synthesize(request)
LiveTtsService->>LiveTtsService: load_config() - resolve base_url & api_key
LiveTtsService->>OpenAiTts: with_defaults(api_key, base_url, voice, model)
OpenAiTts->>OpenAiTts: is_configured() - api_key.is_some() OR base_url != API_BASE
alt base_url set (local server)
OpenAiTts->>LocalTTS: POST {base_url}/audio/speech (no Authorization header)
LocalTTS-->>OpenAiTts: audio bytes
else api_key set (OpenAI)
OpenAiTts->>OpenAI: POST https://api.openai.com/v1/audio/speech + Bearer token
OpenAI-->>OpenAiTts: audio bytes
end
OpenAiTts-->>LiveTtsService: AudioOutput
LiveTtsService-->>Gateway: AudioOutput
Gateway-->>User: audio response
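The branching in the diagram can be approximated with a small std-only sketch. `TtsSketch` and its methods are hypothetical names chosen for illustration (the PR's own type is OpenAiTts, whose exact fields are not shown here); headers are modelled as plain string pairs rather than a real HTTP client request.

```rust
/// Default endpoint, as named in the diagram.
const API_BASE: &str = "https://api.openai.com/v1";

/// Hypothetical, simplified model of the TTS provider state.
pub struct TtsSketch {
    api_key: Option<String>,
    base_url: String,
}

impl TtsSketch {
    /// Falls back to API_BASE when no base_url is supplied.
    pub fn new(api_key: Option<String>, base_url: Option<String>) -> Self {
        Self {
            api_key,
            base_url: base_url.unwrap_or_else(|| API_BASE.to_string()),
        }
    }

    /// Configured when a key is present OR the endpoint was overridden.
    pub fn is_configured(&self) -> bool {
        self.api_key.is_some() || self.base_url != API_BASE
    }

    /// URL the speech request would be POSTed to.
    pub fn speech_url(&self) -> String {
        format!("{}/audio/speech", self.base_url)
    }

    /// Request headers; Authorization is added only when a key exists.
    pub fn headers(&self) -> Vec<(String, String)> {
        let mut headers = vec![(
            "Content-Type".to_string(),
            "application/json".to_string(),
        )];
        if let Some(key) = &self.api_key {
            headers.push(("Authorization".to_string(), format!("Bearer {key}")));
        }
        headers
    }
}
```

Under these assumptions, `TtsSketch::new(None, Some("http://localhost:8003/v1".to_string()))` is configured and sends no Authorization header, while a key-only instance targets the default endpoint with a Bearer token.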
# No api_key needed for OpenAI TTS/Whisper when OpenAI is configured as an LLM provider.
# [voice.tts.openai]
# base_url = "https://api.openai.com/v1"  # API endpoint (change for Chatterbox, etc.)
# voice = "alloy"  # alloy, echo, fable, onyx, nova, shimmer
# model = "tts-1"  # tts-1 or tts-1-hd
Whisper STT base_url missing from config template
The template was updated to document base_url under [voice.tts.openai], but there is no corresponding template entry for the Whisper STT section. Users generating a fresh config from the template won't see the new base_url field for Whisper, making it harder to discover this feature for STT.
Consider adding a commented-out entry near the Whisper STT configuration block:
# [voice.stt.whisper]
# base_url = "https://api.openai.com/v1" # API endpoint (change for faster-whisper-server, etc.)
# model = "whisper-1"

.post(format!("{}/audio/speech", self.base_url))
.header("Content-Type", "application/json")
.json(&body);
Trailing slash on base_url produces double-slash URLs
format!("{}/audio/speech", self.base_url) will produce a double-slash URL (e.g. http://localhost:8003//audio/speech) if the user supplies a base_url with a trailing slash. The same applies to the Whisper STT path ({}/audio/transcriptions). While most HTTP servers tolerate this, some reverse proxies and strict OpenAI-compatible implementations do not.
Consider trimming trailing slashes when storing base_url:
base_url: base_url
    .map(|u| u.trim_end_matches('/').to_string())
    .unwrap_or_else(|| API_BASE.into()),

The same fix should be applied in WhisperStt::with_options in crates/voice/src/stt/whisper.rs.
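The trimming suggested above can be exercised in isolation. The sketch below is illustrative only: `normalize_base_url`, `endpoint`, and `DEFAULT_BASE` are hypothetical names (`DEFAULT_BASE` stands in for the crate's API_BASE), and the helper would serve both the TTS and Whisper paths.

```rust
/// Stand-in for the crate's API_BASE constant.
const DEFAULT_BASE: &str = "https://api.openai.com/v1";

/// Resolves the configured base URL, dropping any trailing slashes and
/// falling back to the default endpoint when none is set.
pub fn normalize_base_url(base_url: Option<String>) -> String {
    base_url
        .map(|u| u.trim_end_matches('/').to_string())
        .unwrap_or_else(|| DEFAULT_BASE.into())
}

/// Joins a normalized base URL and a path with exactly one slash.
pub fn endpoint(base_url: &str, path: &str) -> String {
    format!("{}/{}", base_url, path.trim_start_matches('/'))
}
```

For example, `endpoint(&normalize_base_url(Some("http://localhost:8003/".to_string())), "audio/speech")` yields `http://localhost:8003/audio/speech` rather than the double-slash form.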
Merging this PR will not alter performance
Summary
Cherry-picked from #331 (which contained multiple unrelated features).
Enables pointing the OpenAI-compatible TTS/STT providers at local servers like Chatterbox and faster-whisper-server without needing an API key.
- Add base_url field to OpenAI TTS provider config
- Add base_url field to Whisper STT provider config

Validation
Completed
- cargo check -p moltis-voice

Remaining
- ./scripts/local-validate.sh

Manual QA
- Set voice.tts.openai.base_url in config to a local Chatterbox instance
- Set voice.stt.whisper.base_url to a local faster-whisper-server

Supersedes the voice portion of #331.