feat(voice): add base_url config for OpenAI TTS and Whisper STT#499

Open
penso wants to merge 2 commits into main from feat/voice-base-url

penso (Collaborator) commented Mar 28, 2026

Summary

Cherry-picked from #331 (which contained multiple unrelated features).

Enables pointing the OpenAI-compatible TTS/STT providers at local servers like Chatterbox and faster-whisper-server without needing an API key.

  • Adds base_url field to OpenAI TTS provider config
  • Adds base_url field to Whisper STT provider config
  • Both fall back to the default OpenAI API URL when not set
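
The fallback described in the last bullet can be sketched as follows. This is a simplified illustration, not the PR's code: the struct is reduced to two fields, and `resolve_base_url` is a hypothetical helper name. `API_BASE` and the default URL are taken from the review comments and config template quoted later in this thread.

```rust
/// Default endpoint, as shown in the PR's config template.
const API_BASE: &str = "https://api.openai.com/v1";

/// Simplified stand-in for the provider config section (the real struct has more fields).
#[derive(Debug, Default, Clone)]
pub struct OpenAiTtsConfig {
    pub api_key: Option<String>,
    pub base_url: Option<String>,
}

/// Hypothetical helper: use the configured URL if present, else the OpenAI default.
pub fn resolve_base_url(cfg: &OpenAiTtsConfig) -> String {
    cfg.base_url
        .clone()
        .unwrap_or_else(|| API_BASE.to_string())
}
```

With no `base_url` set, the helper returns the OpenAI URL, so existing configs keep their current behavior.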

Validation

Completed

  • cargo check -p moltis-voice
  • Rust fmt check

Remaining

  • ./scripts/local-validate.sh
  • Manual: configure base_url pointing to a local TTS server, verify audio generation
  • Manual: configure base_url pointing to a local Whisper server, verify transcription

Manual QA

  1. Set voice.tts.openai.base_url in config to a local Chatterbox instance
  2. Request TTS — verify it hits the local server
  3. Set voice.stt.whisper.base_url to a local faster-whisper-server
  4. Send audio — verify transcription uses local server

Supersedes the voice portion of #331.

tensiondriven and others added 2 commits March 28, 2026 10:03
…ders

Enables pointing the OpenAI-compatible TTS/STT providers at local servers
like Chatterbox and faster-whisper-server without needing an API key.

greptile-apps bot (Contributor) commented Mar 28, 2026

Greptile Summary

This PR adds an optional base_url configuration field to the OpenAI TTS and Whisper STT providers, enabling both to point at OpenAI-compatible local servers (e.g. Chatterbox for TTS, faster-whisper-server for STT) without requiring an API key. The core implementation is clean: base_url is threaded from config schemas through to the provider structs, the auth header is conditionally omitted for keyless local servers, and is_configured() now returns true when a non-default base_url is supplied.

Key changes:

  • base_url: Option<String> added to both VoiceOpenAiConfig and VoiceWhisperConfig config structs (schema, voice config, and validate key map)
  • OpenAiTts::with_defaults and WhisperStt::with_options updated to accept and store base_url; auth header injection is now conditional
  • is_configured() extended to return true when base_url != API_BASE
  • Provider construction in gateway/src/voice.rs updated correctly for both TTS and STT
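
A minimal sketch of the `is_configured()` rule summarized above, assuming the provider stores the resolved URL as a plain `String`. The constructor name `new` and the two-field struct are assumptions made for illustration; the PR's actual constructor is `with_defaults` and also takes a voice and a model.

```rust
const API_BASE: &str = "https://api.openai.com/v1";

/// Reduced stand-in for the TTS provider; only the fields relevant to the check.
pub struct OpenAiTts {
    api_key: Option<String>,
    base_url: String,
}

impl OpenAiTts {
    /// Hypothetical constructor: resolves the optional URL to the default.
    pub fn new(api_key: Option<String>, base_url: Option<String>) -> Self {
        Self {
            api_key,
            base_url: base_url.unwrap_or_else(|| API_BASE.to_string()),
        }
    }

    /// Configured when a key is present OR the endpoint is not the OpenAI default.
    pub fn is_configured(&self) -> bool {
        self.api_key.is_some() || self.base_url != API_BASE
    }
}
```

Under this rule, explicitly setting `base_url` to the default OpenAI URL without a key still counts as unconfigured, which matches the `base_url != API_BASE` wording.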

Issues found:

  • The detect_voice_providers function in methods/voice.rs was updated for OpenAI TTS but not for Whisper STT — a base_url-only Whisper config will show as unconfigured in the UI
  • The whisper_configured flag in the voice.config.get RPC response (methods/services.rs) was not updated to include base_url.is_some(), unlike the parallel openai_configured flag
  • Trailing slashes on user-supplied base_url values will produce double-slash request paths (e.g. http://host//audio/speech)
  • The config template (template.rs) documents the TTS base_url but omits the equivalent Whisper STT entry

Confidence Score: 4/5

Safe to merge after fixing two missing base_url.is_some() checks in the Whisper STT provider-detection and services config-reporting paths.

Two P1 defects exist where the same base_url-awareness added to OpenAI TTS was not consistently applied to Whisper STT in the UI/RPC reporting layer (methods/voice.rs and methods/services.rs). A user configuring only whisper.base_url gets a working transcription path but sees the provider flagged as not configured. The core provider logic and gateway wiring are correct.

crates/gateway/src/methods/voice.rs and crates/gateway/src/methods/services.rs — both are missing base_url.is_some() in the Whisper STT availability checks.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/voice/src/tts/openai.rs | Adds base_url field to OpenAiTts; correctly conditionalizes auth header injection; trailing-slash edge case on user-supplied URLs can produce double-slash request paths. |
| crates/voice/src/stt/whisper.rs | Adds base_url to WhisperStt, replaces with_model with with_options, and conditionalizes the auth header; logic mirrors TTS changes correctly. |
| crates/gateway/src/methods/voice.rs | TTS OpenAI availability check was updated to include base_url.is_some(), but the parallel Whisper STT block was not, causing the provider to show as unconfigured in the UI even when base_url is set. |
| crates/gateway/src/methods/services.rs | openai_configured was correctly updated to OR with base_url.is_some(), but whisper_configured was not updated, leaving the voice.config.get RPC response inconsistent for Whisper-only-base_url configs. |
| crates/gateway/src/voice.rs | Config forwarding and provider construction for both TTS and STT correctly propagate base_url; is_configured() gate replaces the former key-only check cleanly. |
| crates/config/src/template.rs | Adds commented-out base_url example for TTS OpenAI only; the equivalent Whisper STT template entry is missing, reducing discoverability of the new field. |
| crates/config/src/schema.rs | Adds base_url: Option<String> to VoiceOpenAiConfig and VoiceWhisperConfig schema structs with doc comments. |
| crates/config/src/validate.rs | Registers base_url as a known key in the schema map for both TTS OpenAI and Whisper STT sections, preventing false validation errors. |
| docs/src/voice.md | Adds commented-out base_url config examples to both TTS and STT documentation sections with appropriate example URLs. |
| crates/voice/src/config.rs | Adds base_url: Option<String> fields to both OpenAiTtsConfig and WhisperConfig structs with appropriate doc-comments. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Gateway
    participant LiveTtsService
    participant OpenAiTts
    participant LocalTTS as Local TTS Server (e.g. Chatterbox)
    participant OpenAI as OpenAI API

    User->>Gateway: TTS request
    Gateway->>LiveTtsService: synthesize(request)
    LiveTtsService->>LiveTtsService: load_config() - resolve base_url & api_key
    LiveTtsService->>OpenAiTts: with_defaults(api_key, base_url, voice, model)
    OpenAiTts->>OpenAiTts: is_configured() - api_key.is_some() OR base_url != API_BASE

    alt base_url set (local server)
        OpenAiTts->>LocalTTS: POST {base_url}/audio/speech (no Authorization header)
        LocalTTS-->>OpenAiTts: audio bytes
    else api_key set (OpenAI)
        OpenAiTts->>OpenAI: POST https://api.openai.com/v1/audio/speech + Bearer token
        OpenAI-->>OpenAiTts: audio bytes
    end

    OpenAiTts-->>LiveTtsService: AudioOutput
    LiveTtsService-->>Gateway: AudioOutput
    Gateway-->>User: audio response
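
The two branches of the diagram differ only in the target URL and in whether a Bearer token is attached. The sketch below models that decision without any HTTP client so it can be checked in isolation. `RequestPlan` and `plan_speech_request` are hypothetical names introduced here; the real provider builds the request directly on its HTTP client.

```rust
/// Hypothetical, transport-free description of the outgoing TTS request.
#[derive(Debug, PartialEq)]
pub struct RequestPlan {
    pub url: String,
    pub headers: Vec<(String, String)>,
}

/// Build the request target and headers; the Authorization header is
/// attached only when an API key is available.
pub fn plan_speech_request(base_url: &str, api_key: Option<&str>) -> RequestPlan {
    let mut headers = vec![(
        "Content-Type".to_string(),
        "application/json".to_string(),
    )];
    if let Some(key) = api_key {
        headers.push(("Authorization".to_string(), format!("Bearer {}", key)));
    }
    RequestPlan {
        url: format!("{}/audio/speech", base_url),
        headers,
    }
}
```

Calling `plan_speech_request("http://localhost:8003/v1", None)` yields a plan with no Authorization header, which corresponds to the keyless local-server branch.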

Comments Outside Diff (2)

  1. crates/gateway/src/methods/voice.rs, lines 482-484

    P1 Whisper base_url not included in provider-detection availability check

    The detect_voice_providers function was updated for OpenAI TTS (lines 406–408) to also treat base_url.is_some() as a signal that the provider is configured, but the parallel Whisper STT block was not updated. As a result, a user who sets only voice.stt.whisper.base_url (pointing to a local faster-whisper-server) will have a functional provider (is_configured() will return true and create_provider will succeed), but the UI will still display Whisper as not configured because this check returns false.

  2. crates/gateway/src/methods/services.rs, line 3708

    P1 whisper_configured flag ignores base_url

    The openai_configured field (line 3703) was correctly updated to include base_url.is_some(), but the analogous whisper_configured field directly below was not. A user who configures only voice.stt.whisper.base_url will get a working transcription path yet the voice.config.get RPC response will report whisper_configured: false, causing UI inconsistency.
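
Both P1 findings come down to the same predicate. The sketch below contrasts a key-only check with one that also accepts `base_url`, assuming a reduced `WhisperConfig` with just these two fields. The function names are hypothetical, and the real checks in methods/voice.rs and methods/services.rs may consult other key sources as well.

```rust
/// Reduced stand-in for the Whisper STT config section.
#[derive(Debug, Default, Clone)]
pub struct WhisperConfig {
    pub api_key: Option<String>,
    pub base_url: Option<String>,
}

/// Key-only check: reports a base_url-only config as unconfigured.
pub fn whisper_configured_key_only(cfg: &WhisperConfig) -> bool {
    cfg.api_key.is_some()
}

/// Check in the shape the review asks for: a custom endpoint also counts.
pub fn whisper_configured(cfg: &WhisperConfig) -> bool {
    cfg.api_key.is_some() || cfg.base_url.is_some()
}
```

Sharing one such helper between the provider-detection path and the RPC response would keep the two reports from drifting apart, though whether the codebase factors it that way is a design choice for the author.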

Reviews (1): Last reviewed commit: "style(voice): apply rustfmt to cherry-pi..."

Comment on lines 554 to 558
# No api_key needed for OpenAI TTS/Whisper when OpenAI is configured as an LLM provider.
# [voice.tts.openai]
# base_url = "https://api.openai.com/v1" # API endpoint (change for Chatterbox, etc.)
# voice = "alloy" # alloy, echo, fable, onyx, nova, shimmer
# model = "tts-1" # tts-1 or tts-1-hd

P2 Whisper STT base_url missing from config template

The template was updated to document base_url under [voice.tts.openai], but there is no corresponding template entry for the Whisper STT section. Users generating a fresh config from the template won't see the new base_url field for Whisper, making it harder to discover this feature for STT.

Consider adding a commented-out entry near the Whisper STT configuration block:

# [voice.stt.whisper]
# base_url = "https://api.openai.com/v1"  # API endpoint (change for faster-whisper-server, etc.)
# model = "whisper-1"

Comment on lines +146 to +148
.post(format!("{}/audio/speech", self.base_url))
.header("Content-Type", "application/json")
.json(&body);

P2 Trailing slash on base_url produces double-slash URLs

format!("{}/audio/speech", self.base_url) will produce a double-slash URL (e.g. http://localhost:8003//audio/speech) if the user supplies a base_url with a trailing slash. The same applies to the Whisper STT path ({}/audio/transcriptions). While most HTTP servers tolerate this, some reverse proxies and strict OpenAI-compatible implementations do not.

Consider trimming trailing slashes when storing base_url:

base_url: base_url
    .map(|u| u.trim_end_matches('/').to_string())
    .unwrap_or_else(|| API_BASE.into()),

The same fix should be applied in WhisperStt::with_options in crates/voice/src/stt/whisper.rs.
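
A self-contained version of the suggested trimming, with a small join helper so the behavior can be tested for both endpoints. `normalize_base_url` and `endpoint` are hypothetical names; only the `trim_end_matches('/')` expression and `API_BASE` come from the suggestion above.

```rust
const API_BASE: &str = "https://api.openai.com/v1";

/// Strip trailing slashes from a user-supplied URL, or fall back to the default.
pub fn normalize_base_url(base_url: Option<String>) -> String {
    base_url
        .map(|u| u.trim_end_matches('/').to_string())
        .unwrap_or_else(|| API_BASE.into())
}

/// Join a normalized base URL and a path with exactly one slash between them.
pub fn endpoint(base_url: &str, path: &str) -> String {
    format!("{}/{}", base_url, path.trim_start_matches('/'))
}
```

Normalizing once at construction time means every request path, TTS or STT, benefits without repeating the trim at each call site.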

codecov bot commented Mar 28, 2026

Codecov Report

❌ Patch coverage is 72.54902% with 14 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/gateway/src/voice.rs | 18.75% | 13 Missing ⚠️ |
| crates/gateway/src/methods/services.rs | 0.00% | 1 Missing ⚠️ |

codspeed-hq bot (Contributor) commented Mar 28, 2026

Merging this PR will not alter performance

✅ 39 untouched benchmarks
⏩ 5 skipped benchmarks [1]

Comparing feat/voice-base-url (bf50fdc) with main (efc18a9) [2]

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

  2. No successful run was found on main (7966ba9) during the generation of this report, so efc18a9 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.
