feat(audio): add custom OpenAI-compatible TTS/ASR provider support (#357) #409
Merged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `as keyof typeof` casts where TTSProviderId/ASRProviderId (which now include template literal types) index into the built-in-only registries TTS_PROVIDERS, ASR_PROVIDERS, DEFAULT_TTS_VOICES, and DEFAULT_TTS_MODELS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
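A minimal sketch of the cast described in this commit. The registry contents and the `getBuiltInName` helper are hypothetical; only the `TTS_PROVIDERS` and `TTSProviderId` names come from the PR. The cast narrows the key type only, so a custom ID still yields `undefined` at runtime and has to be handled.

```typescript
// Hypothetical, trimmed-down registry; the real TTS_PROVIDERS carries more fields.
const TTS_PROVIDERS = {
  "openai-tts": { name: "OpenAI TTS" },
  "azure-tts": { name: "Azure TTS" },
} as const;

// Built-in IDs plus a template literal type for custom IDs.
type TTSProviderId = keyof typeof TTS_PROVIDERS | `custom-tts-${string}`;

// Hypothetical helper: look up a display name in the built-in-only registry.
function getBuiltInName(id: TTSProviderId): string | undefined {
  // Indexing with the wide TTSProviderId does not type-check, because
  // `custom-tts-${string}` is not a key of the registry. The cast satisfies
  // the compiler; optional chaining covers the custom-ID case at runtime.
  const entry: { name: string } | undefined =
    TTS_PROVIDERS[id as keyof typeof TTS_PROVIDERS];
  return entry?.name;
}
```

For example, `getBuiltInName("openai-tts")` returns `"OpenAI TTS"`, while `getBuiltInName("custom-tts-kokoro")` returns `undefined`.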
The early guard `if (!provider) throw` blocked custom provider IDs from reaching the switch-case default branch where they get routed to OpenAI-compatible implementations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
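The routing problem can be illustrated roughly as follows. This is a sketch under assumptions: `routeTTS`, the route names, and the prefix check in the default branch are hypothetical and only mirror the behavior the commit message describes.

```typescript
type TTSRoute = "openai" | "azure" | "openai-compatible";

// Hypothetical router. An early guard such as
//   if (!TTS_PROVIDERS[providerId]) throw new Error(...)
// placed before the switch would reject every custom ID, because custom
// IDs are never present in the built-in registry.
function routeTTS(providerId: string): TTSRoute {
  switch (providerId) {
    case "openai-tts":
      return "openai";
    case "azure-tts":
      return "azure";
    default:
      // Custom IDs reach this branch and use the OpenAI-compatible path.
      if (providerId.startsWith("custom-tts-")) return "openai-compatible";
      throw new Error(`Unknown TTS provider: ${providerId}`);
  }
}
```

Keeping the unknown-provider error inside the default branch preserves the failure for genuinely invalid IDs without blocking custom ones.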
- Add "Default Model" input to custom audio provider dialog (fixes Kokoro needing model:"kokoro" instead of fallback "gpt-4o-mini-tts")
- Redesign voice list as aligned table with column headers, hover reveal delete buttons, and empty state message
- Add Enter key support to voice add inputs
- Use consistent grid layout for add voice row

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…provider The baseUrl field in provider config is empty by default (it's an override field). Custom providers store their default URL in customDefaultBaseUrl, which needs to be used as fallback for test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a custom provider's voice is 'default' (initial state before any voices are added), adding the first voice now automatically updates the active voice selection so test TTS works immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
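A sketch of the auto-selection described here, assuming a simplified state shape. `VoiceState`, `CustomVoice`, and `addCustomVoice` are hypothetical names, not the store's actual API.

```typescript
interface CustomVoice {
  id: string;
  name: string;
}

// Hypothetical slice of per-provider state.
interface VoiceState {
  activeVoice: string; // "default" until a real voice is selected
  customVoices: CustomVoice[];
}

// Returns a new state with the voice appended. If the active voice is still
// the "default" placeholder, the newly added voice becomes active.
function addCustomVoice(state: VoiceState, voice: CustomVoice): VoiceState {
  const customVoices = [...state.customVoices, voice];
  const activeVoice =
    state.activeVoice === "default" ? voice.id : state.activeVoice;
  return { activeVoice, customVoices };
}
```

An explicit selection is never overwritten: once `activeVoice` holds a real voice ID, adding further voices only extends the list.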
Scene generator, discussion TTS, and agent bar voice preview all read providerConfig.baseUrl which is empty for custom providers (the default URL is stored in customDefaultBaseUrl). Add fallback chain to all 4 call sites so custom providers work during actual playback, not just in the settings test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
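The fallback chain could look roughly like this. `baseUrl` and `customDefaultBaseUrl` are the field names from the PR; the `resolveBaseUrl` helper and the optional built-in default argument are assumptions for illustration.

```typescript
interface AudioProviderConfig {
  baseUrl?: string; // user override, empty by default
  customDefaultBaseUrl?: string; // URL entered when the custom provider was created
}

// Hypothetical helper: first non-empty value wins.
// `||` is used instead of `??` so that an empty-string override is skipped.
function resolveBaseUrl(
  config: AudioProviderConfig,
  builtInDefault?: string,
): string | undefined {
  return (
    config.baseUrl || config.customDefaultBaseUrl || builtInDefault || undefined
  );
}
```

Centralizing the chain in one helper would keep the call sites from drifting apart, which is the class of bug the later base-URL fixes in this PR address.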
getAvailableProvidersWithVoices only iterated built-in TTS_PROVIDERS, so custom providers never appeared in the agent bar voice dropdown. Now also iterates ttsProvidersConfig for custom-tts-* entries and includes their customVoices in the picker. Also updated getServerVoiceList, findVoiceDisplayName, and resolveAgentVoice to handle custom provider voice lookups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
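A sketch of the extra iteration over custom entries, assuming a simplified config shape. `collectCustomTTSProviders` and the interfaces are hypothetical; `customName` and `customVoices` are the field names mentioned in this PR. Whether providers with no voices should be listed is an assumption here (they are kept, with an empty list).

```typescript
interface VoiceOption {
  id: string;
  name: string;
}

interface TTSProviderConfig {
  customName?: string;
  customVoices?: VoiceOption[];
}

interface ProviderWithVoices {
  providerId: string;
  name: string;
  voices: VoiceOption[];
}

// Hypothetical helper: pick out custom-tts-* entries from the settings map
// and expose their custom voices for the picker.
function collectCustomTTSProviders(
  ttsProvidersConfig: Record<string, TTSProviderConfig>,
): ProviderWithVoices[] {
  return Object.entries(ttsProvidersConfig)
    .filter(([id]) => id.startsWith("custom-tts-"))
    .map(([id, cfg]) => ({
      providerId: id,
      name: cfg.customName ?? id, // fall back to the ID if no display name is set
      voices: cfg.customVoices ?? [],
    }));
}
```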
Same issue as TTS: audio-recorder, asr-settings, and audio-settings all read providerConfig.baseUrl which is empty for custom providers. Add customDefaultBaseUrl fallback to all 3 ASR call sites. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Custom ASR providers now support adding/removing models with a table UI (same design as TTS voice list). The model selector dropdown shows custom models and auto-selects the first one. Also adds custom ASR providers to the Toolbar's ASR provider dropdown and fixes custom provider handling in audio-settings.tsx (baseUrl fallback, request URL preview, API key section visibility). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- When creating custom ASR provider with a defaultModel, auto-add it to customModels so the model list isn't empty on first visit
- Fix ASR settings label from "TTS Model" to "Default Model"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ASR models should be managed in provider settings (add/remove from list, select default), not pre-filled at creation time. Unlike TTS where a single default model suffices, ASR users need to manage multiple models (e.g. whisper-small, whisper-large-v3).

- Remove Default Model field from ASR creation dialog (keep for TTS)
- Remove defaultModel parameter from addCustomASRProvider
- ASR models are added via the model list UI in provider settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the separate model selector dropdown for custom ASR providers. The first model in the list is always used as default. Users manage the model list only (add/remove), no separate "default" selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Show spinner with "Processing..." while waiting for transcription
- Disable record button during processing
- Show specific error messages from API (e.g. "No transcription generated") instead of generic failure text
- Handle empty transcription result as an error with helpful message

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
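The error handling in this commit might be sketched as below. The response shape, `interpretTranscription`, and the generic fallback text are assumptions; only the "No transcription generated" message appears in the commit.

```typescript
type TranscriptionOutcome =
  | { ok: true; text: string }
  | { ok: false; message: string };

// Hypothetical response body from the transcription route.
interface TranscriptionResponse {
  text?: string;
  error?: string;
}

// Prefer the API's own error message; treat an empty transcript as an error
// instead of silently returning an empty string.
function interpretTranscription(
  httpOk: boolean,
  body: TranscriptionResponse,
): TranscriptionOutcome {
  if (!httpOk) {
    return { ok: false, message: body.error || "Transcription failed" };
  }
  const text = (body.text ?? "").trim();
  if (text === "") {
    return { ok: false, message: body.error || "No transcription generated" };
  }
  return { ok: true, text };
}
```

Returning a discriminated union lets the recorder UI stop the spinner and show `message` in one place, without a separate code path for the empty-result case.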
- Add proper type declarations for custom provider fields (customName, customDefaultBaseUrl, customVoices, etc.) eliminating 30+ unsafe casts
- Show custom ASR providers in toolbar with default language list
- Hide custom ASR providers without models from toolbar selection
- Add no-models warning in ASR settings panel
- Add delete confirmation dialog for custom TTS/ASR providers
- Add duplicate voice/model ID validation with toast error
- Add dedicated i18n keys for ASR model management (modelNamePlaceholder, addModel)
- Fix i18n key for ASR processing label (settings.asrProcessing)
- Add radio-button model selection in custom ASR provider settings
- Add visible OpenAI-compatible description in provider creation dialog
- Add CUSTOM_ASR_DEFAULT_LANGUAGES constant for toolbar language options

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
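The duplicate voice/model ID check could be as small as the following sketch. `validateNewId`, the error strings, and the exact-match comparison after trimming are assumptions; the PR only states that duplicates are rejected with a toast.

```typescript
// Hypothetical validator: returns an error message to show in a toast,
// or null when the candidate ID can be added.
function validateNewId(
  existingIds: readonly string[],
  candidate: string,
): string | null {
  const id = candidate.trim();
  if (id === "") return "ID must not be empty";
  // Exact, case-sensitive match after trimming (an assumption).
  if (existingIds.includes(id)) return `"${id}" already exists`;
  return null;
}
```

With existing models `["whisper-small"]`, adding `"whisper-large-v3"` passes and adding `" whisper-small "` is rejected.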
… mock The settings-server-sync test mocked @/lib/audio/types as an empty object, but settings.ts now imports isCustomASRProvider for provider validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a custom provider is created, the actual URL is stored in customDefaultBaseUrl while baseUrl is initialized empty. Several call sites only checked baseUrl, causing fetch failures during course generation. Add fallback to customDefaultBaseUrl in:

- generation-preview/page.tsx (course generation TTS)
- tts-providers.ts (getCurrentTTSConfig)
- asr-providers.ts (getCurrentASRConfig)
- media-popover.tsx (TTS preview)
- tts-config-popover.tsx (TTS preview)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cosarah (Collaborator) previously approved these changes on Apr 12, 2026 and left a comment:
Verified custom TTS/ASR provider functionality across multiple scenarios:
- Custom provider creation with default base URL
- Voice preview and TTS generation with custom providers
- Base URL fallback from `customDefaultBaseUrl` when panel base URL is not explicitly set
- Built-in provider regression (OpenAI TTS unchanged)
The customDefaultBaseUrl fallback is now correctly applied in all generation paths including generation-preview/page.tsx. LGTM.
Flagged by CodeQL code quality check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Closes #357
Add support for custom OpenAI-compatible TTS and ASR providers, allowing users to connect self-hosted or third-party audio services that implement the OpenAI audio API.
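For context, "OpenAI-compatible" here means the service accepts the same request shape as the OpenAI audio API, for example a JSON `POST` to `/audio/speech` with `model`, `input`, and `voice`. The sketch below builds such a request without sending it. `buildSpeechRequest` is a hypothetical helper, and the localhost URL and voice ID in the usage note are placeholders; it assumes the base URL already includes the `/v1` prefix.

```typescript
interface SpeechParams {
  model: string;
  input: string;
  voice: string;
}

interface SpeechRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: assemble a request for an OpenAI-compatible
// text-to-speech endpoint. The API key is optional because self-hosted
// services often do not require one.
function buildSpeechRequest(
  baseUrl: string,
  apiKey: string | undefined,
  params: SpeechParams,
): SpeechRequest {
  const url = `${baseUrl.replace(/\/+$/, "")}/audio/speech`; // tolerate trailing slashes
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return { url, method: "POST", headers, body: JSON.stringify(params) };
}
```

Calling `buildSpeechRequest("http://localhost:8880/v1/", undefined, { model: "kokoro", input: "Hello", voice: "af_heart" })` yields a request for `http://localhost:8880/v1/audio/speech` with no `Authorization` header; the result can be passed to `fetch(req.url, req)`.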
Features
Code Quality (from Agent CR)
- Eliminated `as Record<string, unknown>` casts by extending provider config type interfaces
- Added `AlertDialog` confirmation before deleting custom providers
- Added dedicated i18n keys for ASR model management (`modelNamePlaceholder`, `addModel`)

Files Changed
21 files, +1198 / -184 lines. Key areas:
- `lib/audio/types.ts`: Extended provider ID types with `custom-tts-*` / `custom-asr-*` patterns
- `lib/store/settings.ts`: Custom provider CRUD actions and type-safe config fields
- `components/settings/`: Provider creation dialog, voice/model management UI
- `components/generation/media-popover.tsx`: Toolbar ASR provider integration
- `lib/audio/voice-resolver.ts`: Custom provider voice resolution

Test Plan
Tested on macOS with:
🤖 Generated with Claude Code