feat(audio): add custom OpenAI-compatible TTS/ASR provider support (#357) #409

Merged
cosarah merged 26 commits into main from worktree-357-custom-tts-asr-provider on Apr 12, 2026

@wyuc wyuc commented Apr 12, 2026

Summary

Closes #357

Add support for custom OpenAI-compatible TTS and ASR providers, allowing users to connect self-hosted or third-party audio services that implement the OpenAI audio API.

Features

  • Custom provider CRUD: Add/configure/delete custom TTS and ASR providers via Settings
  • TTS voice management: Add/remove voices per custom provider, integrated into Agent Bar voice picker
  • ASR model management: Add/remove models with radio-button selection for active model
  • Toolbar integration: Custom ASR providers appear in the media popover with a default language list
  • OpenAI-compatible routing: All custom providers route through existing OpenAI-compatible TTS/ASR implementations
  • i18n: Full coverage across en-US, zh-CN, ja-JP, ru-RU
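
As a rough illustration of what "OpenAI-compatible" means for a custom TTS provider, the sketch below builds (but does not send) a request in the shape of the OpenAI speech endpoint: a POST to `/audio/speech` under the provider's base URL, with `model`, `voice`, and `input` in a JSON body. The `CustomTTSConfig` shape, the `buildSpeechRequest` helper, and the example URL are assumptions made for illustration, not code from this PR.

```typescript
// Hypothetical config shape for illustration; not the PR's actual interface.
interface CustomTTSConfig {
  baseUrl: string; // e.g. "http://localhost:8880/v1" (made-up address)
  apiKey?: string; // self-hosted servers often need no key
  model: string;
  voice: string;
}

interface SpeechRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build an OpenAI-style text-to-speech request without sending it.
function buildSpeechRequest(config: CustomTTSConfig, input: string): SpeechRequest {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const base = config.baseUrl.replace(/\/+$/, "");
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (config.apiKey) {
    headers["Authorization"] = `Bearer ${config.apiKey}`;
  }
  return {
    url: `${base}/audio/speech`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model: config.model, voice: config.voice, input }),
    },
  };
}
```

The returned `url` and `init` can be passed to `fetch(url, init)`; the response body is then the audio stream.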

Code Quality (from Agent CR)

  • Eliminated 30+ `as Record<string, unknown>` casts by extending provider config type interfaces
  • Added AlertDialog confirmation before deleting custom providers
  • Added duplicate voice/model ID validation with toast error
  • Added dedicated i18n keys for ASR model management (modelNamePlaceholder, addModel)
  • Custom ASR providers without models are hidden from toolbar and show a warning in Settings

Files Changed

21 files, +1198 / -184 lines. Key areas:

  • lib/audio/types.ts — Extended provider ID types with custom-tts-* / custom-asr-* patterns
  • lib/store/settings.ts — Custom provider CRUD actions and type-safe config fields
  • components/settings/ — Provider creation dialog, voice/model management UI
  • components/generation/media-popover.tsx — Toolbar ASR provider integration
  • lib/audio/voice-resolver.ts — Custom provider voice resolution
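
The `custom-tts-*` / `custom-asr-*` patterns can be expressed with TypeScript template literal types plus a runtime type guard. The sketch below is a minimal version under assumptions: the built-in IDs are placeholders, and although `isCustomASRProvider` is named in this PR, its signature and body here are guesses rather than the code in `lib/audio/types.ts`.

```typescript
// Placeholder built-in IDs; the real unions live in lib/audio/types.ts.
type BuiltInTTSProviderId = "openai-tts";
type BuiltInASRProviderId = "openai-whisper";

// Template literal types: any string starting with the given prefix.
type CustomTTSProviderId = `custom-tts-${string}`;
type CustomASRProviderId = `custom-asr-${string}`;

type TTSProviderId = BuiltInTTSProviderId | CustomTTSProviderId;
type ASRProviderId = BuiltInASRProviderId | CustomASRProviderId;

// Type guards narrow a plain string to the custom ID type at runtime.
function isCustomTTSProvider(id: string): id is CustomTTSProviderId {
  return id.startsWith("custom-tts-");
}

function isCustomASRProvider(id: string): id is CustomASRProviderId {
  return id.startsWith("custom-asr-");
}
```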

Test Plan

Tested on macOS with:

  • Custom TTS provider: CocoR (OpenAI-compatible) — voice add/remove/preview, generation playback
  • Custom ASR provider: Whisper (self-hosted, OpenAI-compatible) — model add/select, recording & transcription
  • Toolbar ASR dropdown: custom provider appears with language list, hidden when no models configured
  • Agent Bar voice picker: custom TTS provider voices appear and are selectable
  • Settings UX: provider creation dialog, delete confirmation, duplicate ID rejection
  • i18n: verified zh-CN and en-US labels render correctly
  • Existing built-in providers: no regression in OpenAI TTS, Qwen ASR, browser-native ASR

🤖 Generated with Claude Code

wyuc and others added 22 commits April 12, 2026 12:27
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `as keyof typeof` casts where TTSProviderId/ASRProviderId (which now
include template literal types) index into the built-in-only registries
TTS_PROVIDERS, ASR_PROVIDERS, DEFAULT_TTS_VOICES, and DEFAULT_TTS_MODELS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
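
A small sketch of why the cast described in the commit above is needed: once the ID union includes a template literal type, it is wider than the registry's keys, so indexing the registry directly no longer type-checks. The registry contents and the `getBuiltInProviderName` helper are hypothetical.

```typescript
// Stand-in for the built-in-only registry; contents are illustrative.
const TTS_PROVIDERS = {
  "openai-tts": { name: "OpenAI TTS" },
};

type BuiltInId = keyof typeof TTS_PROVIDERS;
type TTSProviderId = BuiltInId | `custom-tts-${string}`;

function getBuiltInProviderName(id: TTSProviderId): string | undefined {
  // `TTS_PROVIDERS[id]` alone is a type error because `id` may be a custom ID.
  // The cast satisfies the compiler; at runtime a custom ID yields undefined,
  // so the result is declared as possibly undefined.
  const entry: { name: string } | undefined =
    TTS_PROVIDERS[id as keyof typeof TTS_PROVIDERS];
  return entry?.name;
}
```

The cast only silences the compiler, so callers still need to handle the `undefined` case for custom IDs.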
The early guard `if (!provider) throw` blocked custom provider IDs from
reaching the switch-case default branch where they get routed to
OpenAI-compatible implementations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
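
The routing fix above can be pictured as follows. This is a simplified sketch, not the PR's code: `routeTTS`, the `Route` type, and the single built-in case are assumptions. The point is that no registry check runs before the `switch`, so custom IDs reach the default branch.

```typescript
type Route = "openai" | "openai-compatible";

function routeTTS(providerId: string): Route {
  // An early `if (!provider) throw` based on the built-in registry would
  // reject every custom ID here, before the switch is ever reached.
  switch (providerId) {
    case "openai-tts": // placeholder built-in ID
      return "openai";
    default:
      // Custom providers fall through to the OpenAI-compatible implementation.
      if (providerId.startsWith("custom-tts-")) {
        return "openai-compatible";
      }
      throw new Error(`Unknown TTS provider: ${providerId}`);
  }
}
```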
- Add "Default Model" input to custom audio provider dialog (fixes
  Kokoro needing model:"kokoro" instead of fallback "gpt-4o-mini-tts")
- Redesign voice list as aligned table with column headers, hover
  reveal delete buttons, and empty state message
- Add Enter key support to voice add inputs
- Use consistent grid layout for add voice row

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…provider

The baseUrl field in provider config is empty by default (it's an
override field). Custom providers store their default URL in
customDefaultBaseUrl, which must be used as the fallback when testing
the provider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a custom provider's voice is 'default' (initial state before any
voices are added), adding the first voice now automatically updates the
active voice selection so test TTS works immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scene generator, discussion TTS, and agent bar voice preview all read
providerConfig.baseUrl which is empty for custom providers (the default
URL is stored in customDefaultBaseUrl). Add fallback chain to all 4
call sites so custom providers work during actual playback, not just
in the settings test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
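
The fallback chain described above might look like the helper below. The field names `baseUrl` and `customDefaultBaseUrl` come from this PR; the `resolveBaseUrl` helper and its `builtInDefault` parameter are hypothetical, and the PR may instead inline the expression at each call site.

```typescript
interface ProviderConfig {
  baseUrl?: string; // user override, empty by default
  customDefaultBaseUrl?: string; // URL entered when the custom provider was created
}

// Prefer the explicit override, then the custom provider's own URL,
// then an optional built-in default. `||` is deliberate: an empty
// string must fall through to the next candidate.
function resolveBaseUrl(config: ProviderConfig, builtInDefault?: string): string | undefined {
  return config.baseUrl || config.customDefaultBaseUrl || builtInDefault;
}
```

Using `??` instead of `||` would reintroduce the bug, because an empty `baseUrl` is not nullish.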
getAvailableProvidersWithVoices only iterated built-in TTS_PROVIDERS,
so custom providers never appeared in the agent bar voice dropdown.
Now also iterates ttsProvidersConfig for custom-tts-* entries and
includes their customVoices in the picker.

Also updated getServerVoiceList, findVoiceDisplayName, and
resolveAgentVoice to handle custom provider voice lookups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
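
A minimal sketch of the extra iteration described above, assuming a config map keyed by provider ID. `customName` and `customVoices` are field names mentioned in this PR; the `Voice` shape and the `collectCustomProviders` helper are hypothetical, and whether providers with no voices should appear in the picker is not stated here, so this version keeps them.

```typescript
interface Voice {
  id: string;
  name: string;
}

interface TTSProviderConfig {
  customName?: string;
  customVoices?: Voice[];
}

interface ProviderWithVoices {
  providerId: string;
  name: string;
  voices: Voice[];
}

// Collect custom-tts-* entries from the settings map so they can be
// appended to the list built from the built-in registry.
function collectCustomProviders(
  ttsProvidersConfig: Record<string, TTSProviderConfig>,
): ProviderWithVoices[] {
  return Object.entries(ttsProvidersConfig)
    .filter(([id]) => id.startsWith("custom-tts-"))
    .map(([id, cfg]) => ({
      providerId: id,
      name: cfg.customName ?? id, // fall back to the raw ID if unnamed
      voices: cfg.customVoices ?? [],
    }));
}
```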
Same issue as TTS: audio-recorder, asr-settings, and audio-settings
all read providerConfig.baseUrl which is empty for custom providers.
Add customDefaultBaseUrl fallback to all 3 ASR call sites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Custom ASR providers now support adding/removing models with a table
UI (same design as TTS voice list). The model selector dropdown shows
custom models and auto-selects the first one.

Also adds custom ASR providers to the Toolbar's ASR provider dropdown
and fixes custom provider handling in audio-settings.tsx (baseUrl
fallback, request URL preview, API key section visibility).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- When creating custom ASR provider with a defaultModel, auto-add it
  to customModels so the model list isn't empty on first visit
- Fix ASR settings label from "TTS Model" to "Default Model"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ASR models should be managed in provider settings (add/remove from
list, select default), not pre-filled at creation time. Unlike TTS
where a single default model suffices, ASR users need to manage
multiple models (e.g. whisper-small, whisper-large-v3).

- Remove Default Model field from ASR creation dialog (keep for TTS)
- Remove defaultModel parameter from addCustomASRProvider
- ASR models are added via the model list UI in provider settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the separate model selector dropdown for custom ASR providers.
The first model in the list is always used as default. Users manage
the model list only (add/remove), no separate "default" selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Show spinner with "Processing..." while waiting for transcription
- Disable record button during processing
- Show specific error messages from API (e.g. "No transcription
  generated") instead of generic failure text
- Handle empty transcription result as an error with helpful message

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add proper type declarations for custom provider fields (customName,
  customDefaultBaseUrl, customVoices, etc.) eliminating 30+ unsafe casts
- Show custom ASR providers in toolbar with default language list
- Hide custom ASR providers without models from toolbar selection
- Add no-models warning in ASR settings panel
- Add delete confirmation dialog for custom TTS/ASR providers
- Add duplicate voice/model ID validation with toast error
- Add dedicated i18n keys for ASR model management (modelNamePlaceholder, addModel)
- Fix i18n key for ASR processing label (settings.asrProcessing)
- Add radio-button model selection in custom ASR provider settings
- Add visible OpenAI-compatible description in provider creation dialog
- Add CUSTOM_ASR_DEFAULT_LANGUAGES constant for toolbar language options

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
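
Two of the rules listed above, duplicate ID rejection and hiding custom ASR providers that have no models, can be sketched as pure functions. `customModels` follows the field naming used in this PR; `addModel`, `toolbarCustomASRProviders`, and the result shape are assumptions for illustration (the PR surfaces the duplicate error as a toast).

```typescript
interface ASRModel {
  id: string;
  name: string;
}

interface ASRProviderConfig {
  customModels?: ASRModel[];
}

type AddResult =
  | { ok: true; models: ASRModel[] }
  | { ok: false; error: string };

// Reject a model whose ID already exists; otherwise return a new list.
function addModel(models: ASRModel[], model: ASRModel): AddResult {
  if (models.some((m) => m.id === model.id)) {
    return { ok: false, error: `Duplicate model ID: ${model.id}` };
  }
  return { ok: true, models: [...models, model] };
}

// Custom ASR providers appear in the toolbar only once they have a model.
function toolbarCustomASRProviders(configs: Record<string, ASRProviderConfig>): string[] {
  return Object.entries(configs)
    .filter(([id, cfg]) => id.startsWith("custom-asr-") && (cfg.customModels?.length ?? 0) > 0)
    .map(([id]) => id);
}
```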
… mock

The settings-server-sync test mocked @/lib/audio/types as an empty object,
but settings.ts now imports isCustomASRProvider for provider validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wyuc wyuc marked this pull request as ready for review April 12, 2026 07:08
wyuc and others added 2 commits April 12, 2026 21:19
When a custom provider is created, the actual URL is stored in
customDefaultBaseUrl while baseUrl is initialized empty. Several call
sites only checked baseUrl, causing fetch failures during course
generation. Add fallback to customDefaultBaseUrl in:
- generation-preview/page.tsx (course generation TTS)
- tts-providers.ts (getCurrentTTSConfig)
- asr-providers.ts (getCurrentASRConfig)
- media-popover.tsx (TTS preview)
- tts-config-popover.tsx (TTS preview)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cosarah previously approved these changes Apr 12, 2026
Collaborator

@cosarah cosarah left a comment

Verified custom TTS/ASR provider functionality across multiple scenarios:

  • Custom provider creation with default base URL
  • Voice preview and TTS generation with custom providers
  • Base URL fallback from customDefaultBaseUrl when panel base URL is not explicitly set
  • Built-in provider regression (OpenAI TTS unchanged)

The customDefaultBaseUrl fallback is now correctly applied in all generation paths including generation-preview/page.tsx. LGTM.

Flagged by CodeQL code quality check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@cosarah cosarah left a comment

LGTM
verified locally

@cosarah cosarah merged commit 9a0060e into main Apr 12, 2026
3 checks passed
@wyuc wyuc deleted the worktree-357-custom-tts-asr-provider branch April 12, 2026 13:53
Development

Successfully merging this pull request may close these issues.

Improve local/self-hosted model deployment experience

2 participants