feat(audio): add custom OpenAI-compatible TTS/ASR provider support (#357) by wyuc · Pull Request #409 · THU-MAIC/OpenMAIC

wyuc · 2026-04-12T07:02:14Z

Summary

Closes #357

Add support for custom OpenAI-compatible TTS and ASR providers, allowing users to connect self-hosted or third-party audio services that implement the OpenAI audio API.

Features

Custom provider CRUD: Add/configure/delete custom TTS and ASR providers via Settings
TTS voice management: Add/remove voices per custom provider, integrated into Agent Bar voice picker
ASR model management: Add/remove models with radio-button selection for active model
Toolbar integration: Custom ASR providers appear in the media popover with a default language list
OpenAI-compatible routing: All custom providers route through existing OpenAI-compatible TTS/ASR implementations
i18n: Full coverage across en-US, zh-CN, ja-JP, ru-RU

Code Quality (from Agent CR)

Eliminated 30+ `as Record<string, unknown>` casts by extending provider config type interfaces
Added AlertDialog confirmation before deleting custom providers
Added duplicate voice/model ID validation with toast error
Added dedicated i18n keys for ASR model management (modelNamePlaceholder, addModel)
Custom ASR providers without models are hidden from toolbar and show a warning in Settings

Files Changed

21 files, +1198 / -184 lines. Key areas:

lib/audio/types.ts — Extended provider ID types with custom-tts-* / custom-asr-* patterns
lib/store/settings.ts — Custom provider CRUD actions and type-safe config fields
components/settings/ — Provider creation dialog, voice/model management UI
components/generation/media-popover.tsx — Toolbar ASR provider integration
lib/audio/voice-resolver.ts — Custom provider voice resolution
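The `custom-tts-*` / `custom-asr-*` ID patterns can be expressed with TypeScript template literal types. This is a minimal sketch: the built-in ID strings, the `isCustomTTSProvider` guard and the `newCustomTTSProviderId` scheme are assumptions for illustration. Only `isCustomASRProvider` is named in this PR, and its real signature may differ.

```typescript
// Assumed built-in IDs for illustration; the real unions list more providers.
type BuiltInTTSProviderId = "openai-tts" | "browser-native-tts";
type BuiltInASRProviderId = "openai-whisper" | "qwen-asr" | "browser-native";

// Custom providers carry a prefixed ID such as "custom-tts-abc123".
type CustomTTSProviderId = `custom-tts-${string}`;
type CustomASRProviderId = `custom-asr-${string}`;

type TTSProviderId = BuiltInTTSProviderId | CustomTTSProviderId;
type ASRProviderId = BuiltInASRProviderId | CustomASRProviderId;

// Type guards: narrow a provider ID to its custom variant by prefix.
function isCustomTTSProvider(id: TTSProviderId): id is CustomTTSProviderId {
  return id.startsWith("custom-tts-");
}

function isCustomASRProvider(id: ASRProviderId): id is CustomASRProviderId {
  return id.startsWith("custom-asr-");
}

// Hypothetical ID scheme for newly created providers: prefix + base-36 timestamp.
function newCustomTTSProviderId(): CustomTTSProviderId {
  return `custom-tts-${Date.now().toString(36)}`;
}
```

Because the custom variants are open-ended (`${string}`), any registry keyed only by built-in IDs can no longer be indexed directly with a `TTSProviderId`, which is what the casts described in the commits below address.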

Test Plan

Tested on macOS with:

Custom TTS provider: CocoR (OpenAI-compatible) — voice add/remove/preview, generation playback
Custom ASR provider: Whisper (self-hosted, OpenAI-compatible) — model add/select, recording & transcription
Toolbar ASR dropdown: custom provider appears with language list, hidden when no models configured
Agent Bar voice picker: custom TTS provider voices appear and are selectable
Settings UX: provider creation dialog, delete confirmation, duplicate ID rejection
i18n: verified zh-CN and en-US labels render correctly
Existing built-in providers: no regression in OpenAI TTS, Qwen ASR, browser-native ASR

🤖 Generated with Claude Code


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add `as keyof typeof` casts where `TTSProviderId`/`ASRProviderId` (which now include template literal types) index into the built-in-only registries `TTS_PROVIDERS`, `ASR_PROVIDERS`, `DEFAULT_TTS_VOICES`, and `DEFAULT_TTS_MODELS`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
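A sketch of the cast described above, under assumed registry contents (the IDs, `TTSProviderInfo` shape and `getBuiltInTTSInfo` helper are hypothetical). The point is that the cast only satisfies the compiler: at runtime a custom ID misses the built-in registry, so the caller must handle `undefined`.

```typescript
// Assumed built-in IDs and registry contents, for illustration only.
type BuiltInTTSProviderId = "openai-tts" | "browser-native-tts";
type TTSProviderId = BuiltInTTSProviderId | `custom-tts-${string}`;

interface TTSProviderInfo {
  name: string;
  defaultModel: string;
}

// Built-in-only registry: its keys cover BuiltInTTSProviderId, never custom IDs.
const TTS_PROVIDERS: Record<BuiltInTTSProviderId, TTSProviderInfo> = {
  "openai-tts": { name: "OpenAI TTS", defaultModel: "gpt-4o-mini-tts" },
  "browser-native-tts": { name: "Browser TTS", defaultModel: "" },
};

// TTS_PROVIDERS[id] does not type-check for the widened union, hence the cast.
// A custom ID simply misses at runtime, so the result is widened to include undefined.
function getBuiltInTTSInfo(id: TTSProviderId): TTSProviderInfo | undefined {
  const info: TTSProviderInfo | undefined = TTS_PROVIDERS[id as keyof typeof TTS_PROVIDERS];
  return info;
}
```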

The early guard `if (!provider) throw` blocked custom provider IDs from reaching the switch-case `default` branch, where they are routed to the OpenAI-compatible implementations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
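A minimal sketch of the routing fix, assuming a simplified `routeTTS` that returns the name of the implementation rather than calling it (the function, request shape and return strings are hypothetical). The guard only rejects IDs that are neither built-in nor custom, so `custom-tts-*` IDs reach the `default` branch.

```typescript
type TTSProviderId = "openai-tts" | "browser-native-tts" | `custom-tts-${string}`;

interface TTSRequest {
  providerId: TTSProviderId;
  text: string;
}

const BUILT_IN_TTS = new Set<string>(["openai-tts", "browser-native-tts"]);

function isCustomTTSProvider(id: string): boolean {
  return id.startsWith("custom-tts-");
}

// Returns the name of the implementation that would handle the request.
function routeTTS(req: TTSRequest): string {
  // A plain "not in the built-in registry -> throw" guard would also reject every
  // custom ID; only IDs that match neither category are treated as unknown.
  if (!BUILT_IN_TTS.has(req.providerId) && !isCustomTTSProvider(req.providerId)) {
    throw new Error(`Unknown TTS provider: ${req.providerId}`);
  }
  switch (req.providerId) {
    case "openai-tts":
      return "openai";
    case "browser-native-tts":
      return "browser";
    default:
      // custom-tts-* providers reuse the OpenAI-compatible implementation.
      return "openai-compatible";
  }
}
```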

- Add a "Default Model" input to the custom audio provider dialog (fixes Kokoro needing `model: "kokoro"` instead of the fallback `"gpt-4o-mini-tts"`)
- Redesign the voice list as an aligned table with column headers, hover-revealed delete buttons, and an empty-state message
- Add Enter key support to the voice add inputs
- Use a consistent grid layout for the add-voice row

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
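To illustrate why the "Default Model" field matters, here is a sketch that builds (but does not send) a request to the OpenAI-style `POST {baseUrl}/audio/speech` endpoint, whose JSON body takes `model`, `input` and `voice`. The `customDefaultModel` field name and the `buildSpeechRequest` helper are assumptions; the PR's actual config field may be named differently.

```typescript
interface CustomTTSConfig {
  baseUrl: string; // e.g. "http://localhost:8880/v1" for a self-hosted server
  apiKey?: string;
  customDefaultModel?: string; // hypothetical name for the dialog's "Default Model" value
}

interface SpeechRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const FALLBACK_TTS_MODEL = "gpt-4o-mini-tts";

// Build an OpenAI-compatible text-to-speech request without sending it.
function buildSpeechRequest(cfg: CustomTTSConfig, text: string, voice: string): SpeechRequest {
  // Without a configured model the request falls back to the OpenAI default,
  // which a server such as Kokoro may not accept.
  const model = cfg.customDefaultModel?.trim() || FALLBACK_TTS_MODEL;
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (cfg.apiKey) headers["Authorization"] = `Bearer ${cfg.apiKey}`;
  return {
    url: `${cfg.baseUrl.replace(/\/+$/, "")}/audio/speech`, // strip trailing slashes
    method: "POST",
    headers,
    body: JSON.stringify({ model, input: text, voice }),
  };
}
```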

…provider

The `baseUrl` field in the provider config is empty by default (it is an override field). Custom providers store their default URL in `customDefaultBaseUrl`, which must be used as the fallback when testing the provider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
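The fallback described above, and applied to the playback and ASR call sites in the following commits, can be sketched as one helper. `resolveBaseUrl` and the optional `builtInDefault` argument are hypothetical; the PR adds the fallback inline at each call site rather than through a shared function.

```typescript
interface ProviderConfig {
  baseUrl?: string; // user override; empty string by default
  customDefaultBaseUrl?: string; // URL entered when the custom provider was created
}

// Resolve the URL a request should use: explicit override first, then the custom
// provider's stored default, then a built-in default if the caller has one.
// `||` (not `??`) is deliberate: an empty-string override must fall through.
function resolveBaseUrl(cfg: ProviderConfig, builtInDefault?: string): string | undefined {
  return cfg.baseUrl || cfg.customDefaultBaseUrl || builtInDefault || undefined;
}
```

Keeping the chain in one place would make it harder to miss a call site, which is the failure mode several later commits in this PR fix.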

When a custom provider's voice is `'default'` (the initial state before any voices are added), adding the first voice now automatically updates the active voice selection, so the TTS test works immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
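A sketch of the add-voice update, combining the auto-select behavior above with the duplicate-ID check mentioned in the PR summary. The state shape and `addCustomVoice` are assumptions; in the PR, a duplicate is surfaced as a toast error rather than a thrown exception.

```typescript
interface CustomVoice {
  id: string;
  name: string;
}

interface CustomTTSState {
  customVoices: CustomVoice[];
  activeVoice: string; // "default" until the user adds a voice
}

// Returns a new state object; throws on an empty or duplicate voice ID.
function addCustomVoice(state: CustomTTSState, voice: CustomVoice): CustomTTSState {
  const id = voice.id.trim();
  if (!id) throw new Error("Voice ID is required");
  if (state.customVoices.some((v) => v.id === id)) {
    throw new Error(`Duplicate voice ID: ${id}`);
  }
  const customVoices = [...state.customVoices, { ...voice, id }];
  // The first real voice replaces the "default" placeholder; later additions
  // leave the user's current selection alone.
  const activeVoice = state.activeVoice === "default" ? id : state.activeVoice;
  return { customVoices, activeVoice };
}
```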

The scene generator, discussion TTS, and agent bar voice preview all read `providerConfig.baseUrl`, which is empty for custom providers (the default URL is stored in `customDefaultBaseUrl`). Add a fallback chain to all 4 call sites so custom providers work during actual playback, not just in the settings test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

`getAvailableProvidersWithVoices` only iterated the built-in `TTS_PROVIDERS`, so custom providers never appeared in the agent bar voice dropdown. It now also iterates `ttsProvidersConfig` for `custom-tts-*` entries and includes their `customVoices` in the picker. `getServerVoiceList`, `findVoiceDisplayName`, and `resolveAgentVoice` were also updated to handle custom provider voice lookups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
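A simplified sketch of the two-pass iteration described above: built-ins from the static registry, then `custom-tts-*` entries from the user's config. The registry contents and config shape are assumptions, and skipping custom providers that have no voices yet is a choice made in this sketch, not something the commit states. The real function likely also filters built-ins by whether they are configured.

```typescript
interface Voice {
  id: string;
  name: string;
}

interface ProviderWithVoices {
  providerId: string;
  name: string;
  voices: Voice[];
}

interface TTSProviderConfigEntry {
  customName?: string;
  customVoices?: Voice[];
}

// Assumed built-in registry shape, for illustration only.
const TTS_PROVIDERS: Record<string, { name: string; voices: Voice[] }> = {
  "openai-tts": { name: "OpenAI TTS", voices: [{ id: "alloy", name: "Alloy" }] },
};

function getAvailableProvidersWithVoices(
  ttsProvidersConfig: Record<string, TTSProviderConfigEntry>,
): ProviderWithVoices[] {
  const result: ProviderWithVoices[] = [];
  // 1. Built-in providers come from the static registry.
  for (const [providerId, info] of Object.entries(TTS_PROVIDERS)) {
    result.push({ providerId, name: info.name, voices: info.voices });
  }
  // 2. Custom providers exist only in the user's config, keyed by custom-tts-* IDs.
  for (const [providerId, cfg] of Object.entries(ttsProvidersConfig)) {
    if (!providerId.startsWith("custom-tts-")) continue;
    const voices = cfg.customVoices ?? [];
    if (voices.length === 0) continue; // assumption: nothing to pick yet
    result.push({ providerId, name: cfg.customName ?? providerId, voices });
  }
  return result;
}
```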

Same issue as TTS: `audio-recorder`, `asr-settings`, and `audio-settings` all read `providerConfig.baseUrl`, which is empty for custom providers. Add the `customDefaultBaseUrl` fallback to all 3 ASR call sites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Custom ASR providers now support adding/removing models with a table UI (same design as the TTS voice list). The model selector dropdown shows custom models and auto-selects the first one. Also adds custom ASR providers to the Toolbar's ASR provider dropdown and fixes custom provider handling in `audio-settings.tsx` (baseUrl fallback, request URL preview, API key section visibility).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- When creating a custom ASR provider with a `defaultModel`, auto-add it to `customModels` so the model list isn't empty on first visit
- Fix the ASR settings label from "TTS Model" to "Default Model"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ASR models should be managed in provider settings (add/remove from the list, select the default), not pre-filled at creation time. Unlike TTS, where a single default model suffices, ASR users need to manage multiple models (e.g. whisper-small, whisper-large-v3).

- Remove the Default Model field from the ASR creation dialog (keep it for TTS)
- Remove the `defaultModel` parameter from `addCustomASRProvider`
- ASR models are added via the model list UI in provider settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the separate model selector dropdown for custom ASR providers. The first model in the list is always used as the default. Users manage only the model list (add/remove); there is no separate "default" selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
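A sketch of model resolution for a custom ASR provider and the target of an OpenAI-style `POST {base}/audio/transcriptions` call (multipart form with `file`, `model` and optional `language`). It uses the first model as the default, as this commit describes, and also honors an explicit choice, since a later commit adds radio-button selection. The `selectedModel` field and both helpers are hypothetical.

```typescript
interface CustomASRConfig {
  baseUrl?: string; // user override, empty by default
  customDefaultBaseUrl: string; // URL entered when the provider was created
  customModels: string[];
  selectedModel?: string; // hypothetical field behind the radio-button selection
}

// Prefer the selected model if it is still in the list; otherwise use the first entry.
function resolveASRModel(cfg: CustomASRConfig): string {
  if (cfg.selectedModel && cfg.customModels.includes(cfg.selectedModel)) {
    return cfg.selectedModel;
  }
  const first = cfg.customModels[0];
  if (!first) throw new Error("No ASR models configured for this provider");
  return first;
}

// Endpoint and text form fields; the recorded audio itself would be attached
// as the multipart "file" field when the request is actually sent.
function buildTranscriptionTarget(
  cfg: CustomASRConfig,
  language?: string,
): { url: string; fields: Record<string, string> } {
  const base = (cfg.baseUrl || cfg.customDefaultBaseUrl).replace(/\/+$/, "");
  const fields: Record<string, string> = { model: resolveASRModel(cfg) };
  if (language) fields["language"] = language;
  return { url: `${base}/audio/transcriptions`, fields };
}
```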

- Show a spinner with "Processing..." while waiting for transcription
- Disable the record button during processing
- Show specific error messages from the API (e.g. "No transcription generated") instead of generic failure text
- Handle an empty transcription result as an error with a helpful message

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
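A minimal sketch of the error handling listed above, assuming the server returns OpenAI-style JSON (`{ text }` on success, `{ error: { message } }` on failure). `extractTranscript` is a hypothetical helper that takes the HTTP status and the already-parsed body; it is not the PR's code.

```typescript
interface TranscriptionResponse {
  text?: string;
  error?: { message?: string };
}

// Turn a parsed response into transcript text or a specific, user-facing error.
function extractTranscript(status: number, body: TranscriptionResponse): string {
  if (status < 200 || status >= 300) {
    // Prefer the API's own message over a generic failure string.
    throw new Error(body.error?.message || `Transcription failed (HTTP ${status})`);
  }
  const text = body.text?.trim() ?? "";
  if (!text) {
    // An empty result is treated as an error rather than silently inserting nothing.
    throw new Error("No transcription generated. Check the microphone input and the selected model.");
  }
  return text;
}
```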

- Add proper type declarations for custom provider fields (`customName`, `customDefaultBaseUrl`, `customVoices`, etc.), eliminating 30+ unsafe casts
- Show custom ASR providers in the toolbar with a default language list
- Hide custom ASR providers without models from toolbar selection
- Add a no-models warning in the ASR settings panel
- Add a delete confirmation dialog for custom TTS/ASR providers
- Add duplicate voice/model ID validation with a toast error
- Add dedicated i18n keys for ASR model management (`modelNamePlaceholder`, `addModel`)
- Fix the i18n key for the ASR processing label (`settings.asrProcessing`)
- Add radio-button model selection in custom ASR provider settings
- Add a visible OpenAI-compatible description in the provider creation dialog
- Add a `CUSTOM_ASR_DEFAULT_LANGUAGES` constant for toolbar language options

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
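The toolbar behavior above (custom ASR providers listed with a default language list, hidden when they have no models) can be sketched as a filter over the ASR config. `CUSTOM_ASR_DEFAULT_LANGUAGES` is named in the PR but its contents are not shown, so the list below is a placeholder; `customASRToolbarOptions` and the config shape are also assumptions.

```typescript
// Placeholder contents; the real constant's language list may differ.
const CUSTOM_ASR_DEFAULT_LANGUAGES: readonly string[] = ["auto", "en", "zh", "ja", "ru"];

interface ASRProviderConfigEntry {
  customName?: string;
  customModels?: string[];
}

interface ToolbarASROption {
  providerId: string;
  label: string;
  languages: readonly string[];
}

// Keep only custom-asr-* providers that have at least one model configured.
function customASRToolbarOptions(
  asrProvidersConfig: Record<string, ASRProviderConfigEntry>,
): ToolbarASROption[] {
  return Object.entries(asrProvidersConfig)
    .filter(([id, cfg]) => id.startsWith("custom-asr-") && (cfg.customModels?.length ?? 0) > 0)
    .map(([providerId, cfg]) => ({
      providerId,
      label: cfg.customName ?? providerId,
      languages: CUSTOM_ASR_DEFAULT_LANGUAGES,
    }));
}
```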


… mock

The settings-server-sync test mocked `@/lib/audio/types` as an empty object, but `settings.ts` now imports `isCustomASRProvider` for provider validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When a custom provider is created, the actual URL is stored in `customDefaultBaseUrl` while `baseUrl` is initialized empty. Several call sites only checked `baseUrl`, causing fetch failures during course generation. Add a fallback to `customDefaultBaseUrl` in:

- generation-preview/page.tsx (course generation TTS)
- tts-providers.ts (getCurrentTTSConfig)
- asr-providers.ts (getCurrentASRConfig)
- media-popover.tsx (TTS preview)
- tts-config-popover.tsx (TTS preview)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cosarah

Verified custom TTS/ASR provider functionality across multiple scenarios:

Custom provider creation with default base URL
Voice preview and TTS generation with custom providers
Base URL fallback from customDefaultBaseUrl when panel base URL is not explicitly set
Built-in provider regression (OpenAI TTS unchanged)

The customDefaultBaseUrl fallback is now correctly applied in all generation paths including generation-preview/page.tsx. LGTM.

Flagged by CodeQL code quality check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cosarah

LGTM
verified locally

wyuc and others added 22 commits April 12, 2026 12:27

feat(audio): extend TTSProviderId/ASRProviderId to support custom providers

687840d

feat(audio): update provider registry helpers for custom provider support

2626785

feat(audio): route custom providers to OpenAI-compatible implementations

3c0b4e7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(audio): add custom TTS/ASR provider CRUD to settings store

46a0cf4

feat(audio): add dialog for creating custom audio providers

91e94b4

feat(audio): add custom provider UI with voice management and delete

99caa0e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(i18n): add custom audio provider translation strings

70db8ff

fix(i18n): use correct key for processing label in ASR test button

cb5e5a0

github-code-quality bot found potential problems Apr 12, 2026


components/generation/media-popover.tsx (fixed)

wyuc marked this pull request as ready for review April 12, 2026 07:08

wyuc and others added 2 commits April 12, 2026 21:19

style: format generation-preview/page.tsx

773a180

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cosarah previously approved these changes Apr 12, 2026


fix: remove unused ttsSpeedRange variable in media-popover

403952f

Flagged by CodeQL code quality check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wyuc dismissed cosarah’s stale review via 403952f April 12, 2026 13:46

cosarah approved these changes Apr 12, 2026


cosarah merged commit 9a0060e into main Apr 12, 2026
3 checks passed

wyuc deleted the worktree-357-custom-tts-asr-provider branch April 12, 2026 13:53

cosarah merged 26 commits into main from worktree-357-custom-tts-asr-provider