feat(web): add push-to-talk, VAD continuous listening, and voice settings #303

Open
P2Chill wants to merge 2 commits into moltis-org:main from P2Chill:feat/voice-modes

P2Chill commented Mar 3, 2026

Add two new voice input modes alongside the existing toggle:

Push-to-Talk (PTT)

  • Configurable hotkey (default F13, stored in localStorage)
  • Hold to record, release to send
  • Function-key hotkeys work even when focus is in a text input
  • BroadcastChannel tab coordination prevents dual-tab recording
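
For reviewers, a minimal sketch of the hold-to-record key handling described above. This is not the code in this PR: `PttState`, `KeyInfo`, `isFunctionKey`, and `nextPttState` are hypothetical names, and the sketch assumes the handler receives `KeyboardEvent.key` and `KeyboardEvent.repeat` plus a flag for whether focus is in an editable element.

```typescript
// Hypothetical sketch of push-to-talk key handling; names are illustrative only.
type PttState = "idle" | "recording";

interface KeyInfo {
  type: "keydown" | "keyup";
  key: string;         // KeyboardEvent.key, e.g. "F13"
  repeat: boolean;     // KeyboardEvent.repeat: true for auto-repeat while held
  inEditable: boolean; // true when focus is in an input or textarea
}

// Function keys (F1..F24) never type characters, so they can be honored
// even while a text input has focus.
function isFunctionKey(key: string): boolean {
  return /^F([1-9]|1[0-9]|2[0-4])$/.test(key);
}

// Pure state transition: hold the hotkey to record, release to send.
function nextPttState(state: PttState, ev: KeyInfo, hotkey: string): PttState {
  if (ev.key !== hotkey) return state;          // unrelated key
  if (ev.type === "keyup") return "idle";       // release always stops
  if (ev.repeat) return state;                  // ignore auto-repeat keydowns
  if (ev.inEditable && !isFunctionKey(ev.key)) return state; // let the user type
  return "recording";
}
```

Keeping the transition pure makes it easy to unit test; the real handler would start or stop the recorder when the returned state differs from the previous one.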

Voice Activity Detection (VAD)

  • Energy-based continuous listening with conversation mode button
  • Exponential sensitivity curve (0–100%) configurable in settings
  • Auto-sends after 2.5 s of silence; recordings are capped at 30 s as a safety valve
  • Mutes during TTS playback and auto-resumes afterward, following a short echo-settle delay
  • AudioContext health monitoring with auto-resume on browser suspension
  • MediaStream track health check with automatic reacquisition
  • Race condition guards (vadTranscribing flag) prevent recorder restart storms during async transcription fetches
  • EBML header validation catches corrupt WebM blobs before API submission
  • 15s fetch timeout prevents stuck transcription state
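
To make the segmentation and blob checks above easier to review, here is a rough sketch under stated assumptions. It is not the implementation in this PR: `rms`, `VadSegmenter`, and `hasEbmlHeader` are hypothetical names, the energy measure is assumed to be RMS over PCM samples in [-1, 1], and the threshold value is arbitrary. The two timing constants come from the description; the EBML magic bytes `1A 45 DF A3` are the standard start of any WebM/Matroska file.

```typescript
// Hypothetical sketch; constants mirror the PR description, names are illustrative.
const SILENCE_MS = 2500;        // auto-send after 2.5 s of silence
const MAX_RECORDING_MS = 30000; // 30 s safety valve

// Root-mean-square energy of one frame of PCM samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length);
}

type VadAction = "none" | "start" | "send";

class VadSegmenter {
  private threshold: number;
  private startedAt: number | null = null;
  private lastSpeechAt = 0;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Feed one frame's energy with a timestamp in ms; returns what the caller should do.
  push(energy: number, nowMs: number): VadAction {
    const speaking = energy >= this.threshold;
    if (this.startedAt === null) {
      if (!speaking) return "none";
      this.startedAt = nowMs;
      this.lastSpeechAt = nowMs;
      return "start"; // speech detected: begin recording
    }
    if (speaking) this.lastSpeechAt = nowMs;
    const silentTooLong = nowMs - this.lastSpeechAt >= SILENCE_MS;
    const recordingTooLong = nowMs - this.startedAt >= MAX_RECORDING_MS;
    if (silentTooLong || recordingTooLong) {
      this.startedAt = null;
      return "send"; // stop recording and submit the segment
    }
    return "none";
  }
}

// A WebM blob should start with the EBML header ID 1A 45 DF A3.
function hasEbmlHeader(bytes: Uint8Array): boolean {
  return bytes.length >= 4 &&
    bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3;
}
```

In a browser the frames would presumably come from the Web Audio graph and the bytes from the recorded blob; the sketch leaves that wiring, TTS muting, and the `vadTranscribing` guard out.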

Voice Settings UI

  • PTT key picker (click to listen, press any key to rebind)
  • VAD sensitivity slider with real-time threshold preview
  • Waveform icon button with CSS states (listening glow, speech pulse)
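
As an illustration of how a 0–100% sensitivity slider could map to an energy threshold on an exponential curve, here is a small sketch. The bounds `MAX_THRESHOLD` and `MIN_THRESHOLD`, the function names, and the label format are all assumptions for the example, not values taken from this PR.

```typescript
// Hypothetical mapping from slider position to RMS threshold; bounds are assumed.
const MAX_THRESHOLD = 0.2;   // assumed threshold at 0% sensitivity (hardest to trigger)
const MIN_THRESHOLD = 0.005; // assumed threshold at 100% sensitivity (easiest to trigger)

// Exponential interpolation: each slider step scales the threshold by a constant
// factor, which tracks perceived loudness better than a linear ramp.
function thresholdForSensitivity(percent: number): number {
  const clamped = Math.min(100, Math.max(0, percent));
  return MAX_THRESHOLD * Math.pow(MIN_THRESHOLD / MAX_THRESHOLD, clamped / 100);
}

// Text for a real-time preview next to the slider.
function previewLabel(percent: number): string {
  const threshold = thresholdForSensitivity(percent);
  return `Sensitivity ${Math.round(percent)}% (threshold ${threshold.toFixed(4)})`;
}
```

With these assumed bounds, the midpoint of the slider lands on the geometric mean of the two thresholds, so `previewLabel(50)` yields `Sensitivity 50% (threshold 0.0316)`.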

Also adds i18n keys for en/fr/zh locales.

Notes

  • The .github/workflows/local-ci.yml file is fork-specific CI infrastructure (sets local/* commit statuses for the upstream local-validation polling jobs, since upstream ci.yml skips checks on fork PRs). Happy to drop it if you prefer.

P2Chill added 2 commits March 3, 2026 01:42
…ings

Sets commit statuses (local/lint, local/test, etc.) that the upstream
local-validation jobs poll for. Required because upstream ci.yml skips
actual checks on pull_request events from forks.