fix(VoiceServer): cross-platform audio playback in playAudio() #1061

Open
MHoroszowski wants to merge 1 commit into danielmiessler:main from MHoroszowski:fix/voice-server-cross-platform-audio

Summary

VoiceServer/server.ts:playAudio() hardcodes /usr/bin/afplay, which is macOS-only. On Linux, every TTS notification fails with ENOENT, yet the voice server appears to work: the failure is swallowed by the fire-and-forget curl pattern at the call sites, so users see ✅ success in their terminal while their speakers stay silent.

This PR makes audio playback cross-platform. It addresses the audio-playback half of #855 and is complementary to #1030, which covers the desktop-notification half via notify-send. The two changes do not overlap.

Change

Extract player resolution into a small getAudioPlayer() helper, then call it from playAudio():

| Platform | Player | Notes |
| --- | --- | --- |
| darwin | /usr/bin/afplay | unchanged behavior |
| linux + ffplay present | /usr/bin/ffplay | preferred; ffmpeg is widely preinstalled |
| linux + mpg123 present | /usr/bin/mpg123 | lightweight fallback (~500 KB) |
| neither | throws with install hint | actionable error instead of ENOENT |
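
For reviewers who want the resolution order at a glance, here is a minimal sketch of what a helper like getAudioPlayer() could look like. It is not the code in this PR: the `AudioPlayer` shape, the injected `platform` and `exists` parameters, the error wording, and the choice to throw on platforms other than darwin and linux are all assumptions made for illustration.

```typescript
// Hypothetical types for illustration; the PR's actual helper may differ.
type PlayerName = "afplay" | "ffplay" | "mpg123";

interface AudioPlayer {
  name: PlayerName;
  path: string;
}

// `platform` would typically be process.platform and `exists` fs.existsSync.
// Both are injected here (an assumption) so the order can be checked in isolation.
function getAudioPlayer(
  platform: string,
  exists: (path: string) => boolean,
): AudioPlayer {
  // macOS: same player as before, no existence check.
  if (platform === "darwin") {
    return { name: "afplay", path: "/usr/bin/afplay" };
  }

  if (platform === "linux") {
    // Prefer ffplay, then fall back to mpg123.
    if (exists("/usr/bin/ffplay")) {
      return { name: "ffplay", path: "/usr/bin/ffplay" };
    }
    if (exists("/usr/bin/mpg123")) {
      return { name: "mpg123", path: "/usr/bin/mpg123" };
    }
  }

  // No player found: fail with a hint instead of a bare ENOENT.
  // The message text is illustrative, not the PR's wording.
  throw new Error(
    "No supported audio player found. Install ffmpeg (for ffplay) or mpg123.",
  );
}
```

Passing the existence check in as a parameter keeps the function pure, so the preference order can be unit-tested without touching the filesystem.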

Volume is preserved across players: afplay -v (0..1 float), ffplay -volume (0..100 int), mpg123 -f (0..32768 PCM scale).
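
The volume mapping above can be sketched as follows. This assumes the server keeps volume as a 0..1 float (afplay's scale) and converts it per player; the `buildPlayerArgs` name, the clamping, and the rounding are hypothetical, while the flags themselves are the ones named in this PR.

```typescript
// Hypothetical helper for illustration; not the PR's actual code.
type PlayerName = "afplay" | "ffplay" | "mpg123";

function buildPlayerArgs(
  player: PlayerName,
  file: string,
  volume: number,
): string[] {
  // Assumption: volume arrives as a 0..1 float; clamp out-of-range input.
  const v = Math.min(1, Math.max(0, volume));

  switch (player) {
    case "afplay":
      // afplay -v takes the 0..1 float directly.
      return ["-v", String(v), file];
    case "ffplay":
      // ffplay -volume takes an integer in 0..100.
      return ["-nodisp", "-autoexit", "-volume", String(Math.round(v * 100)), file];
    case "mpg123":
      // mpg123 -f takes a scale factor in 0..32768.
      return ["-q", "-f", String(Math.round(v * 32768)), file];
  }
}
```

With this mapping, a volume of 0.5 becomes `-v 0.5` for afplay, `-volume 50` for ffplay, and `-f 16384` for mpg123.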

Why ffplay first, mpg123 second

ffplay ships with ffmpeg, which is already installed on most modern dev boxes; mpg123 is the well-known minimal fallback called out in #855. Trying both gives users a graceful path on minimal containers and distros without forcing a heavy install.

Test plan

  • Verified on Ubuntu 24.04 / WSL2 on Windows 11 — TTS audio plays through WSLg PulseAudio to Windows speakers with zero additional configuration (no PulseAudio TCP forwarding, no Docker shenanigans)
  • mpg123 fallback path verified directly (mpg123 -q -f 32768 /tmp/voice-*.mp3 → exit 0, audio plays)
  • macOS code path unchanged — same afplay -v invocation
  • No new dependencies; no changes outside playAudio() and the new helper
  • macOS smoke test (would appreciate a maintainer running it before merge — I don't have a Mac handy)

Scope

References

playAudio() hardcoded /usr/bin/afplay, which is macOS-only. On Linux,
every TTS notification fails with ENOENT and the voice server appears
to work but produces no audio (the failure is swallowed by the
fire-and-forget curl pattern used at the call sites).

Extract player resolution into getAudioPlayer():
- darwin           → afplay  (unchanged)
- linux + ffplay   → ffplay -nodisp -autoexit -volume 0..100
- linux + mpg123   → mpg123 -f 0..32768 (PCM scale)
- neither          → throw with an actionable install hint

ffplay is preferred because ffmpeg is widely preinstalled; mpg123 is
the lightweight fallback. Both route through PulseAudio, so this works
on native Linux and on Windows via WSL2 + WSLg out of the box.

Verified on Ubuntu 24.04 / WSL2 (Windows 11): TTS audio plays through
WSLg PulseAudio to Windows speakers with no additional configuration.

Addresses the audio-playback half of danielmiessler#855. Complementary to danielmiessler#1030,
which covers the desktop-notification half (osascript → notify-send)
without overlap.