Global hold-to-dictate for macOS.
Hold the spacebar anywhere on the system, speak, release, and spoke pastes the transcription at the cursor. It runs as a menubar app built with PyObjC, optionally routes voice commands to a local LLM, and supports three transcription backends:
- Local MLX Whisper
- Local Qwen3-ASR via MLX
- Remote OpenAI-compatible /v1/audio/transcriptions sidecar
Hold spacebar → record mic audio → preview text while speaking → transcribe on release → paste at cursor
Quick taps still produce a normal space. Longer holds trigger recording, show the glow/overlay UI, then inject the final text with pasteboard save/restore plus synthetic Cmd+V. If the paste doesn't land (wrong app focused, no text field, etc.), a recovery overlay appears with retry and dismiss options.
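The tap-versus-hold decision described above can be sketched in a few lines. This is only an illustration, not spoke's actual input_tap.py logic: the function names are hypothetical, and treating a press that lasts exactly the threshold as a hold is an assumption. The 200 ms default and the "greater than 0" rule come from the SPOKE_HOLD_MS entry in the configuration table.

```python
import os


def hold_threshold_ms(env=os.environ) -> int:
    """Read SPOKE_HOLD_MS (default 200) and reject non-positive values."""
    value = int(env.get("SPOKE_HOLD_MS", "200"))
    if value <= 0:
        raise ValueError("SPOKE_HOLD_MS must be greater than 0")
    return value


def classify_press(down_ms: float, up_ms: float, threshold_ms: int) -> str:
    """Classify a spacebar press by its duration.

    Returns "tap" for a short press (pass a normal space through) and
    "hold" once the press has lasted at least threshold_ms (start dictation).
    Counting an exact-threshold press as a hold is an assumption.
    """
    return "hold" if (up_ms - down_ms) >= threshold_ms else "tap"
```

With the default threshold, a 50 ms press would be passed through as a space, while a 350 ms press would start recording.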
- Global spacebar hold detection with normal tap passthrough
- Live preview overlay during recording
- Screen-edge glow driven by microphone amplitude
- Local transcription by default (Whisper or Qwen3-ASR) when SPOKE_WHISPER_URL is unset
- Optional spoken-command pathway with local tool calls and overlay output
- Optional remote sidecar mode for heavier models
- Voice command pathway via Shift+Space — sends utterances to a local LLM with streaming response overlay
- OCR-verified paste with automatic recovery overlay on failure
- Decoder-loop and silence-hallucination deduplication
- Bounded Whisper ontology-vocabulary repair for recurring Epistaxis terms
- Single-instance app behavior
- Menubar-only UI with no Dock icon
- macOS 11+
- Python 3.13+
- uv
- portaudio
Install the system audio dependency:
brew install portaudio
git clone https://github.com/lyonsno/spoke.git
cd spoke
uv sync
If you do not set SPOKE_WHISPER_URL, spoke runs transcription locally with mlx-whisper.
By default, preview uses Whisper medium.en while final transcription uses Whisper large-v3-turbo
on machines that pass the existing RAM guard; otherwise both roles fall back to the lighter model.
uv run spoke
Use a Qwen model name to switch the local backend:
SPOKE_WHISPER_MODEL=Qwen/Qwen3-ASR-0.6B uv run spoke
Point spoke at any OpenAI-compatible transcription server:
SPOKE_WHISPER_URL=http://<host>:8000 uv run spoke
Example sidecar on Apple Silicon (TTS + STT):
./scripts/setup-mlx-audio-server.sh --start --port 9001
This installs mlx-audio as a uv tool with all runtime deps patched (the
published extras are missing several transitive dependencies — see the script
for details). Once running, load models dynamically:
curl -X POST "http://localhost:9001/v1/models?model_name=mlx-community/Voxtral-4B-TTS-2603-mlx-6bit"
curl -X POST "http://localhost:9001/v1/models?model_name=mlx-community/Kokoro-82M-bf16"
Or start the server manually after install:
mlx_audio.server --host 0.0.0.0 --port 9001 --workers 1
Note: The binary is mlx_audio.server (dots), not mlx-audio-server (dashes).
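For readers wiring up their own sidecar, the request a remote backend has to accept can be sketched with the standard library alone. This is a minimal illustration of the OpenAI-style multipart POST to /v1/audio/transcriptions, not a copy of spoke's transcribe.py; the function names, the audio.wav filename, and the 60-second timeout are assumptions.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url: str, wav_bytes: bytes, model: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying a model name and a WAV file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + wav_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url: str, wav_bytes: bytes, model: str) -> str:
    """Send the request and return the "text" field of the JSON response."""
    request = build_transcription_request(base_url, wav_bytes, model)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["text"]
```

Calling transcribe with the value of SPOKE_WHISPER_URL, the recorded WAV bytes, and a model name the server has loaded should return the transcript, provided the server answers with the usual {"text": ...} JSON body.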
When SPOKE_COMMAND_URL is set, Shift+Space activates the command pathway instead of dictation. Spoken input is sent to a local LLM (OpenAI-compatible chat completions API) and the streamed response appears in a dedicated overlay.
SPOKE_COMMAND_URL=http://localhost:8001 uv run spoke
Spoke applies a bounded post-transcription repair pass for recurring Epistaxis ontology terms that have already shown up incorrectly in real launch logs. The repair pass now normalizes those hits to accented canonical forms, and the visible overlay/tray tints those ontology words in the same glow-blue family as the rest of the UI. Current observed failure examples include:
- Epistaxes, Nepistaxis, Epistexis, in his taxes -> Epístaxis
- Epistaxistopos -> Epístaxis tópos
- Topoie, topoit, tipos -> tópoi, tópos
- an Afro, Afra, Aphro -> anaphorá
- Metadose, Metadose II -> metádosis
- Uxis, of seizes, Oxygesis, Oxesis, auxesus -> aúxesis
- Syllogy, silagee, sueji, Silegy -> syllogé
- appless kept says, upper skepticism, Aposcepsis, Episcapsis -> aposképsis
- kerigma, kergma, Curigma, Karigma, Charygma, chorigma -> kérygma
- epinorthosis, Epin orthosis, Evanorthosis -> epanórthosis
- epispokisis, epispokosis -> epispókisis
- semi-hostess, semi-oce's, Semion, Semian -> sēmeiōsis, sēmeion
- Probolia, Proboli, probly, probaly, probally -> probolé
- Autopuise, Autopoises, Otopoiesis -> autopoíesis
- ooxisis -> aúxesis
- Catastasis -> katástasis
- Lysis -> lýsis
Whenever one of these repairs fires, the launch logs keep both the raw and repaired text so the vocabulary list can expand from observed failures instead of invented cases.
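A bounded repair pass of this kind can be sketched as a fixed lookup from observed variants to canonical forms. The snippet below is an illustration under assumptions, not spoke's implementation: the REPAIRS table holds only three of the entries listed above, matching is case-sensitive and word-bounded by choice, and the logger name is hypothetical.

```python
import logging
import re

logger = logging.getLogger("spoke.repair_sketch")  # hypothetical logger name

# A small subset of the observed variants, keyed by canonical form.
REPAIRS = {
    "Epístaxis": ["Epistaxes", "Nepistaxis", "Epistexis", "in his taxes"],
    "kérygma": ["kerigma", "kergma", "Curigma", "Karigma"],
    "lýsis": ["Lysis"],
}


def _compile(repairs):
    """Turn each variant list into one word-bounded alternation pattern."""
    compiled = []
    for canonical, variants in repairs.items():
        # Longest variants first so multi-word phrases are tried before short ones.
        ordered = sorted(variants, key=len, reverse=True)
        alternation = "|".join(re.escape(v) for v in ordered)
        compiled.append((re.compile(rf"\b(?:{alternation})\b"), canonical))
    return compiled


_COMPILED = _compile(REPAIRS)


def repair(text: str) -> str:
    """Replace known mis-transcriptions and log raw plus repaired text on a hit."""
    repaired = text
    for pattern, canonical in _COMPILED:
        repaired = pattern.sub(canonical, repaired)
    if repaired != text:
        logger.info("ontology repair: raw=%r repaired=%r", text, repaired)
    return repaired
```

Because only listed variants are touched, unrelated words such as "analysis" pass through unchanged, and the log line keeps both forms so new variants can be added from real failures.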
On first run, macOS will ask for:
- Microphone access
- Accessibility access
Accessibility must be granted to the app that launches spoke if you run it from a terminal, or to Spoke.app if you run the bundled app.
The env var names use WHISPER for historical reasons — they control all backends, not just Whisper.
Preview and final transcription can now be configured independently, and the menu persists those
choices across relaunches.
| Variable | Default | Description |
|---|---|---|
| SPOKE_WHISPER_URL | unset | Remote transcription server. When unset, transcription runs locally. |
| SPOKE_WHISPER_MODEL | unset | Legacy single-model override. When set, both preview and final use the same model. |
| SPOKE_PREVIEW_MODEL | mlx-community/whisper-medium.en-mlx-8bit | Preview model identifier. Use Qwen/Qwen3-ASR-0.6B for local streaming preview, or any menu-listed Whisper variant. |
| SPOKE_TRANSCRIPTION_MODEL | mlx-community/whisper-large-v3-turbo | Final transcription model identifier. Use Qwen/Qwen3-ASR-0.6B or any menu-listed Whisper variant. |
| SPOKE_COMMAND_URL | unset | OpenAI-compatible OMLX chat endpoint used by the assistant command pathway. |
| SPOKE_COMMAND_MODEL | qwen3p5-35B-A3B | Initial assistant model identifier. When the command pathway is enabled, the menu bar persists the selected assistant model across relaunches. |
| SPOKE_COMMAND_MODEL_DIR | ~/.lmstudio/models | Optional local model inventory scanned to seed extra Assistant menu entries in org/model form alongside the server-reported /v1/models list. |
| SPOKE_HOLD_MS | 200 | Spacebar hold threshold in milliseconds. Must be greater than 0. |
| SPOKE_RESTORE_DELAY_MS | 1000 | Delay before the original pasteboard contents are restored. |
| SPOKE_COMMAND_URL | unset | Local LLM server for voice commands (Shift+Space). Chat completions endpoint. |
| SPOKE_COMMAND_MODEL | qwen3p5-35B-A3B | Model name sent in command requests. |
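How the model variables in this table interact can be sketched as a small resolver. This is an illustration under assumptions rather than spoke's code: it ignores the RAM guard and the menu-persisted choices, the function names are hypothetical, and picking the Qwen backend by looking for "qwen3-asr" in the model name is a guess at how "use a Qwen model name" might be detected.

```python
import os

DEFAULT_PREVIEW = "mlx-community/whisper-medium.en-mlx-8bit"
DEFAULT_FINAL = "mlx-community/whisper-large-v3-turbo"


def resolve_models(env=os.environ) -> dict:
    """Pick preview and final model identifiers from the environment.

    The legacy SPOKE_WHISPER_MODEL override wins and applies to both roles;
    otherwise each role falls back to its own variable, then its default.
    """
    legacy = env.get("SPOKE_WHISPER_MODEL")
    if legacy:
        return {"preview": legacy, "final": legacy}
    return {
        "preview": env.get("SPOKE_PREVIEW_MODEL", DEFAULT_PREVIEW),
        "final": env.get("SPOKE_TRANSCRIPTION_MODEL", DEFAULT_FINAL),
    }


def backend_for(model: str, env=os.environ) -> str:
    """Return "remote", "qwen", or "whisper" (name-based detection is assumed)."""
    if env.get("SPOKE_WHISPER_URL"):
        return "remote"
    return "qwen" if "qwen3-asr" in model.lower() else "whisper"
```

For example, setting only SPOKE_WHISPER_MODEL=Qwen/Qwen3-ASR-0.6B would select the same Qwen model for both preview and final transcription under this sketch.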
| Variable | Default | Description |
|---|---|---|
| SPOKE_COMMAND_URL | unset | Enables the spoken-command pathway when set to an OpenAI-compatible chat completions endpoint. |
| SPOKE_COMMAND_MODEL | qwen3p5-35B-A3B | Model id used for the spoken-command pathway. |
| SPOKE_COMMAND_API_KEY | unset | Optional bearer token for the command endpoint. |
| SPOKE_GMAIL_CREDENTIALS_PATH | ~/Library/Application Support/Spoke/gmail_credentials.json | Local JSON file containing Gmail OAuth material for the bounded read-only query_gmail tool. |
| SPOKE_GMAIL_CLIENT_ID | unset | Optional env override for the Gmail OAuth client id. |
| SPOKE_GMAIL_CLIENT_SECRET | unset | Optional env override for the Gmail OAuth client secret. |
| SPOKE_GMAIL_REFRESH_TOKEN | unset | Optional env override for the Gmail OAuth refresh token. |
| SPOKE_GMAIL_TOKEN_URI | https://oauth2.googleapis.com/token | Optional token endpoint override for Gmail OAuth refresh requests. |
The first Gmail affordance is intentionally narrow and read-only: query_gmail
currently supports recent starred recruiter- or CTO-style mail and returns
compact metadata plus snippets rather than full message bodies.
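The Gmail variables in the table above suggest a "file first, env overrides second" lookup. The sketch below shows one way that could work; it is not spoke's implementation. The JSON key names (client_id, client_secret, refresh_token, token_uri) and both function names are assumptions, while the form fields in the refresh body follow the standard OAuth 2.0 refresh-token grant.

```python
import json
import os
import urllib.parse
from pathlib import Path

DEFAULT_PATH = "~/Library/Application Support/Spoke/gmail_credentials.json"
DEFAULT_TOKEN_URI = "https://oauth2.googleapis.com/token"

# Assumed JSON key -> env override mapping; the key names are hypothetical.
ENV_OVERRIDES = {
    "client_id": "SPOKE_GMAIL_CLIENT_ID",
    "client_secret": "SPOKE_GMAIL_CLIENT_SECRET",
    "refresh_token": "SPOKE_GMAIL_REFRESH_TOKEN",
    "token_uri": "SPOKE_GMAIL_TOKEN_URI",
}


def load_gmail_credentials(env=os.environ) -> dict:
    """Load OAuth material from the JSON file, then apply env overrides."""
    path = Path(env.get("SPOKE_GMAIL_CREDENTIALS_PATH", DEFAULT_PATH)).expanduser()
    creds = json.loads(path.read_text()) if path.is_file() else {}
    for key, var in ENV_OVERRIDES.items():
        if env.get(var):
            creds[key] = env[var]
    creds.setdefault("token_uri", DEFAULT_TOKEN_URI)
    return creds


def refresh_request_body(creds: dict) -> bytes:
    """Build the form-encoded body for an OAuth 2.0 refresh-token grant."""
    return urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": creds["client_id"],
        "client_secret": creds["client_secret"],
        "refresh_token": creds["refresh_token"],
    }).encode("ascii")
```

The resulting body would be POSTed to creds["token_uri"] to obtain a short-lived access token for the read-only query.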
The overlay and glow also expose advanced tuning env vars such as SPOKE_GLOW_MULTIPLIER, SPOKE_TEXT_ALPHA_MIN, and related SPOKE_* values in the overlay/glow modules.
Run the test suite:
uv run pytest -v
Each layer is independent and testable in isolation.
spoke/
├── __main__.py # app delegate and runtime wiring
├── input_tap.py # global spacebar hold detection
├── capture.py # sounddevice recording and WAV encoding
├── transcribe.py # remote OpenAI-compatible client
├── transcribe_local.py # local MLX Whisper backend
├── transcribe_qwen.py # local Qwen3-ASR backend
├── patch_qwen3_streaming.py # upstream Qwen3-ASR overlap fix
├── dedup.py # decoder-loop and hallucination cleanup
├── inject.py # pasteboard save/paste/restore
├── paste_verify.py # post-paste OCR verification
├── focus_check.py # text-field focus detection via Accessibility
├── command.py # voice command dispatch to local LLM
├── command_overlay.py # streaming command response overlay
├── glow.py # screen-edge amplitude glow
├── overlay.py # live transcription overlay
└── menubar.py # status item and menu
Build the macOS app bundle with PyInstaller:
./scripts/build.sh
Fast incremental rebuild:
./scripts/build.sh --fast
Create a DMG after building the app:
brew install create-dmg
./scripts/build-dmg.sh
The app bundle is written to dist/Spoke.app.
- The bundled app logs to ~/Library/Logs/Spoke.log.
- The local MLX backends may download model weights on first use.
- The app is designed for Apple Silicon-oriented local inference workflows, but remote sidecar mode works independently of local model availability.
MIT