spoke

Global hold-to-dictate for macOS.

Hold the spacebar anywhere on the system, speak, release, and spoke pastes the transcription at the current cursor. It runs as a menubar app with PyObjC, supports three transcription backends, and optionally routes voice commands to a local LLM:

Local MLX Whisper
Local Qwen3-ASR via MLX
Remote OpenAI-compatible /v1/audio/transcriptions sidecar

Screen.Recording.2026-03-25.at.3.57.48.AM-demo.mp4

How It Works

Hold spacebar → record mic audio → preview text while speaking → transcribe on release → paste at cursor

Quick taps still produce a normal space. Longer holds trigger recording, show the glow/overlay UI, then inject the final text with pasteboard save/restore plus synthetic Cmd+V. If the paste doesn't land (wrong app focused, no text field, etc.), a recovery overlay appears with retry and dismiss options.

Features

Global spacebar hold detection with normal tap passthrough
Live preview overlay during recording
Screen-edge glow driven by microphone amplitude
Local transcription by default (Whisper or Qwen3-ASR) when SPOKE_WHISPER_URL is unset
Optional spoken-command pathway with local tool calls and overlay output
Optional remote sidecar mode for heavier models
Voice command pathway via Shift+Space — sends utterances to a local LLM with streaming response overlay
OCR-verified paste with automatic recovery overlay on failure
Decoder-loop and silence-hallucination deduplication
Bounded Whisper ontology-vocabulary repair for recurring Epistaxis terms
Single-instance app behavior
Menubar-only UI with no Dock icon

Requirements

macOS 11+
Python 3.13+
uv
portaudio

Install the system audio dependency:

brew install portaudio

Install

git clone https://github.com/lyonsno/spoke.git
cd spoke
uv sync

Run

Default: local MLX Whisper

If you do not set SPOKE_WHISPER_URL, spoke runs transcription locally with mlx-whisper. By default, preview uses Whisper medium.en while final transcription uses Whisper large-v3-turbo on machines that pass the existing RAM guard; otherwise both roles fall back to the lighter model.

uv run spoke

Local Qwen3-ASR

Use a Qwen model name to switch the local backend:

SPOKE_WHISPER_MODEL=Qwen/Qwen3-ASR-0.6B uv run spoke

Remote sidecar

Point spoke at any OpenAI-compatible transcription server:

SPOKE_WHISPER_URL=http://<host>:8000 uv run spoke

Example sidecar on Apple Silicon (TTS + STT):

./scripts/setup-mlx-audio-server.sh --start --port 9001

This installs mlx-audio as a uv tool with all runtime deps patched (the published extras are missing several transitive dependencies — see the script for details). Once running, load models dynamically:

curl -X POST "http://localhost:9001/v1/models?model_name=mlx-community/Voxtral-4B-TTS-2603-mlx-6bit"
curl -X POST "http://localhost:9001/v1/models?model_name=mlx-community/Kokoro-82M-bf16"

Or start the server manually after install:

mlx_audio.server --host 0.0.0.0 --port 9001 --workers 1

Note: The binary is mlx_audio.server (dots), not mlx-audio-server (dashes).

Voice commands

When SPOKE_COMMAND_URL is set, Shift+Space activates the command pathway instead of dictation. Spoken input is sent to a local LLM (OpenAI-compatible chat completions API) and the streamed response appears in a dedicated overlay.

SPOKE_COMMAND_URL=http://localhost:8001 uv run spoke

Whisper ontology vocabulary repair

Spoke applies a bounded post-transcription repair pass for recurring Epistaxis ontology terms that have already shown up incorrectly in real launch logs. The repair pass now normalizes those hits to accented canonical forms, and the visible overlay/tray tints those ontology words in the same glow-blue family as the rest of the UI. Current observed failure examples include:

Epistaxes, Nepistaxis, Epistexis, in his taxes -> Epístaxis
Epistaxistopos -> Epístaxis tópos
Topoie, topoit, tipos -> tópoi, tópos
an Afro, Afra, Aphro -> anaphorá
Metadose, Metadose II -> metádosis
Uxis, of seizes, Oxygesis, Oxesis, auxesus -> aúxesis
Syllogy, silagee, sueji, Silegy -> syllogé
appless kept says, upper skepticism, Aposcepsis, Episcapsis -> aposképsis
kerigma, kergma, Curigma, Karigma, Charygma, chorigma -> kérygma
epinorthosis, Epin orthosis, Evanorthosis -> epanórthosis
epispokisis, epispokosis -> epispókisis
semi-hostess, semi-oce's, Semion, Semian -> sēmeiōsis, sēmeion
Probolia, Proboli, probly, probaly, probally -> probolé
Autopuise, Autopoises, Otopoiesis -> autopoíesis
ooxisis -> aúxesis
Catastasis -> katástasis
Lysis -> lýsis

Whenever one of these repairs fires, the launch logs keep both the raw and repaired text so the vocabulary list can expand from observed failures instead of invented cases.

Permissions

On first run, macOS will ask for:

Microphone access
Accessibility access

Accessibility must be granted to the app that launches spoke if you run it from a terminal, or to Spoke.app if you run the bundled app.

Configuration

Core environment variables

The env var names use WHISPER for historical reasons — they control all backends, not just Whisper. Preview and final transcription can now be configured independently, and the menu persists those choices across relaunches.

Variable	Default	Description
`SPOKE_WHISPER_URL`	unset	Remote transcription server. When unset, transcription runs locally.
`SPOKE_WHISPER_MODEL`	unset	Legacy single-model override. When set, both preview and final use the same model.
`SPOKE_PREVIEW_MODEL`	`mlx-community/whisper-medium.en-mlx-8bit`	Preview model identifier. Use `Qwen/Qwen3-ASR-0.6B` for local streaming preview, or any menu-listed Whisper variant.
`SPOKE_TRANSCRIPTION_MODEL`	`mlx-community/whisper-large-v3-turbo`	Final transcription model identifier. Use `Qwen/Qwen3-ASR-0.6B` or any menu-listed Whisper variant.
`SPOKE_COMMAND_URL`	unset	OpenAI-compatible OMLX chat endpoint used by the assistant command pathway.
`SPOKE_COMMAND_MODEL`	`qwen3p5-35B-A3B`	Initial assistant model identifier. When the command pathway is enabled, the menu bar persists the selected assistant model across relaunches.
`SPOKE_COMMAND_MODEL_DIR`	`~/.lmstudio/models`	Optional local model inventory scanned to seed extra Assistant menu entries in `org/model` form alongside the server-reported `/v1/models` list.
`SPOKE_HOLD_MS`	`200`	Spacebar hold threshold in milliseconds. Must be greater than `0`.
`SPOKE_RESTORE_DELAY_MS`	`1000`	Delay before the original pasteboard contents are restored.
`SPOKE_COMMAND_URL`	unset	Local LLM server for voice commands (Shift+Space). Chat completions endpoint.
`SPOKE_COMMAND_MODEL`	`qwen3p5-35B-A3B`	Model name sent in command requests.

Command and Gmail

Variable	Default	Description
`SPOKE_COMMAND_URL`	unset	Enables the spoken-command pathway when set to an OpenAI-compatible chat completions endpoint.
`SPOKE_COMMAND_MODEL`	`qwen3p5-35B-A3B`	Model id used for the spoken-command pathway.
`SPOKE_COMMAND_API_KEY`	unset	Optional bearer token for the command endpoint.
`SPOKE_GMAIL_CREDENTIALS_PATH`	`~/Library/Application Support/Spoke/gmail_credentials.json`	Local JSON file containing Gmail OAuth material for the bounded read-only `query_gmail` tool.
`SPOKE_GMAIL_CLIENT_ID`	unset	Optional env override for the Gmail OAuth client id.
`SPOKE_GMAIL_CLIENT_SECRET`	unset	Optional env override for the Gmail OAuth client secret.
`SPOKE_GMAIL_REFRESH_TOKEN`	unset	Optional env override for the Gmail OAuth refresh token.
`SPOKE_GMAIL_TOKEN_URI`	`https://oauth2.googleapis.com/token`	Optional token endpoint override for Gmail OAuth refresh requests.

The first Gmail affordance is intentionally narrow and read-only: query_gmail currently supports recent starred recruiter- or CTO-style mail and returns compact metadata plus snippets rather than full message bodies.

UI tuning

The overlay and glow also expose advanced tuning env vars such as SPOKE_GLOW_MULTIPLIER, SPOKE_TEXT_ALPHA_MIN, and related SPOKE_* values in the overlay/glow modules.

Development

Run the test suite:

uv run pytest -v

Each layer is independent and testable in isolation.

spoke/
├── __main__.py               # app delegate and runtime wiring
├── input_tap.py              # global spacebar hold detection
├── capture.py                # sounddevice recording and WAV encoding
├── transcribe.py             # remote OpenAI-compatible client
├── transcribe_local.py       # local MLX Whisper backend
├── transcribe_qwen.py        # local Qwen3-ASR backend
├── patch_qwen3_streaming.py  # upstream Qwen3-ASR overlap fix
├── dedup.py                  # decoder-loop and hallucination cleanup
├── inject.py                 # pasteboard save/paste/restore
├── paste_verify.py           # post-paste OCR verification
├── focus_check.py            # text-field focus detection via Accessibility
├── command.py                # voice command dispatch to local LLM
├── command_overlay.py        # streaming command response overlay
├── glow.py                   # screen-edge amplitude glow
├── overlay.py                # live transcription overlay
└── menubar.py                # status item and menu

Build

Build the macOS app bundle with PyInstaller:

./scripts/build.sh

Fast incremental rebuild:

./scripts/build.sh --fast

Create a DMG after building the app:

brew install create-dmg
./scripts/build-dmg.sh

The app bundle is written to dist/Spoke.app.

Notes

The bundled app logs to ~/Library/Logs/Spoke.log.
The local MLX backends may download model weights on first use.
The app is designed for Apple Silicon-oriented local inference workflows, but remote sidecar mode works independently of local model availability.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 796 Commits
.claude/worktrees/binary-finding-flute		.claude/worktrees/binary-finding-flute
assets		assets
bench		bench
docs		docs
scripts		scripts
spoke		spoke
tests		tests
.DS_Store		.DS_Store
.gitignore		.gitignore
.python-version		.python-version
.spoke-smoke-env		.spoke-smoke-env
AGENTS.md		AGENTS.md
BACKLOG.md		BACKLOG.md
CLAUDE.md		CLAUDE.md
GEMINI.md		GEMINI.md
README.md		README.md
Spoke.spec		Spoke.spec
bench_decode_sync.py		bench_decode_sync.py
bench_results_MacBook-Pro-2.local.json		bench_results_MacBook-Pro-2.local.json
entry_point.py		entry_point.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

spoke

How It Works

Features

Requirements

Install

Run

Default: local MLX Whisper

Local Qwen3-ASR

Remote sidecar

Voice commands

Whisper ontology vocabulary repair

Permissions

Configuration

Core environment variables

Command and Gmail

UI tuning

Development

Build

Notes

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

spoke

How It Works

Features

Requirements

Install

Run

Default: local MLX Whisper

Local Qwen3-ASR

Remote sidecar

Voice commands

Whisper ontology vocabulary repair

Permissions

Configuration

Core environment variables

Command and Gmail

UI tuning

Development

Build

Notes

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages