Linux push-to-talk dictation: hold a key to record, transcribe with OpenAI, and inject text into the focused app.
This repo provides a minimal CLI. The main flow is push-to-talk, with extra modes for:
- one-shot dictation (fixed duration)
- looped dictation in chunks (record N seconds, transcribe, inject, repeat)
- injection via uinput (ydotool) with clipboard+paste fallback for Unicode
System packages (Ubuntu LTS example):
sudo apt update
sudo apt install pipewire-bin wl-clipboard ydotool curl python3 python3-evdev
Note: on some Ubuntu-based distributions, the packaged ydotool does not include ydotoold. If so, build it from source:
sudo apt install git build-essential cmake scdoc libevdev-dev libudev-dev
git clone https://github.com/ReimuNotMoe/ydotool.git
cd ydotool
mkdir -p build
cd build
cmake ..
make -j"$(nproc)"
sudo make install
python3-evdev is required for push-to-talk.
You also need access to /dev/uinput for ydotool. A common setup:
sudo modprobe uinput
echo uinput | sudo tee /etc/modules-load.d/uinput.conf
echo 'KERNEL=="uinput", MODE="0660", GROUP="input", OPTIONS+="static_node=uinput"' | sudo tee /etc/udev/rules.d/99-uinput.rules
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo usermod -aG input "$USER"
Log out and back in after adding the group so the new membership takes effect.
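After logging back in, a quick check can confirm that the setup took effect. The shell sketch below is hypothetical (`check_uinput` is not part of whisperer; `whisperer --doctor` is the tool's own diagnostic): it reports whether the device node exists, whether the current user can write to it, and whether the current session already includes the input group.

```shell
#!/bin/sh
# Hypothetical helper (not part of whisperer): sanity-check uinput access.
# Usage: check_uinput [device]   (defaults to /dev/uinput)
# Returns 0 if everything looks fine, 1 otherwise.
check_uinput() {
  dev="${1:-/dev/uinput}"
  status=0
  if [ ! -e "$dev" ]; then
    echo "missing: $dev (is the uinput module loaded?)"
    return 1
  fi
  if [ -w "$dev" ]; then
    echo "writable: $dev"
  else
    echo "not writable: $dev (check the udev rule and group membership)"
    status=1
  fi
  # id -nG lists the groups of the current session; a fresh usermod
  # only shows up here after logging out and back in.
  case " $(id -nG) " in
    *" input "*) echo "session has group: input" ;;
    *) echo "session lacks group: input (log out and back in?)"; status=1 ;;
  esac
  return "$status"
}
```

Run it as `check_uinput` for the default node. Root sessions typically lack the input group but can still write the node, so the group line is only meaningful for your normal user.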
Option A (recommended): keep the repo and add a symlink into your PATH.
git clone https://github.com/Poor-Plebs/whisperer.git
cd whisperer
mkdir -p ~/.local/bin
ln -sf "$(pwd)/bin/whisperer" ~/.local/bin/whisperer
Option B: add the repo's bin/ folder to your PATH in your shell profile.
export PATH="$HOME/path/to/whisperer/bin:$PATH"
Install the systemd user service (which manages ydotoold) and place the config file:
whisperer --install-service
After running --install-service, add your OpenAI API key to:
~/.config/whisperer/config
Push-to-talk (default; hold the key to record, default key: Right Alt):
whisperer
Run one-shot dictation (fixed duration):
whisperer --duration 8
Run chunked dictation (keeps recording 4-second chunks until Ctrl+C):
whisperer --loop --chunk-seconds 4
If Right Alt is used as AltGr on your layout, pick another key:
whisperer --ptt-key KEY_F9
Some layouts report Right Alt as KEY_ALTGR instead of KEY_RIGHTALT:
whisperer --ptt-key KEY_ALTGR
List keyboard devices (for --ptt-device):
whisperer --ptt-list-devices
Prefer a specific device by name fragment:
whisperer --ptt-device-match SONiX
Show status and key events:
whisperer --debug
- Injection strategy:
- If the transcript is plain ASCII, type it via ydotool.
- Otherwise, copy it to the clipboard and paste via Ctrl+V (or the keys set in WHISPERER_PASTE_KEYS).
- Config file: ~/.config/whisperer/config (CLI flags override config file values).
- The script auto-starts ydotoold unless WHISPERER_START_DAEMON=0. Set WHISPERER_START_DAEMON=0 if you manage ydotoold via systemd or run it manually.
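The injection rule above can be sketched as a small decision function. This is a hypothetical illustration, not whisperer's actual code: the name `choose_injection` is made up, and treating WHISPERER_MODE=type or paste as overriding the ASCII check is an assumption based on the documented auto|type|paste values. The ASCII test deletes every byte in the range 0-127 with `tr` and checks whether anything is left.

```shell
#!/bin/sh
# Hypothetical sketch: decide how a transcript would be injected.
# $1: mode (auto|type|paste), $2: transcript text.
# Prints "type" (ydotool typing) or "paste" (clipboard + paste keys).
choose_injection() {
  mode="${1:-auto}"
  text="${2:-}"
  case "$mode" in
    type|paste)
      # Assumption: an explicit mode skips the ASCII check.
      echo "$mode"
      return 0
      ;;
    auto) ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2
      ;;
  esac
  # Delete all ASCII bytes (octal 000-177); anything left is non-ASCII.
  rest=$(printf '%s' "$text" | LC_ALL=C tr -d '\000-\177')
  if [ -z "$rest" ]; then
    echo type
  else
    echo paste
  fi
}
```

In a real script, the `type` branch would hand the text to `ydotool type`, and the `paste` branch would put it on the clipboard with `wl-copy` before sending the paste keys. Checking bytes rather than characters keeps the test locale-independent: any UTF-8 character outside ASCII contains at least one byte above 127.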
See all options:
whisperer --help-advanced
Options are grouped by context (install/maintenance, dictation, audio/gating/debug).
Quick diagnostics:
whisperer --doctor
Auto-fix common issues:
whisperer --doctor --fix
Paste not working in terminals?
- Many terminals use Ctrl+Shift+V for paste. Set:
WHISPERER_PASTE_KEYS=ctrl+shift+v
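One way to act on WHISPERER_PASTE_KEYS is to turn it into raw key events for `ydotool key`, which in ydotool 1.x takes `keycode:state` pairs (1 = press, 0 = release). The sketch assumes the paste is driven this way, which this README does not state; the function name `paste_key_events` is hypothetical, and the key codes are the standard Linux ones from `linux/input-event-codes.h`.

```shell
#!/bin/sh
# Hypothetical sketch: translate a WHISPERER_PASTE_KEYS value into the
# "code:state" arguments accepted by ydotool 1.x `ydotool key`.
# Key codes from linux/input-event-codes.h:
#   KEY_LEFTCTRL=29, KEY_LEFTSHIFT=42, KEY_V=47   (1 = press, 0 = release)
paste_key_events() {
  case "${1:-ctrl+v}" in
    ctrl+v)       echo "29:1 47:1 47:0 29:0" ;;
    ctrl+shift+v) echo "29:1 42:1 47:1 47:0 42:0 29:0" ;;
    *)
      echo "unsupported paste keys: ${1:-}" >&2
      return 1
      ;;
  esac
}
```

Usage, with ydotoold running: `ydotool key $(paste_key_events "${WHISPERER_PASTE_KEYS:-ctrl+v}")`. The substitution is left unquoted on purpose so each event becomes a separate argument. Modifiers are released in reverse order of pressing so no key is left held down.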
Debug output:
WHISPERER_DEBUG=1 or --debug shows status and key events.
Cleanup:
whisperer --uninstall
RMS (root mean square) is a simple average loudness measure, normalized to 0.0–1.0. The RMS gate only runs when --rms-threshold is set above 0; otherwise no sampling happens.
Typical values:
- ~0.000–0.005: near silence
- ~0.005–0.02: quiet speech / ambient noise
- ~0.02–0.08: normal speech
- ~0.08–0.2+: loud speech / music
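To see where a recording falls on this scale, the sketch below computes the normalized RMS of raw audio and applies a threshold. Assumptions, none of which come from whisperer itself: input is headerless signed 16-bit PCM in the machine's native byte order (little-endian on x86 and most ARM systems), samples are divided by 32768 to land in -1.0..1.0, and a clip below the threshold is treated as skipped. The function names `pcm_rms` and `rms_gate` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: RMS of raw signed 16-bit PCM, normalized to 0.0-1.0.
# Assumes headerless s16 samples in the host's byte order.
pcm_rms() {
  od -An -v -t d2 "${1:?usage: pcm_rms FILE}" | awk '
    { for (i = 1; i <= NF; i++) { x = $i / 32768; sum += x * x; n++ } }
    END { if (n > 0) printf "%.4f\n", sqrt(sum / n); else print "0.0000" }'
}

# Hypothetical gate: prints "pass" or "skip".
# Assumption: clips quieter than the threshold would not be transcribed.
rms_gate() {
  file="${1:-}"; threshold="${2:-0}"
  # Mirror the documented default: threshold <= 0 means no sampling at all.
  if awk -v t="$threshold" 'BEGIN { exit !(t <= 0) }'; then
    echo pass
    return 0
  fi
  rms=$(pcm_rms "$file")
  if awk -v r="$rms" -v t="$threshold" 'BEGIN { exit !(r >= t) }'; then
    echo pass
  else
    echo skip
  fi
}
```

For example, `rms_gate clip.raw 0.02` prints `skip` for a clip quieter than the ~0.02 boundary between ambient noise and normal speech in the values above. Squaring before averaging is what makes RMS insensitive to the sign of the samples, so a waveform swinging between +0.5 and -0.5 scores 0.5 rather than 0.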
Environment variables (also supported in config file):
- OPENAI_API_KEY (required)
- WHISPERER_MODEL (default: whisper-1). Other options: gpt-4o-mini-transcribe, gpt-4o-transcribe.
- WHISPERER_RATE (default: 16000)
- WHISPERER_CHANNELS (default: 1)
- WHISPERER_MODE (auto|type|paste, default: auto)
- WHISPERER_START_DAEMON (1 to auto-start ydotoold, default: 1)
- WHISPERER_SOCKET (default: /tmp/.ydotool_socket)
- WHISPERER_PASTE_KEYS (ctrl+v or ctrl+shift+v, default: ctrl+v)
- WHISPERER_NO_INJECT (1 to disable injection, default: 0)
- WHISPERER_RMS_THRESHOLD (default: 0, disabled)
- WHISPERER_PRINT_RMS (1 to print RMS, default: 0)
- WHISPERER_RMS_SAMPLE_SECONDS (default: 0.5)
- WHISPERER_DEBUG (1 to enable debug output, default: 0)
- WHISPERER_PTT_KEY (default: KEY_RIGHTALT)
- WHISPERER_PTT_DEVICE (default: empty, meaning auto-detect; set to a /dev/input/by-id/*-event-kbd path from whisperer --ptt-list-devices. The literal string auto-detect is not special and will be treated as a path.)
- WHISPERER_PTT_DEVICE_MATCH (default: empty; plain substring match on the device path, not a regex. Only used when WHISPERER_PTT_DEVICE is empty. When empty, no filter is applied and the best keyboard device is auto-detected.)
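The precedence between WHISPERER_PTT_DEVICE and WHISPERER_PTT_DEVICE_MATCH described above can be sketched in shell. This is a hypothetical illustration (`pick_ptt_device` is not a whisperer function): it follows the documented rules (an explicit path wins and is never interpreted; the match is a plain substring, not a regex), but replaces whisperer's unspecified "best keyboard" heuristic with simply taking the first `*-event-kbd` entry in sorted order.

```shell
#!/bin/sh
# Hypothetical sketch of the documented PTT device precedence:
#   1. WHISPERER_PTT_DEVICE, if non-empty, is used verbatim as a path
#      (so the literal string "auto-detect" would be treated as a path).
#   2. Otherwise, keyboards under the by-id directory are candidates,
#      filtered by WHISPERER_PTT_DEVICE_MATCH as a plain substring.
# Simplification: takes the first candidate in glob (sorted) order instead
# of whisperer's own "best keyboard" choice.
# $1: optional by-id directory (overridable for testing).
pick_ptt_device() {
  dir="${1:-/dev/input/by-id}"
  if [ -n "${WHISPERER_PTT_DEVICE:-}" ]; then
    echo "$WHISPERER_PTT_DEVICE"
    return 0
  fi
  for path in "$dir"/*-event-kbd; do
    [ -e "$path" ] || continue   # glob matched nothing
    case "$path" in
      # Quoted pattern: literal substring match, no glob or regex meaning.
      *"${WHISPERER_PTT_DEVICE_MATCH:-}"*) echo "$path"; return 0 ;;
    esac
  done
  echo "no matching keyboard device in $dir" >&2
  return 1
}
```

Quoting the variable inside the `case` pattern is what makes a value such as `SON*X` match only that literal text, in line with the "plain substring, not regex" rule; an empty match string turns the pattern into `**`, which accepts every candidate.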
Run whisperer --help for common options and whisperer --help-advanced for all options.