Parakeet STT is a local, low-latency speech-to-text stack for Linux/Wayland. It has two runtime components:
parakeet-stt-daemon(Python/FastAPI): captures audio and runs NeMo Parakeet ASR.parakeet-ptt(Rust): global hotkey client, daemon WebSocket client, and text injector.
- Python 3.11+ (
uv) - Rust toolchain (
cargo) - Linux (Wayland/X11 compatible input stack)
- NVIDIA GPU optional for lower latency; CPU path works for development/testing
- Install daemon dependencies:
cd parakeet-stt-daemon
uv sync --devOptional CUDA/nightly inference extras:
uv sync --dev --extra inference --prerelease allow \
--index https://download.pytorch.org/whl/nightly/cu130 \
--index-strategy unsafe-best-match- Start via helper (recommended):
# Optional machine-local overrides belong in shell env or an ignored file
# such as .parakeet-stt.local.env.
source scripts/stt-helper.sh
stt start
# Offline/no-overlay profile:
stt off- Inspect runtime:
stt status
stt show
stt logs both- Stop:
stt stopManual two-terminal start is still supported:
# Terminal A
cd parakeet-stt-daemon
PARAKEET_STREAMING_ENABLED=true uv run parakeet-stt-daemon
# Terminal B
cd parakeet-ptt
cargo run --release -- --endpoint ws://127.0.0.1:8765/ws --overlay-enabled trueUse PARAKEET_STREAMING_ENABLED=false plus --overlay-enabled false to mirror stt off.
Default stt / stt start profile:
--injection-mode paste--paste-backend-failure-policy copy-only--uinput-dwell-ms 18PARAKEET_STREAMING_ENABLED=truePARAKEET_OVERLAY_ENABLED=true--overlay-adaptive-width=false- Adaptive routing: Terminal → Ctrl+Shift+V, General → Ctrl+V, Unknown → Ctrl+Shift+V
- Wayland focus cache: 30s stale threshold, 500ms transition grace
- Clipboard: foreground wl-copy, 700ms post-chord hold,
text/plain;charset=utf-8
stt off profile switches to offline/no-overlay defaults:
PARAKEET_STREAMING_ENABLED=falsePARAKEET_OVERLAY_ENABLED=false
Overlay backend mode override (both profiles):
PARAKEET_OVERLAY_MODE=auto|layer-shell|fallback-window|disabled
Local LLM query mode overrides:
PARAKEET_LLM_BASE_URL,PARAKEET_LLM_MODEL,PARAKEET_LLM_TIMEOUT_SECONDSPARAKEET_LLM_MAX_TOKENS,PARAKEET_LLM_TEMPERATUREPARAKEET_LLM_SYSTEM_PROMPT,PARAKEET_LLM_OVERLAY_STREAM- Managed local llama-server knobs for
stt llm:PARAKEET_LLM_SERVER_MODEL_PATH,PARAKEET_LLM_SERVER_MODEL_ALIAS,PARAKEET_LLM_SERVER_HOST,PARAKEET_LLM_SERVER_PORT,PARAKEET_LLM_SERVER_CTX_SIZE,PARAKEET_LLM_SERVER_GPU_LAYERS,PARAKEET_LLM_SERVER_PARALLEL,PARAKEET_LLM_SERVER_METRICS,PARAKEET_LLM_SERVER_EXTRA_ARGS - Keep workstation-specific LLM endpoints or launcher details in shell env or an ignored repo-local file such as
.parakeet-stt.local.env, not in tracked docs/config.
Helper readiness timing:
PARAKEET_CLIENT_READY_TIMEOUT_SECONDScontrols client readiness wait (default30)- helper prefers a compatible prebuilt
target/release/parakeet-pttbinary by default - helper extends readiness wait when it falls back to
cargo run --releaseand compile activity is detected
COSMIC focus-navigation baseline for best adaptive behavior:
Focus follows cursor = ONFocus follows cursor delay = 0msCursor follows focus = ON
Primary helper commands:
stt start|restart|stop|statusstt llm [start|stop|restart|status|logs|show](managed llama-server + STT)stt show(attach tmux)stt logs [client|daemon|both]stt check(daemon health)stt diag-injector(injection diagnostics)stt help,stt help start, andstt help llm(full helper reference)
stt start flag parsing/help/runtime args are driven by a single metadata table in
scripts/stt-helper.sh (start_option_rows).
Client checks:
cd parakeet-ptt
cargo fmt
cargo clippy --all-targets --all-features -- -D warnings
cargo testLocal hardware-optimized release build (Zen 5 / local machine only):
PARAKEET_PTT_RUSTFLAGS="-C target-cpu=znver5" just buildAfter that build, stt start will prefer the compatible prebuilt target/release/parakeet-ptt
binary instead of recompiling on launch.
If you need the portable host-default build path instead:
just buildDaemon checks:
cd parakeet-stt-daemon
uv run ruff check .
uv run ruff format --check .
ty check --project . --error-on-warning
uv run deptry .
uv run parakeet-stt-daemon --checkOffline benchmark harness (repeatable local regression gate):
cd parakeet-stt-daemon
uv run python check_model.py \
--bench-offline \
--device cuda \
--bench-output bench_audio/latest-benchmark.json \
--max-avg-wer 0.45 \
--max-p95-infer-ms 1800 \
--max-p95-finalize-ms 2200Local benchmark workflow (git-ignored assets):
# Run-only flow on existing unified corpus (local manifest + appended legacy 8 samples):
just eval # same as: just eval compare
just eval offline
just eval stream
just eval compare
just eval calibrate-offline
just eval calibrate-stream
# Dataset maintenance (only when intentionally refreshing prompts/audio):
just eval-dataset candidates
# Review parakeet-stt-daemon/bench_audio/personal/candidates.tsv and set include=yes rows.
just eval-dataset materialize
just eval-dataset recordEach eval or calibration run also copies the generated JSON into
parakeet-stt-daemon/bench_audio/personal/history/ with a timestamped filename, so the
current latest-*.json files stay convenient while older runs remain easy to compare.
The benchmark command prints a per-sample + aggregate summary to stdout and writes JSON with:
benchmark,bench_runtime,model,requested_device,effective_devicebench_dir,manifest_path|transcripts_path,bench_tier,bench_append_legacy,bench_runs,sample_countaggregate.avg_wer,aggregate.weighted_wer,aggregate.command_exact_match_rate,aggregate.critical_token_recallaggregate.punctuation.{precision,recall,f1,terminal_accuracy},aggregate.punctuation_f1,aggregate.terminal_punctuation_accuracyaggregate.infer_ms.*,aggregate.finalize_ms.*,aggregate.warm_finalize_ms.*,aggregate.cold_start_msthresholds.*,regression_gate.pass,regression_gate.failuressamples[]entries includingsample_id,tier,domain,critical_tokens,reference,hypothesis,wer,infer_ms,finalize_msruns[]with per-run aggregates and per-sample rows when--bench-runs > 1When any configured threshold is exceeded, the harness exits non-zero.
Commit and push gates (repo root):
prek install -t pre-commit -t pre-push
prek run --all-files
prek run --stage pre-push --all-filesOverlay reliability gates (repo root):
just phase6-contract
just phase6-promotion 3Hook stages are split for speed:
pre-commit: maintenance cadence reminder,ruff format,ruff check,ty check,cargo fmtpre-push:pytest,cargo clippy,cargo test- Hooks are language-scoped, so Python checks run only for
parakeet-stt-daemon/changes and Rust checks run only forparakeet-ptt/changes.
Maintenance audits (warned every 10 commits, non-blocking):
scripts/harness-maintenance.sh check --threshold 10
scripts/harness-maintenance.sh runrun executes deptry and cargo +nightly udeps; install cargo-udeps first with cargo install cargo-udeps.
Manual injector validation:
stt diag-injector- Harness engineering playbook:
docs/engineering/harness-engineering-playbook.md - Protocol contract:
docs/SPEC.md - Troubleshooting (canonical operator source):
docs/stt-troubleshooting.md - Historical docs archive index (non-canonical):
docs/archive/README.md - Historical streaming overlay rollout log:
docs/archive/STREAMING-OVERLAY-ROLL-OUT-AND-VERIFICATION-PLAN-2026-03-06.md - Historical Gemini architectural review:
docs/archive/gemini-architectural-review-2026-03-06.md - Historical injector handoff archive (non-canonical):
docs/archive/HANDOFF-clipboard-injector-2026-02-08.md - Historical cross-surface incident handoff archive (non-canonical):
docs/archive/HANDOFF-stt-cross-surface-injection-2026-02-19.md - Historical injection implementation roadmap (non-canonical):
docs/archive/STT-INPUT-INJECTION-ROADMAP-2026-02.md - UX roadmap (new):
ROADMAP.md
MIT (see LICENSE).