SystemicVoid/parakeet-stt

Parakeet STT

Parakeet STT is a local, low-latency speech-to-text stack for Linux/Wayland. It has two runtime components:

  • parakeet-stt-daemon (Python/FastAPI): captures audio and runs NeMo Parakeet ASR.
  • parakeet-ptt (Rust): global hotkey client, daemon WebSocket client, and text injector.

Prerequisites

  • Python 3.11+ (uv)
  • Rust toolchain (cargo)
  • Linux (Wayland/X11 compatible input stack)
  • NVIDIA GPU optional for lower latency; CPU path works for development/testing

Quick Start

  1. Install daemon dependencies:
cd parakeet-stt-daemon
uv sync --dev

Optional CUDA/nightly inference extras:

uv sync --dev --extra inference --prerelease allow \
  --index https://download.pytorch.org/whl/nightly/cu130 \
  --index-strategy unsafe-best-match
  2. Start via helper (recommended):
# Optional machine-local overrides belong in shell env or an ignored file
# such as .parakeet-stt.local.env.
source scripts/stt-helper.sh
stt start
# Offline/no-overlay profile:
stt off
  3. Inspect runtime:
stt status
stt show
stt logs both
  4. Stop:
stt stop

Manual two-terminal start is still supported:

# Terminal A
cd parakeet-stt-daemon
PARAKEET_STREAMING_ENABLED=true uv run parakeet-stt-daemon

# Terminal B
cd parakeet-ptt
cargo run --release -- --endpoint ws://127.0.0.1:8765/ws --overlay-enabled true

Use PARAKEET_STREAMING_ENABLED=false plus --overlay-enabled false to mirror stt off.
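
Before starting the client in Terminal B, it can help to confirm that something is listening on the daemon endpoint. The following is a minimal sketch using only the Python standard library; `parse_endpoint` and `is_listening` are hypothetical helper names, not part of this repository, and the check only tests TCP reachability (it does not speak the WebSocket protocol described in docs/SPEC.md).

```python
import socket
from urllib.parse import urlparse


def parse_endpoint(endpoint: str) -> tuple[str, int]:
    """Split a ws:// or wss:// URL into (host, port)."""
    parts = urlparse(endpoint)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"expected a ws:// or wss:// URL, got {endpoint!r}")
    if parts.hostname is None:
        raise ValueError(f"no host in {endpoint!r}")
    # Fall back to the scheme's conventional port when none is given.
    default_port = 443 if parts.scheme == "wss" else 80
    return parts.hostname, parts.port or default_port


def is_listening(endpoint: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint's host:port succeeds."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example: is_listening("ws://127.0.0.1:8765/ws")
```

For a full health check, `stt check` or `uv run parakeet-stt-daemon --check` remain the documented paths.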

Helper Profiles (stt start / stt off)

Default stt / stt start profile:

  • --injection-mode paste
  • --paste-backend-failure-policy copy-only
  • --uinput-dwell-ms 18
  • PARAKEET_STREAMING_ENABLED=true
  • PARAKEET_OVERLAY_ENABLED=true
  • --overlay-adaptive-width=false
  • Adaptive routing: Terminal → Ctrl+Shift+V, General → Ctrl+V, Unknown → Ctrl+Shift+V
  • Wayland focus cache: 30s stale threshold, 500ms transition grace
  • Clipboard: foreground wl-copy, 700ms post-chord hold, text/plain;charset=utf-8
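
The adaptive routing rule in the default profile can be read as a small lookup table. The sketch below is illustrative only: the real routing lives in the Rust client, and the names `Surface`, `PASTE_CHORDS`, and `route` are hypothetical. Treating a stale focus snapshot as Unknown is an assumption made for this example; the list above states the 30s threshold but not what the client does once it is exceeded.

```python
from enum import Enum


class Surface(Enum):
    TERMINAL = "terminal"
    GENERAL = "general"
    UNKNOWN = "unknown"


# Chords taken from the default profile: Unknown uses the terminal-style chord.
PASTE_CHORDS = {
    Surface.TERMINAL: ("ctrl", "shift", "v"),
    Surface.GENERAL: ("ctrl", "v"),
    Surface.UNKNOWN: ("ctrl", "shift", "v"),
}

# Stale threshold from the profile above.
STALE_AFTER_SECONDS = 30.0


def route(surface: Surface, focus_age_seconds: float) -> tuple[str, ...]:
    """Pick a paste chord for the focused surface.

    ASSUMPTION: a focus snapshot older than the stale threshold is
    downgraded to UNKNOWN. The actual client may behave differently.
    """
    if focus_age_seconds > STALE_AFTER_SECONDS:
        surface = Surface.UNKNOWN
    return PASTE_CHORDS[surface]
```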

stt off profile switches to offline/no-overlay defaults:

  • PARAKEET_STREAMING_ENABLED=false
  • PARAKEET_OVERLAY_ENABLED=false

Overlay backend mode override (both profiles):

  • PARAKEET_OVERLAY_MODE=auto|layer-shell|fallback-window|disabled

Local LLM query mode overrides:

  • PARAKEET_LLM_BASE_URL, PARAKEET_LLM_MODEL, PARAKEET_LLM_TIMEOUT_SECONDS
  • PARAKEET_LLM_MAX_TOKENS, PARAKEET_LLM_TEMPERATURE
  • PARAKEET_LLM_SYSTEM_PROMPT, PARAKEET_LLM_OVERLAY_STREAM
  • Managed local llama-server knobs for stt llm: PARAKEET_LLM_SERVER_MODEL_PATH, PARAKEET_LLM_SERVER_MODEL_ALIAS, PARAKEET_LLM_SERVER_HOST, PARAKEET_LLM_SERVER_PORT, PARAKEET_LLM_SERVER_CTX_SIZE, PARAKEET_LLM_SERVER_GPU_LAYERS, PARAKEET_LLM_SERVER_PARALLEL, PARAKEET_LLM_SERVER_METRICS, PARAKEET_LLM_SERVER_EXTRA_ARGS
  • Keep workstation-specific LLM endpoints or launcher details in shell env or an ignored repo-local file such as .parakeet-stt.local.env, not in tracked docs/config.

Helper readiness timing:

  • PARAKEET_CLIENT_READY_TIMEOUT_SECONDS controls client readiness wait (default 30)
  • Helper prefers a compatible prebuilt target/release/parakeet-ptt binary by default
  • Helper extends readiness wait when it falls back to cargo run --release and compile activity is detected
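
The readiness behavior above amounts to a polling loop with a deadline that may be pushed out once. The sketch below shows that shape in Python; it is not the helper's shell implementation. `wait_until_ready`, `probe`, `compiling`, and `extension` are hypothetical names, and the size of the extension is an assumption because the helper's actual value is not stated here.

```python
import os
import time
from typing import Callable, Mapping, Optional


def ready_timeout_seconds(env: Optional[Mapping[str, str]] = None) -> float:
    """Read PARAKEET_CLIENT_READY_TIMEOUT_SECONDS, defaulting to 30."""
    env = os.environ if env is None else env
    return float(env.get("PARAKEET_CLIENT_READY_TIMEOUT_SECONDS", "30"))


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float,
    interval: float = 0.25,
    compiling: Callable[[], bool] = lambda: False,
    extension: float = 0.0,  # ASSUMPTION: hypothetical extra seconds
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns True or the deadline passes."""
    deadline = clock() + timeout
    extended = False
    while True:
        if probe():
            return True
        # Extend the deadline at most once when a compile is detected.
        if not extended and compiling():
            deadline += extension
            extended = True
        if clock() >= deadline:
            return False
        sleep(interval)
```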

COSMIC focus-navigation baseline for best adaptive behavior:

  • Focus follows cursor = ON
  • Focus follows cursor delay = 0ms
  • Cursor follows focus = ON

Primary helper commands:

  • stt start|restart|stop|status
  • stt llm [start|stop|restart|status|logs|show] (managed llama-server + STT)
  • stt show (attach tmux)
  • stt logs [client|daemon|both]
  • stt check (daemon health)
  • stt diag-injector (injection diagnostics)
  • stt help, stt help start, and stt help llm (full helper reference)

Flag parsing, help text, and runtime arguments for stt start are all driven by a single metadata table, start_option_rows, in scripts/stt-helper.sh.

Testing and Validation

Client checks:

cd parakeet-ptt
cargo fmt
cargo clippy --all-targets --all-features -- -D warnings
cargo test

Local hardware-optimized release build (Zen 5 / local machine only):

PARAKEET_PTT_RUSTFLAGS="-C target-cpu=znver5" just build

After that build, stt start will prefer the compatible prebuilt target/release/parakeet-ptt binary instead of recompiling on launch.

If you need the portable host-default build path instead:

just build

Daemon checks:

cd parakeet-stt-daemon
uv run ruff check .
uv run ruff format --check .
ty check --project . --error-on-warning
uv run deptry .
uv run parakeet-stt-daemon --check

Offline benchmark harness (repeatable local regression gate):

cd parakeet-stt-daemon
uv run python check_model.py \
  --bench-offline \
  --device cuda \
  --bench-output bench_audio/latest-benchmark.json \
  --max-avg-wer 0.45 \
  --max-p95-infer-ms 1800 \
  --max-p95-finalize-ms 2200
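
The --max-avg-wer gate is expressed in word error rate (WER). For readers unfamiliar with the metric, here is a sketch of the standard definition: word-level edit distance divided by the number of reference words. The harness's own text normalization (casing, punctuation handling) is not described in this README, so this function is an illustration of the metric, not a reimplementation of check_model.py.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Convention chosen for this sketch: empty reference scores 0.0
        # only when the hypothesis is empty too.
        return 0.0 if not hyp else 1.0

    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, a three-word reference transcribed with one word dropped scores 1/3, which would pass a 0.45 threshold.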

Local benchmark workflow (git-ignored assets):

# Run-only flow on existing unified corpus (local manifest + appended legacy 8 samples):
just eval                  # same as: just eval compare
just eval offline
just eval stream
just eval compare
just eval calibrate-offline
just eval calibrate-stream

# Dataset maintenance (only when intentionally refreshing prompts/audio):
just eval-dataset candidates
# Review parakeet-stt-daemon/bench_audio/personal/candidates.tsv and set include=yes rows.
just eval-dataset materialize
just eval-dataset record

Each eval or calibration run also copies the generated JSON into parakeet-stt-daemon/bench_audio/personal/history/ with a timestamped filename, so the current latest-*.json files stay convenient while older runs remain easy to compare.

The benchmark command prints a per-sample + aggregate summary to stdout and writes JSON with:

  • benchmark, bench_runtime, model, requested_device, effective_device
  • bench_dir, manifest_path|transcripts_path, bench_tier, bench_append_legacy, bench_runs, sample_count
  • aggregate.avg_wer, aggregate.weighted_wer, aggregate.command_exact_match_rate, aggregate.critical_token_recall
  • aggregate.punctuation.{precision,recall,f1,terminal_accuracy}, aggregate.punctuation_f1, aggregate.terminal_punctuation_accuracy
  • aggregate.infer_ms.*, aggregate.finalize_ms.*, aggregate.warm_finalize_ms.*, aggregate.cold_start_ms
  • thresholds.*, regression_gate.pass, regression_gate.failures
  • samples[] entries including sample_id, tier, domain, critical_tokens, reference, hypothesis, wer, infer_ms, finalize_ms
  • runs[] with per-run aggregates and per-sample rows when --bench-runs > 1

When any configured threshold is exceeded, the harness exits non-zero.
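
The JSON report can also be inspected after the fact, for example from a wrapper script. The sketch below reads the regression_gate and aggregate.avg_wer keys listed above. `summarize` and `describe` are hypothetical names, and the assumption that regression_gate.failures is a list of printable entries is mine; the exact shape of each failure entry is not documented here.

```python
import json
from pathlib import Path


def summarize(report: dict) -> tuple[bool, list[str]]:
    """Return (gate passed, failure descriptions) from a benchmark report."""
    gate = report.get("regression_gate", {})
    passed = bool(gate.get("pass", False))
    # ASSUMPTION: failures is a list; entries are stringified for display.
    failures = [str(item) for item in gate.get("failures", [])]
    return passed, failures


def describe(path: str) -> str:
    """Load a report file and render a one-line summary plus any failures."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    passed, failures = summarize(report)
    avg_wer = report.get("aggregate", {}).get("avg_wer")
    lines = [f"avg_wer={avg_wer} gate={'pass' if passed else 'fail'}"]
    lines.extend(f"  - {failure}" for failure in failures)
    return "\n".join(lines)


# Example: print(describe("bench_audio/latest-benchmark.json"))
```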

Commit and push gates (repo root):

prek install -t pre-commit -t pre-push
prek run --all-files
prek run --stage pre-push --all-files

Overlay reliability gates (repo root):

just phase6-contract
just phase6-promotion 3

Hook stages are split for speed:

  • pre-commit: maintenance cadence reminder, ruff format, ruff check, ty check, cargo fmt
  • pre-push: pytest, cargo clippy, cargo test
  • Hooks are language-scoped, so Python checks run only for parakeet-stt-daemon/ changes and Rust checks run only for parakeet-ptt/ changes.
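
The stage split and language scoping can be pictured as a lookup keyed by stage and path prefix. This is a sketch of the idea only: the real behavior is defined by the repository's prek hook configuration, and `HOOKS`, `SCOPES`, and `hooks_for` are hypothetical names. The maintenance cadence reminder is left out because its scope is not stated above.

```python
# Checks per stage and language, as listed in the hook stage summary.
HOOKS = {
    "pre-commit": {
        "python": ["ruff format", "ruff check", "ty check"],
        "rust": ["cargo fmt"],
    },
    "pre-push": {
        "python": ["pytest"],
        "rust": ["cargo clippy", "cargo test"],
    },
}

# Directory prefix that triggers each language's checks.
SCOPES = {
    "python": "parakeet-stt-daemon/",
    "rust": "parakeet-ptt/",
}


def hooks_for(stage: str, changed_paths: list[str]) -> list[str]:
    """Return the checks that would run for a stage given changed file paths."""
    selected: list[str] = []
    for language, prefix in SCOPES.items():
        if any(path.startswith(prefix) for path in changed_paths):
            selected.extend(HOOKS[stage][language])
    return selected
```

Under this model, a push that only touches parakeet-ptt/ would run cargo clippy and cargo test but skip pytest.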

Maintenance audits (warned every 10 commits, non-blocking):

scripts/harness-maintenance.sh check --threshold 10
scripts/harness-maintenance.sh run

The run subcommand executes deptry and cargo +nightly udeps; install cargo-udeps first with cargo install cargo-udeps.

Manual injector validation:

stt diag-injector

Docs Map

  • Harness engineering playbook: docs/engineering/harness-engineering-playbook.md
  • Protocol contract: docs/SPEC.md
  • Troubleshooting (canonical operator source): docs/stt-troubleshooting.md
  • Historical docs archive index (non-canonical): docs/archive/README.md
  • Historical streaming overlay rollout log: docs/archive/STREAMING-OVERLAY-ROLL-OUT-AND-VERIFICATION-PLAN-2026-03-06.md
  • Historical Gemini architectural review: docs/archive/gemini-architectural-review-2026-03-06.md
  • Historical injector handoff archive (non-canonical): docs/archive/HANDOFF-clipboard-injector-2026-02-08.md
  • Historical cross-surface incident handoff archive (non-canonical): docs/archive/HANDOFF-stt-cross-surface-injection-2026-02-19.md
  • Historical injection implementation roadmap (non-canonical): docs/archive/STT-INPUT-INJECTION-ROADMAP-2026-02.md
  • UX roadmap (new): ROADMAP.md

License

MIT (see LICENSE).
