A Linux-based desktop utility for voice-to-text input, specialized for AI prompting workflows.
Author: Lancelot MEI
English | 中文版
Important
Development Status and Platform Restrictions:
- Currently only tested on Ubuntu + Wayland environments.
- This program is a development tool for private/guest use. It has not undergone extensive cross-distribution and cross-protocol (X11) testing.
- The transcription accuracy is currently lower than some commercial offline modes (e.g., iFlytek).
- The punctuation logic is rudimentary and may result in redundant periods.
VoxQuill provides a floating interface to capture voice input and sync the transcribed text into other application windows.
- Global Hotkey Trigger: Summon a floating text edit box across different Linux desktop environments using shortcuts.
- Voice-to-Text with Manual Refinement:
- Auto-Recording: Features built-in Voice Activity Detection (VAD) to start recording immediately upon being summoned. Supports
sensevoice small(multilingual: ZH, EN, JA, KO). - Manual Editing: Transcription results are displayed in the input box for manual refinement before pasting.
- Auto-Recording: Features built-in Voice Activity Detection (VAD) to start recording immediately upon being summoned. Supports
- Automated Text Injection:
- Submit Shortcut (
Ctrl+Enter): Copies the current editor contents to the clipboard, returns focus to the previously active window, and attempts to paste there automatically. - Direct Input Shortcut (
Ctrl+Shift+Enter): Copies the current editor contents to the clipboard, returns focus to the previous window, and tries to type the text as real keystrokes. This path is better suited for terminals and other targets whereCtrl+Vpaste is unreliable. - X11 Environment: Automatic paste falls back to
pynput. Note: X11 support is theoretical and has not been formally tested. - Wayland Environment: Automatic paste tries the XDG Desktop Portal RemoteDesktop path first, then falls back to
wtypeorevdev/uinputonly if portal injection is unavailable or denied. The direct-input path preferswtypefirst and falls back topynputif needed. - Recording Shortcut (
Esc): Toggles recording on and off without submitting text.
- Submit Shortcut (
- AI Prompt Workflow:
- Predefined Templates: Quickly insert preset prompt texts via the UI.
- In-place Expansion: Detects specific command prefixes (e.g.,
//s) and expands them into full prompt templates automatically.
Ensure you have completed the Installation Guide first.
-
Activate Virtual Environment:
source .venv/bin/activate -
Start the Main Process:
python3 main.py
By default on Wayland, VoxQuill now prefers native Wayland Qt mode so focus handoff and portal paste stay in the same windowing model. If you need the older XWayland compatibility mode for troubleshooting, start it with:
VOXQUILL_FORCE_XCB=1 python3 main.py
-
Configure Global Hotkey (Recommended): Map a global shortcut in your Desktop Environment to the following command:
# Use the **absolute path** to your venv python and cli.py /path/to/VoxQuill/.venv/bin/python /path/to/VoxQuill/cli.py --command toggle
The app currently revolves around four main states:
- Hidden: The floating window is not visible, while the tray icon and IPC endpoint stay available.
- VisibleIdle: The floating window is visible and editable, but not recording.
- VisibleRecording: The floating window is visible and actively capturing speech.
- Submitting: A submit action is in progress, including stop-recording cleanup, clipboard sync, hide/focus handoff, paste/type attempt, and text cleanup.
The current shortcut semantics are:
Esc: Only toggles recording on and off.Ctrl+Enter/Ctrl+Return: Submit current text through the clipboard-and-paste path. If recording is active, the app stops recording first and only then continues submission.Ctrl+Shift+Enter/Ctrl+Shift+Return: Submit current text through the direct-typing path, intended for targets where paste is unstable.- Global hotkey bound to
cli.py --command toggle: Toggles recording state rather than window visibility. It appears to "summon and record" because recording start also brings the window to the front. - Close button
×, tray Hide, or IPChide: Hide the floating window without quitting the process.
The minimum success guarantee remains: the submitted text is already in the system clipboard. Paste/type automation is an enhancement layer; if desktop restrictions block it, manual completion is still possible.
Custom behaviors are managed via JSON files in the config/ directory:
config/models.json:- Management of ASR model paths and pipeline parameters.
- History directory configuration (
history_dir). - History toggle (
history_enabled).
config/prompts.json:- Definition of AI prompt templates.
- Command prefix mappings (e.g., mapping
//sto a complex system role).
config/shortcuts.json:- Default and user-editable UI shortcut bindings.
- Shortcut handlers now route through named actions, so future UI-based shortcut editing can reuse the same action registry.
config/ui.json:- Stores appearance preferences for the floating window.
- Current keys:
theme:lightordarkinactive_opacity: opacity used when the window is visible but intentionally not focused
The app now supports two submit modes. Both copy the current text into the system clipboard first:
Ctrl+Enter: submit by pasteCtrl+Shift+Enter: submit by direct typing
Pressing either shortcut triggers the following sequence:
- Stop Recording: If recording is active, the app stops capture first and freezes the final text.
- Clipboard Sync: Copies the current text buffer to the system clipboard.
- Return Focus: The floating editor yields focus back to the previously active application window.
- Local Archiving (History Logging):
- Text is automatically appended to a history file.
- Default directory:
~/Documents/VoxQuill/History(Adjustable inmodels.json). - File format: Monthly Markdown files (e.g.,
2026-03vox.md). - Entry format: ISO timestamps and daily headings to record every entry.
- Submit Execution:
- Paste mode attempts an automatic paste into the target window.
- Direct-input mode attempts keystroke-level text injection into the target window.
- On GNOME/Wayland, paste mode first attempts the XDG Desktop Portal RemoteDesktop path and reuses/restores sessions when possible.
- If automation fails after fallback attempts, the app shows a confirmation dialog and keeps the text in the clipboard for manual completion.
- Clear Input: Clears the floating editor after a non-empty submit.
The Esc key no longer submits text. It now only toggles recording on and off.
Appearance settings are now persisted and applied across launches:
- Open Model Manager (
Ctrl+M). - Choose Light or Dark theme.
- Adjust Inactive opacity to control how visible the floating window remains after submission while it is shown without focus.
- Click Save & Apply to persist the change into
config/ui.jsonand immediately refresh the app stylesheet.
This makes the "return focus but keep the box visible" behavior configurable instead of hard-coded.
- UI Framework: PyQt6
- ASR Engine: Powered by sherpa-onnx (Runs locally/offline)
- Voice Activity Detection: Silero VAD v5 (ONNX Runtime)
- Inter-Process Communication (IPC): JSON-based Unix Domain Sockets
- Audio I/O: PyAudio
- Platform Support: Tested only on Ubuntu + Wayland.
Requires libxcb-cursor0 for correct window positioning and interaction on Wayland.
git clone https://github.com/lancelotmei/VoxQuill.git
cd VoxQuill
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtDownload models via the Model Manager (Ctrl+M) in the UI or run the script:
python3 scripts/download_models.py- Paste Limitation (Wayland): Due to protocol security, auto-paste may behave differently across compositors (GNOME/KDE/Sway). If it fails, use manual paste.
- Window Positioning: Currently unable to accurately track and follow the active cursor position.
If you need a standalone Linux executable, run:
./scripts/build_linux.shGNU GPL v3.0
