A fully local, open-source voice assistant for OpenClaw AI agents on Apple Silicon. Talk to your AI agent with natural speech and hear spoken responses — the entire audio pipeline runs locally at zero cost.
```
You:     "Hey Jarvis"                      [wake word activates]
         "What's the weather like today?"  [speech transcribed live]
         "Sudo out"                        [query sent to AI]

Jupiter: "It's currently 72 degrees and    [AI responds via speaker]
          sunny in your area..."
```
- Completely free — No API keys for speech. The only cost is your AI model's token usage.
- Fully local — Wake word detection, speech-to-text, and text-to-speech all run on your Mac. Nothing leaves your machine (except the query to your AI agent).
- Apple Silicon optimized — Uses MLX for 10x faster Whisper transcription on M-series chips.
- Modular — Swap wake words, STT models, TTS voices, or AI backends via config.
- Hackable — Clean Python codebase, MIT licensed, easy to extend.
```
Microphone
    │
    ▼
┌────────────────┐     ┌─────────────┐     ┌──────────┐     ┌──────────┐
│   Wake Word    │────▶│  Speech to  │────▶│ OpenClaw │────▶│ Text to  │──▶ Speaker
│   Detection    │     │    Text     │     │    AI    │     │  Speech  │
│ (OpenWakeWord) │     │  (Whisper)  │     │  Agent   │     │ (Kokoro) │
└────────────────┘     └─────────────┘     └──────────┘     └──────────┘
   "hey jarvis"         Records until       Processes        Speaks the
    activates            "sudo out"         your query        response
```
State machine: IDLE → LISTENING → PROCESSING → SPEAKING → IDLE
See docs/architecture.md for a deep dive.
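For orientation, here is a minimal Python sketch of that four-state loop. The names are illustrative only; the actual implementation lives in `jupiter_voice/state_machine.py` and is more involved.

```python
# Illustrative sketch of the IDLE -> LISTENING -> PROCESSING -> SPEAKING cycle.
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # waiting for the wake word
    LISTENING = auto()   # recording and transcribing speech
    PROCESSING = auto()  # query sent to the AI agent
    SPEAKING = auto()    # playing the synthesized response


# Each state has exactly one successor in the voice loop.
TRANSITIONS = {
    State.IDLE: State.LISTENING,        # wake word detected
    State.LISTENING: State.PROCESSING,  # close phrase ("sudo out") heard
    State.PROCESSING: State.SPEAKING,   # AI response received
    State.SPEAKING: State.IDLE,         # playback finished
}


def advance(state: State) -> State:
    """Move to the next state in the cycle."""
    return TRANSITIONS[state]
```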
- macOS on Apple Silicon (M1, M2, M3, M4, or later)
- Homebrew — package manager for macOS
- OpenClaw — the AI agent that answers your queries
- ~1.5 GB of free disk space for AI models
The setup script handles everything else (Python, Rust, espeak-ng, all AI models).
```bash
# 1. Clone the repository
git clone https://github.com/SudarshanSuryaprakash/voice-assistant.git
cd voice-assistant

# 2. Run setup (installs deps, downloads ~1.1GB of AI models)
bash setup.sh

# 3. Activate the environment
source .venv/bin/activate

# 4. Start the voice assistant
jupiter-voice
```

On first run, the setup script will:
- Install Python 3.12, Rust, and espeak-ng via Homebrew (if missing)
- Create a Python virtual environment
- Install all Python dependencies
- Download the Whisper STT model (~750MB), Kokoro TTS model (~310MB), voice data (~27MB), and OpenWakeWord models (~10MB)
- Generate audio cue sounds
- Say "hey jarvis" (the default wake phrase) — you'll hear a chime
- Speak your question or command naturally
- Say "sudo out" to send your query to the AI
- Listen to the spoken response
- Repeat!
Tip: You can train a custom "hey jupiter" (or any other phrase) wake word. See Custom Wake Word below.
All settings live in config.yaml. Here are the key options:
```yaml
# Wake word detection
wake:
  model: "hey_jarvis"          # Built-in model, or path to custom .onnx file
  threshold: 0.5               # Sensitivity: 0.0 (loose) to 1.0 (strict)

# Speech-to-text (Whisper)
stt:
  model: "distil-medium.en"    # Options: tiny.en, small.en, medium, distil-medium.en, large-v3
  chunk_duration: 3.0          # Seconds of audio per transcription chunk

# Text-to-speech (Kokoro)
tts:
  voice: "af_heart"            # See docs/voices.md for all 14 available voices
  speed: 1.0                   # 0.5 = slow, 1.0 = normal, 2.0 = fast
  fallback_to_macos_say: true  # Use macOS `say` if Kokoro fails

# Close phrase (what you say to send the query)
close_phrase:
  primary: "sudo out"

# OpenClaw AI agent
gateway:
  session_id: "agent:main:main"  # Your OpenClaw session ID
  timeout: 120                   # Max wait for AI response (seconds)
```

Every config value can be overridden with an environment variable:
| Variable | Overrides |
|---|---|
| `JUPITER_VOICE_SESSION_ID` | `gateway.session_id` |
| `JUPITER_VOICE_STT_MODEL` | `stt.model` |
| `JUPITER_VOICE_TTS_VOICE` | `tts.voice` |
| `JUPITER_VOICE_WAKE_MODEL` | `wake.model` |
| `JUPITER_VOICE_WAKE_THRESHOLD` | `wake.threshold` |
| `JUPITER_VOICE_LOG_LEVEL` | `logging.level` |
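To make the precedence concrete, here is a hedged sketch of how the env-over-YAML merge can work. It assumes a flat `section.key` layout and PyYAML; the project's actual loader (`jupiter_voice/config.py`) may differ.

```python
# Sketch of env-over-YAML precedence; not the project's actual loader.
import os
import yaml  # PyYAML


# Environment variable -> dotted config key, mirroring the table above.
ENV_OVERRIDES = {
    "JUPITER_VOICE_SESSION_ID": "gateway.session_id",
    "JUPITER_VOICE_STT_MODEL": "stt.model",
    "JUPITER_VOICE_TTS_VOICE": "tts.voice",
    "JUPITER_VOICE_WAKE_MODEL": "wake.model",
    "JUPITER_VOICE_WAKE_THRESHOLD": "wake.threshold",
    "JUPITER_VOICE_LOG_LEVEL": "logging.level",
}


def load_config(path: str = "config.yaml") -> dict:
    """Load config.yaml, then let any JUPITER_VOICE_* variables win."""
    with open(path) as fh:
        config = yaml.safe_load(fh) or {}

    for env_var, dotted_key in ENV_OVERRIDES.items():
        value = os.environ.get(env_var)
        if value is None:
            continue
        section, key = dotted_key.split(".")
        config.setdefault(section, {})[key] = value

    return config
```

Overrides are handy for one-off runs, e.g. `JUPITER_VOICE_STT_MODEL=tiny.en jupiter-voice` to try the smaller Whisper model without editing the file.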
The default wake word is "hey jarvis" (a model bundled with OpenWakeWord). You can train any custom phrase — like "hey jupiter", "ok computer", or anything else.
```bash
python scripts/train_wake_word.py --method colab
```

This prints step-by-step instructions for the OpenWakeWord training notebook. Training takes ~45 minutes and requires no manual voice recordings (it generates synthetic speech automatically).
After training, drop the .onnx file into assets/ and update your config:
```yaml
wake:
  model: "assets/hey_jupiter.onnx"
```
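To sanity-check the trained model outside the assistant, something like the following works with the openwakeword Python API (a rough sketch; constructor arguments can vary between openwakeword versions):

```python
# Score audio frames against a custom wake word model with openwakeword.
import numpy as np
from openwakeword.model import Model

oww = Model(wakeword_models=["assets/hey_jupiter.onnx"])

# openwakeword expects 16 kHz, 16-bit mono audio in small frames (~80 ms = 1280 samples).
frame = np.zeros(1280, dtype=np.int16)  # silence, just for illustration
scores = oww.predict(frame)             # dict of {model_name: score}
print(scores)                           # compare against wake.threshold
```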
```
voice-assistant/
├── jupiter_voice/            # Main Python package
│   ├── cli.py                # Entry point and main orchestrator
│   ├── config.py             # YAML + env config loader
│   ├── state_machine.py      # IDLE/LISTENING/PROCESSING/SPEAKING FSM
│   ├── audio/                # Microphone capture, speaker playback, audio cues
│   ├── stt/                  # Whisper MLX transcription + "sudo out" detection
│   ├── tts/                  # Kokoro TTS synthesis + macOS fallback
│   ├── wake/                 # OpenWakeWord integration
│   ├── gateway/              # OpenClaw CLI client
│   └── utils/                # Health checks
├── tests/                    # Unit tests (pytest)
├── scripts/
│   ├── generate_cues.py      # Generate audio cue WAV files
│   └── train_wake_word.py    # Wake word training helper
├── config.yaml               # Configuration file
├── setup.sh                  # One-command setup script
├── models/                   # Kokoro TTS models (downloaded by setup.sh)
├── mlx_models/               # Whisper STT models (downloaded by setup.sh)
└── assets/                   # Audio cues + custom wake word models
```
- macOS permissions: Go to System Settings > Privacy & Security > Microphone and ensure your terminal app has access.
- No mic hardware: Mac Minis don't have a built-in mic. Connect a USB microphone or headset.
- Wrong device: Set a specific device index in `config.yaml` under `audio.device`. Run `python -c "import sounddevice; print(sounddevice.query_devices())"` to list available devices; a quick recording check is sketched below.
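If you are unsure whether the selected device actually picks up sound, a standalone check with sounddevice (the device index and sample rate below are illustrative, not project defaults):

```python
# Record two seconds from a specific input device and report the peak level.
import numpy as np
import sounddevice as sd

DEVICE_INDEX = 1      # pick an index from sounddevice.query_devices()
SAMPLE_RATE = 16_000  # 16 kHz mono is typical for speech pipelines

recording = sd.rec(int(2 * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="int16", device=DEVICE_INDEX)
sd.wait()  # block until the recording finishes
print("peak amplitude:", int(np.abs(recording).max()))  # ~0 means no signal
```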
- Install OpenClaw from openclaw.dev.
- Make sure `openclaw` is in your PATH: `which openclaw`
- Test connectivity: `openclaw doctor`
- Speak clearly and at a normal volume.
- Lower the threshold in `config.yaml`: `wake.threshold: 0.3`
- Ensure you're using the right phrase for your model ("hey jarvis" for the default).
- Whisper sometimes misrecognizes it. The system fuzzy-matches variants like "pseudo out" and "sue do out" automatically (see the sketch after this list).
- Speak it clearly with a brief pause before.
- You can add more alternatives in `config.yaml` under `close_phrase.alternatives`.
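As a rough illustration of that fuzzy matching (not the project's actual implementation, which lives in `jupiter_voice/stt/`), a difflib-based check might look like this:

```python
# Illustrative close-phrase detector; the real logic and phrase list may differ.
from difflib import SequenceMatcher

CLOSE_PHRASES = ["sudo out", "pseudo out", "sue do out"]


def is_close_phrase(transcript: str, threshold: float = 0.8) -> bool:
    """Return True if the transcript ends with something close to a close phrase."""
    words = transcript.lower().split()
    for phrase in CLOSE_PHRASES:
        n = len(phrase.split())
        tail = " ".join(words[-n:])
        if SequenceMatcher(None, tail, phrase).ratio() >= threshold:
            return True
    return False


print(is_close_phrase("what's the weather like today sudo out"))  # True
print(is_close_phrase("what's the weather like today"))           # False
```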
- This means Kokoro TTS failed and it fell back to macOS `say` (the fallback is sketched after this list).
- Run `bash setup.sh` to ensure all models are downloaded.
- Check that espeak-ng is installed: `brew install espeak-ng`
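Conceptually, the fallback just shells out to the built-in macOS `say` command when Kokoro synthesis fails. A minimal sketch; the Kokoro call below is a placeholder and the real code lives in `jupiter_voice/tts/`:

```python
# Minimal sketch of a TTS fallback chain; synthesize_with_kokoro is a stand-in.
import subprocess


def synthesize_with_kokoro(text: str) -> None:
    """Placeholder for Kokoro ONNX synthesis (hypothetical helper)."""
    raise RuntimeError("Kokoro model unavailable")


def speak(text: str) -> None:
    try:
        synthesize_with_kokoro(text)
    except Exception:
        # macOS ships `say`; it sounds more robotic but always works offline.
        subprocess.run(["say", text], check=True)


speak("Hello from Jupiter")
```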
- Try a smaller model: set `stt.model: "tiny.en"` in `config.yaml` (faster but less accurate).
- The default `distil-medium.en` is a good balance; `large-v3` is the most accurate but slowest.
- Ensure you're on Apple Silicon; Intel Macs won't get MLX acceleration.
- This depends on your AI model. The voice pipeline itself is fast; the bottleneck is the LLM response time.
- Increase `gateway.timeout` in `config.yaml` if responses are being cut off.
```bash
# Install with dev dependencies
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest -v

# Lint
ruff check jupiter_voice/ tests/ scripts/

# Run with debug logging
jupiter-voice -v
```

See docs/architecture.md for a walkthrough of the codebase.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Areas where help is especially appreciated:
- Linux support (PulseAudio/PipeWire audio capture)
- Windows support
- Additional TTS engine backends
- Wake word model sharing (pre-trained .onnx files for common phrases)
- Improved VAD (voice activity detection) for more natural close-phrase detection
| Component | Library | Size | Purpose |
|---|---|---|---|
| Wake Word | OpenWakeWord | ~10MB | Always-on keyword detection |
| Speech-to-Text | Lightning Whisper MLX | ~750MB | Audio transcription (Apple Silicon optimized) |
| Text-to-Speech | Kokoro ONNX | ~340MB | Natural voice synthesis |
| Audio I/O | sounddevice | — | Microphone capture and speaker output |
| AI Backend | OpenClaw | — | AI agent that processes your queries |
| Terminal UI | Rich | — | Startup banner, progress indicators |
MIT — use it however you want.