Give any AI agent a real-time face and voice.
Open source. Runs on your machine. No GPU required.
NyxClaw runs locally and turns any Claw agent into a talking, listening, lip-syncing avatar on your phone. Audio in → 52 ARKit blendshapes out @ 30 FPS, all on CPU.
- Real-time animation: Wav2Arkit ONNX, 52 ARKit blendshapes @ 30 FPS, CPU-only
- Tool-call fillers: the avatar talks while the AI works, so there's no awkward silence
- Thinking silence: keeps breathing and blinking during processing gaps (up to 5 s)
- Synced transcripts: text lands with the audio via delivery-clock tracking
- Barge-in: interrupt mid-sentence; LLM, TTS, and playback cancel in ~128 ms
- Rich content: `{speech, content}` splits what the avatar says from what the app shows
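Barge-in only works if every stage of a response can be cancelled together. The sketch below illustrates the idea with asyncio, assuming the LLM, TTS, and playback stages each run as a separate task; `ResponseTurn` and the `stage` coroutine are hypothetical names, not NyxClaw's internals.

```python
import asyncio


class ResponseTurn:
    """Hypothetical holder for the tasks that make up one AI response."""

    def __init__(self) -> None:
        self.tasks: list[asyncio.Task] = []

    def start(self, *coros) -> None:
        # Each pipeline stage (LLM, TTS, playback) runs as an independent task.
        self.tasks = [asyncio.create_task(c) for c in coros]

    async def interrupt(self) -> int:
        # Barge-in: cancel every stage still running, then wait for the
        # cancellations to settle so no stale audio is sent afterwards.
        pending = [t for t in self.tasks if not t.done()]
        for t in pending:
            t.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        return len(pending)


async def stage(name: str, log: list[str]) -> None:
    # Stand-in for a long-running stage; records whether it was cancelled.
    try:
        await asyncio.sleep(10)
        log.append(f"{name} finished")
    except asyncio.CancelledError:
        log.append(f"{name} cancelled")
        raise


async def demo() -> list[str]:
    log: list[str] = []
    turn = ResponseTurn()
    turn.start(stage("llm", log), stage("tts", log), stage("playback", log))
    await asyncio.sleep(0.05)  # the user starts talking mid-sentence
    cancelled = await turn.interrupt()
    log.append(f"{cancelled} stages cancelled")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Awaiting the cancelled tasks, rather than just calling `cancel()`, is the design choice that matters here: it guarantees no stage emits another chunk after the interrupt has been handled.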
NyxClaw runs on your machine, right alongside your claw. End-to-end encrypted, cryptographically paired. No cloud. No relay. No telemetry.
| Feature | Details |
|---|---|
| Ed25519 Auth | Device pairing via cryptographic challenge. No passwords on the wire. |
| End-to-end WSS | Auto-provisioned Cloudflare Tunnel. No port forwarding, no certs. |
| QR Pairing | Scan to connect. One device at a time. Treat the code like a password. |
| Self-hosted | Docker or install script. Your machine, your data, your rules. |
Two reference pipelines come out of the box: OpenAI Realtime for cloud-grade voice quality, and Local CPU for total privacy. Swap or extend either one.
| | OpenAI Voice | Local Voice |
|---|---|---|
| STT | OpenAI Realtime API | faster-whisper + Silero VAD |
| TTS | OpenAI TTS API | Piper VITS ONNX |
| Install | `uv sync` | `uv sync --extra local_voice` |
| Footprint | ~1 GB RAM, 1 core | ~2 GB RAM, 2 cores |
| Privacy | OpenAI sees the audio | Nothing leaves your machine |
Both pipelines run Wav2Arkit ONNX on CPU: 52 ARKit blendshapes at 30 FPS. 🤗 Model card
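At 30 FPS each blendshape frame covers 1/30 s (about 33.3 ms) of audio. Assuming the output audio uses the same 24 kHz PCM16 mono format the client protocol documents for input, each frame lines up with 800 samples, or 1,600 bytes. The helper below is a hypothetical sketch of slicing a PCM buffer into per-frame chunks, not NyxClaw's implementation.

```python
SAMPLE_RATE = 24_000  # Hz; assumed output rate (matches the documented input format)
BYTES_PER_SAMPLE = 2  # PCM16 mono
FPS = 30              # blendshape frames per second


def samples_per_frame(sample_rate: int = SAMPLE_RATE, fps: int = FPS) -> int:
    # 24,000 samples/s / 30 frames/s = 800 samples per blendshape frame.
    if sample_rate % fps:
        raise ValueError("sample rate must divide evenly into frames")
    return sample_rate // fps


def split_into_frames(pcm: bytes) -> list[bytes]:
    """Slice a PCM16 buffer into chunks that each line up with one frame.

    A trailing partial chunk is kept so no audio is dropped.
    """
    step = samples_per_frame() * BYTES_PER_SAMPLE  # 1,600 bytes
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # 1 s of silence
    frames = split_into_frames(one_second)
    print(samples_per_frame(), len(frames), len(frames[0]))  # 800 30 1600
```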
```bash
git clone https://github.com/myned-ai/nyxclaw.git
cd nyxclaw
cp .env.example .env
# Edit .env with your backend settings (BASE_URL, AUTH_TOKEN; see the backend configuration below)
docker compose up --build -d
```

On first boot, NyxClaw downloads models, provisions a Cloudflare Tunnel, and starts serving. Check the logs for your secure URL:

```bash
docker compose logs -f nyxclaw
# Tunnel: wss://a3f7b2c1.nyxclaw.ai/ws
```

Your mobile app connects to that `wss://` URL, with no port forwarding or TLS certificates needed.
To enable local voice (Piper TTS + faster-whisper), set `INSTALL_LOCAL_VOICE=true` in `.env` before building.
Alternatively, the install script sets up NyxClaw and Cloudflare Tunnel as system services. It handles uv, cloudflared, model downloads, tunnel provisioning, and service registration (systemd / launchd / Windows service) automatically.
```
# Linux / macOS
./install.sh

# Windows (PowerShell as Administrator)
.\install.ps1
```

NyxClaw supports two Claw backends. Set `AGENT_TYPE` in `.env` to switch.
For OpenClaw, the avatar endpoint requires the nyxclaw avatar patch. See `claw_patches/openclaw/README.md` for full setup (patching, auth, `AGENTS.md` prompt).

```bash
AGENT_TYPE=openclaw
BASE_URL=http://127.0.0.1:18789
AUTH_TOKEN=your-openclaw-gateway-token
USE_AVATAR_ENDPOINT=true
```

For ZeroClaw, the avatar endpoint requires the nyxclaw avatar patch. See `claw_patches/zeroclaw/README.md` for full setup (patching, auth, `AGENTS.md` prompt).

```bash
AGENT_TYPE=zeroclaw
BASE_URL=http://127.0.0.1:42617
AUTH_TOKEN=zc_YOUR_TOKEN_HERE
USE_AVATAR_ENDPOINT=true
```

Both backends also work without the avatar patch: set `USE_AVATAR_ENDPOINT=false` (the default). NyxClaw then uses the standard `/v1/chat/completions` (OpenClaw) or `/ws/chat` (ZeroClaw) endpoints. Rich content (`rich_content` messages) won't be available; all LLM output is treated as speech.
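`AGENT_TYPE` and `USE_AVATAR_ENDPOINT` together decide which endpoint NyxClaw talks to. A minimal sketch of that mapping, using the ports and paths listed above; `resolve_endpoint` and `env_flag` are hypothetical helpers that only mirror the documented behaviour, and the set of accepted "true" strings is an assumption.

```python
from typing import Optional

# Documented endpoint paths: standard endpoint vs. patched avatar endpoint.
ENDPOINTS = {
    "openclaw": {False: "/v1/chat/completions", True: "/v1/chat/completions/avatar"},
    "zeroclaw": {False: "/ws/chat", True: "/ws/avatar"},
}


def env_flag(value: Optional[str], default: bool = False) -> bool:
    # Assumed parsing rule; an unset variable falls back to the default (false).
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def resolve_endpoint(env: dict) -> str:
    """Hypothetical helper: build the backend URL from AGENT_TYPE, BASE_URL, and the patch flag."""
    agent = env["AGENT_TYPE"].strip().lower()
    if agent not in ENDPOINTS:
        raise ValueError(f"unsupported AGENT_TYPE: {agent!r}")
    use_avatar = env_flag(env.get("USE_AVATAR_ENDPOINT"))
    return env["BASE_URL"].rstrip("/") + ENDPOINTS[agent][use_avatar]


if __name__ == "__main__":
    print(resolve_endpoint({
        "AGENT_TYPE": "openclaw",
        "BASE_URL": "http://127.0.0.1:18789",
        "USE_AVATAR_ENDPOINT": "true",
    }))  # http://127.0.0.1:18789/v1/chat/completions/avatar
```

The ZeroClaw paths are WebSocket endpoints, so a real client would connect with a WebSocket scheme; the sketch only shows which path is selected.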
| Claw | Notes |
|---|---|
| OpenClaw | HTTP SSE backend with /v1/chat/completions/avatar |
| ZeroClaw | WebSocket backend with /ws/avatar |
| Your claw next? | Open an issue or email us |
All settings are configured via environment variables or the `.env` file. See `.env.example` for the full template.
One session at a time. NyxClaw serves a single active connection: one avatar, one audio stream. You can pair multiple devices (phone, tablet, desktop) for convenience, but only one connects at a time. Treat the setup code like a password: anyone with it can pair a device and talk to your AI agent.
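A single-session policy can be enforced with a small guard that admits the first connection and refuses others until it closes. This is an illustrative asyncio sketch; `SessionGate` and the device IDs are hypothetical, and whether NyxClaw rejects or replaces a second connection is not specified here.

```python
import asyncio
from typing import Optional


class SessionGate:
    """Hypothetical guard allowing one active session at a time."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.active_device: Optional[str] = None

    async def try_acquire(self, device_id: str) -> bool:
        # Any paired device may try to connect; only one gets in.
        async with self._lock:
            if self.active_device is not None:
                return False
            self.active_device = device_id
            return True

    async def release(self, device_id: str) -> None:
        # Only the device holding the session can free it.
        async with self._lock:
            if self.active_device == device_id:
                self.active_device = None


async def demo() -> list[bool]:
    gate = SessionGate()
    results = [
        await gate.try_acquire("phone"),   # first device connects
        await gate.try_acquire("tablet"),  # refused while phone is active
    ]
    await gate.release("phone")
    results.append(await gate.try_acquire("tablet"))  # now allowed
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [True, False, True]
```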
The claw patches are backend-specific patches that add an avatar endpoint with structured `{speech, content}` output and tool-call events. When the LLM's response includes content better seen than heard (URLs, tables, structured data), the patch splits the response:

- `speech` → the avatar speaks a short phrase ("Here's the Wikipedia page, take a look.")
- `content` → forwarded to the client as a `rich_content` message (markdown)
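On the NyxClaw side, the split described above means routing a structured reply two ways: speech to TTS and the avatar, content to the app as a `rich_content` message. The sketch assumes the patched endpoint yields a JSON object with `speech` and `content` keys and that the outgoing message carries the markdown in a `content` field; both shapes and `route_reply` are hypothetical.

```python
import json
from typing import Any


def route_reply(raw: str) -> tuple[str, list[dict[str, Any]]]:
    """Split a structured reply into text to speak and messages for the app.

    Returns (speech_text, outgoing_messages). Unstructured text is treated
    entirely as speech, matching the unpatched behaviour.
    """
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        return raw, []
    if not isinstance(reply, dict):
        return raw, []

    speech = str(reply.get("speech") or "").strip()
    content = str(reply.get("content") or "").strip()
    messages: list[dict[str, Any]] = []
    if content:
        # Assumed message shape for markdown shown in the chat view.
        messages.append({"type": "rich_content", "content": content})
    return speech, messages


if __name__ == "__main__":
    raw = json.dumps({
        "speech": "Here's the Wikipedia page, take a look.",
        "content": "[Wikipedia](https://en.wikipedia.org/)",
    })
    speech, messages = route_reply(raw)
    print(speech)               # Here's the Wikipedia page, take a look.
    print(messages[0]["type"])  # rich_content
```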
| Patch | Backend | Endpoint | Docs |
|---|---|---|---|
| `claw_patches/openclaw/` | OpenClaw v2026.5.6 | `/v1/chat/completions/avatar` (HTTP SSE) | README |
| `claw_patches/zeroclaw/` | ZeroClaw v0.7.4 | `/ws/avatar` (WebSocket) | README |
Without the patch, all LLM output is treated as speech (no `rich_content` messages).
NyxClaw auto-provisions a free Cloudflare Tunnel on first boot (`wss://<id>.nyxclaw.ai`). This service has limited capacity. You can use any reverse proxy or tunneling solution instead; NyxClaw just needs something that terminates TLS and forwards traffic to `localhost:8080`:

- Cloudflare Tunnel (your own account): run `cloudflared tunnel` with your own token
- Tailscale: encrypted mesh VPN, stable DNS, zero config
- nginx / Caddy: traditional reverse proxy with Let's Encrypt
- ngrok: quick dev tunnels

Set `AUTH_SETUP_CODE_URL=wss://your-domain/ws` in `.env` so the QR code contains your custom URL.
| Component | Memory | Mode |
|---|---|---|
| Python + FastAPI + ONNX Runtime | ~500 MB | Both |
| Wav2Arkit (blendshape inference) | ~200 MB | Both |
| faster-whisper small.en (speech recognition) | ~500 MB | Local only |
| Piper TTS VITS (speech synthesis) | ~100 MB | Local only |
| Silero VAD (voice activity detection) | ~10 MB | Local only |
- OpenAI Voice: 1 GB RAM and 1 core minimum; 1.5 GB and 2 cores recommended.
- Local Voice: 2 GB RAM and 2 cores minimum; 3–4 GB and 4 cores recommended (STT, TTS, and blendshapes run concurrently during barge-in).
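Summing the approximate component figures from the memory table shows how the minimums relate to the baseline: about 700 MB for OpenAI Voice and about 1,310 MB for Local Voice, both under their 1 GB and 2 GB floors. A quick check, with the numbers copied from the table (they are estimates, not measurements).

```python
# Approximate memory per component in MB, and which mode loads it (from the table).
COMPONENTS = {
    "Python + FastAPI + ONNX Runtime": (500, "both"),
    "Wav2Arkit": (200, "both"),
    "faster-whisper small.en": (500, "local"),
    "Piper TTS VITS": (100, "local"),
    "Silero VAD": (10, "local"),
}


def estimate_mb(mode: str) -> int:
    # OpenAI Voice loads only the shared components; Local Voice loads everything.
    return sum(
        mb
        for mb, used_in in COMPONENTS.values()
        if used_in == "both" or (mode == "local" and used_in == "local")
    )


if __name__ == "__main__":
    print(estimate_mb("openai"), estimate_mb("local"))  # 700 1310
```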
Client → Server:

| Type | Description |
|---|---|
| `audio_stream_start` | Start audio session |
| `audio` | Audio chunk (base64 PCM16, 24 kHz mono) |
| `text` | Text message to AI |
| `interrupt` | Stop AI response |
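Audio goes up as base64-encoded PCM16 at 24 kHz mono. The sketch below encodes one chunk and wraps it in a message. Only the `type` values come from the table above; the `data` field name, the JSON layout, and little-endian byte order are assumptions to check against the actual protocol.

```python
import base64
import json
import math
import struct

SAMPLE_RATE = 24_000  # Hz, mono, as documented


def pcm16_from_floats(samples: list[float]) -> bytes:
    # Clamp to [-1, 1] and pack as little-endian signed 16-bit (assumed byte order).
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def audio_message(pcm: bytes) -> str:
    # "type" matches the documented message; "data" is an assumed field name.
    return json.dumps({"type": "audio", "data": base64.b64encode(pcm).decode("ascii")})


def control_message(kind: str) -> str:
    # For payload-free messages such as "audio_stream_start" or "interrupt".
    return json.dumps({"type": kind})


if __name__ == "__main__":
    # 20 ms of a 440 Hz tone = 480 samples = 960 bytes of PCM16.
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(480)]
    print(control_message("audio_stream_start"))
    print(len(pcm16_from_floats(tone)))  # 960
    print(audio_message(pcm16_from_floats(tone))[:40])
```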
Server → Client:

| Type | Description |
|---|---|
| `config` | Audio settings (sent on connect) |
| `audio_start` | AI response started |
| `sync_frame` | Audio + 52 ARKit blendshapes (30 FPS) |
| `audio_end` | AI response finished |
| `transcript_delta` | Streaming text fragment |
| `transcript_done` | Complete turn transcript |
| `rich_content` | Markdown content for the chat view |
| `avatar_state` | `"Listening"` or `"Responding"` |
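A client mostly needs to switch on `type`. The sketch below is a hypothetical client-side reducer over already-decoded server messages: it accumulates `transcript_delta` fragments, tracks `avatar_state`, and counts `sync_frame`s. Field names other than `type` (`text`, `state`, `content`) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ClientState:
    """Hypothetical view state driven by server messages."""

    avatar_state: str = "Listening"
    partial_transcript: str = ""
    transcripts: list[str] = field(default_factory=list)
    rich_content: list[str] = field(default_factory=list)
    frames: int = 0
    responding: bool = False


def handle(state: ClientState, msg: dict[str, Any]) -> None:
    kind = msg.get("type")
    if kind == "audio_start":
        state.responding = True
    elif kind == "sync_frame":
        # A real client would queue the audio and apply the 52 blendshape weights.
        state.frames += 1
    elif kind == "audio_end":
        state.responding = False
    elif kind == "transcript_delta":
        state.partial_transcript += msg.get("text", "")
    elif kind == "transcript_done":
        # Prefer the server's complete text; fall back to the accumulated deltas.
        state.transcripts.append(msg.get("text", state.partial_transcript))
        state.partial_transcript = ""
    elif kind == "rich_content":
        state.rich_content.append(msg.get("content", ""))
    elif kind == "avatar_state":
        state.avatar_state = msg.get("state", state.avatar_state)
    # "config" and unknown types are ignored in this sketch.


if __name__ == "__main__":
    state = ClientState()
    for m in [
        {"type": "avatar_state", "state": "Responding"},
        {"type": "audio_start"},
        {"type": "transcript_delta", "text": "Hello, "},
        {"type": "sync_frame"},
        {"type": "transcript_delta", "text": "world."},
        {"type": "transcript_done"},
        {"type": "audio_end"},
    ]:
        handle(state, m)
    print(state.transcripts, state.frames, state.avatar_state)  # ['Hello, world.'] 1 Responding
```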
This project is licensed under the MIT License β see LICENSE for details.
Made with ♥ by Myned AI · nyxclaw.ai · Buy me a coffee
