
myned-ai/nyxclaw


NyxClaw

Every claw deserves a face. 🦞

Give any AI agent a real-time face and voice.
Open source. Runs on your machine. No GPU required.


NyxClaw avatar demo


What It Does

NyxClaw runs locally and turns any Claw agent into a talking, listening, lip-syncing avatar on your phone. Audio in → 52 ARKit blendshapes out @ 30 FPS, all on CPU.

  • Real-time animation: Wav2Arkit ONNX, 52 ARKit blendshapes @ 30 FPS, CPU-only
  • Tool-call fillers: the avatar keeps talking while the AI works, so there is no awkward silence
  • Thinking silence: the avatar keeps breathing and blinking during processing gaps (up to 5 s)
  • Synced transcripts: text lands together with the audio via delivery-clock tracking
  • Barge-in: interrupt mid-sentence; LLM, TTS, and playback cancel in ~128 ms
  • Rich content: {speech, content} splits what the avatar says from what the app shows
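The 30 FPS figure fixes how much audio backs each animation frame. As a rough sketch, assuming the 24 kHz mono PCM16 format listed for client audio in the WebSocket protocol table also applies to the audio carried alongside each frame (an assumption; the server may chunk audio differently), each frame covers 800 samples, or 1,600 bytes:

```python
# Sketch: how much PCM16 audio backs one 30 FPS animation frame.
# Assumes 24 kHz mono, 16-bit samples (the format listed for client audio);
# the server's actual chunking may differ.
SAMPLE_RATE_HZ = 24_000
FPS = 30
BYTES_PER_SAMPLE = 2   # PCM16
NUM_BLENDSHAPES = 52   # ARKit blendshape coefficients per frame

samples_per_frame = SAMPLE_RATE_HZ // FPS                # 800 samples
bytes_per_frame = samples_per_frame * BYTES_PER_SAMPLE   # 1600 bytes
frame_duration_ms = 1000 / FPS                           # ~33.3 ms per frame


def frames_for_audio(num_bytes: int) -> int:
    """Number of whole animation frames a PCM16 buffer of num_bytes covers."""
    return num_bytes // bytes_per_frame


if __name__ == "__main__":
    print(samples_per_frame, bytes_per_frame, round(frame_duration_ms, 1))  # 800 1600 33.3
    # One second of audio -> 30 frames, each carrying 52 coefficients.
    print(frames_for_audio(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE), NUM_BLENDSHAPES)  # 30 52
```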

Your server. Your data.

NyxClaw runs on your machine, right alongside your claw. End-to-end encrypted, cryptographically paired. No cloud. No relay. No telemetry.

  • 🔐 Ed25519 Auth: device pairing via cryptographic challenge. No passwords on the wire.
  • 🔒 End-to-end WSS: auto-provisioned Cloudflare Tunnel. No port forwarding, no certs.
  • 📱 QR Pairing: scan to connect. One device at a time. Treat the code like a password.
  • 🏠 Self-hosted: Docker or install script. Your machine, your data, your rules.
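The README does not spell out the pairing handshake, so the following is only a generic sketch of Ed25519 challenge-response under assumed steps: the server issues a random nonce, the paired device signs it, and the server verifies the signature with the public key it saved at pairing time. The function names are hypothetical, and the sketch relies on the third-party `cryptography` package.

```python
# Sketch of an Ed25519 challenge-response check (hypothetical flow and names).
# Requires the third-party `cryptography` package.
import os

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def make_challenge() -> bytes:
    """Server side: a fresh 32-byte random nonce per connection attempt."""
    return os.urandom(32)


def sign_challenge(device_key: Ed25519PrivateKey, challenge: bytes) -> bytes:
    """Device side: sign the nonce; the private key never leaves the device."""
    return device_key.sign(challenge)


def verify_response(paired_key: Ed25519PublicKey, challenge: bytes, signature: bytes) -> bool:
    """Server side: accept only if the signature matches the paired public key."""
    try:
        paired_key.verify(signature, challenge)
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    device_key = Ed25519PrivateKey.generate()
    paired_pub = device_key.public_key()  # stored by the server at pairing time

    nonce = make_challenge()
    sig = sign_challenge(device_key, nonce)
    print(verify_response(paired_pub, nonce, sig))             # True
    print(verify_response(paired_pub, make_challenge(), sig))  # False: replayed signature
```

Because every connection gets a new nonce, a captured signature is useless for a later attempt, which is what lets a scheme like this keep passwords off the wire.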

Two voice pipelines

Two reference pipelines ship out of the box: OpenAI Realtime for cloud-grade voice quality, and Local CPU for total privacy. Swap or extend either one.

|  | 🌐 OpenAI Voice | 🖥️ Local Voice |
|---|---|---|
| STT | OpenAI Realtime API | faster-whisper + Silero VAD |
| TTS | OpenAI TTS API | Piper VITS ONNX |
| Install | uv sync | uv sync --extra local_voice |
| Footprint | ~1 GB RAM, 1 core | ~2 GB RAM, 2 cores |
| Privacy | OpenAI sees the audio | Nothing leaves your machine |

Both pipelines run Wav2Arkit ONNX on CPU, producing 52 ARKit blendshapes at 30 FPS. 🤗 Model card

Quick Start

Docker (recommended)

git clone https://github.com/myned-ai/nyxclaw.git
cd nyxclaw
cp .env.example .env
# Edit .env with your backend settings (BASE_URL, AUTH_TOKEN; see Backend Setup below)

docker compose up --build -d

On first boot, NyxClaw downloads models, provisions a Cloudflare Tunnel, and starts serving. Check the logs for your secure URL:

docker compose logs -f nyxclaw
# Tunnel: wss://a3f7b2c1.nyxclaw.ai/ws

Your mobile app connects to that wss:// URL; no port forwarding or TLS certificates needed.
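If you would rather script this step than read the logs by eye, a small sketch like the following can pick the URL out of log text. It assumes the `Tunnel: wss://<id>.nyxclaw.ai/ws` line format from the example log output stays stable; the function name is hypothetical.

```python
# Sketch: extract the tunnel URL from NyxClaw log output.
# Assumes the "Tunnel: wss://<host>/ws" line format shown in the example logs.
import re
from typing import Iterable, Optional

TUNNEL_RE = re.compile(r"Tunnel:\s*(wss://[A-Za-z0-9.-]+/ws)\b")


def find_tunnel_url(lines: Iterable[str]) -> Optional[str]:
    """Return the last tunnel URL found in the log lines, or None."""
    url = None
    for line in lines:
        match = TUNNEL_RE.search(line)
        if match:
            url = match.group(1)  # keep scanning: a restart may log a new URL
    return url


if __name__ == "__main__":
    sample = [
        "(other log output)",
        "Tunnel: wss://a3f7b2c1.nyxclaw.ai/ws",
    ]
    print(find_tunnel_url(sample))  # wss://a3f7b2c1.nyxclaw.ai/ws
```

To feed it real output, one option is `subprocess.run(["docker", "compose", "logs", "nyxclaw"], capture_output=True, text=True).stdout.splitlines()`.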

To enable local voice (Piper TTS + faster-whisper), set INSTALL_LOCAL_VOICE=true in .env before building.

Install script (Linux / macOS / Windows)

Installs NyxClaw + Cloudflare Tunnel as system services. Handles uv, cloudflared, model downloads, tunnel provisioning, and service registration (systemd / launchd / Windows service) automatically.

# Linux / macOS
./install.sh

# Windows (PowerShell as Administrator)
.\install.ps1

Backend Setup

NyxClaw supports two Claw backends. Set AGENT_TYPE in .env to switch.

OpenClaw

Requires the nyxclaw avatar patch applied to OpenClaw. See claw_patches/openclaw/README.md for full setup (patching, auth, AGENTS.md prompt).

AGENT_TYPE=openclaw
BASE_URL=http://127.0.0.1:18789
AUTH_TOKEN=your-openclaw-gateway-token
USE_AVATAR_ENDPOINT=true

ZeroClaw

Requires the nyxclaw avatar patch applied to ZeroClaw. See claw_patches/zeroclaw/README.md for full setup (patching, auth, AGENTS.md prompt).

AGENT_TYPE=zeroclaw
BASE_URL=http://127.0.0.1:42617
AUTH_TOKEN=zc_YOUR_TOKEN_HERE
USE_AVATAR_ENDPOINT=true

Unpatched backends

Both backends work without the avatar patch: set USE_AVATAR_ENDPOINT=false (the default). NyxClaw then uses the standard /v1/chat/completions (OpenClaw) or /ws/chat (ZeroClaw) endpoint. Rich content (rich_content messages) won't be available; all LLM output is treated as speech.
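The combination of AGENT_TYPE, BASE_URL, and USE_AVATAR_ENDPOINT determines which endpoint is used. A minimal sketch of that mapping, using the endpoint paths listed in this README: the helper name is hypothetical, and rewriting ZeroClaw's http:// base to ws:// is an assumption based on its endpoints being WebSockets, not confirmed NyxClaw behavior.

```python
# Sketch: resolve the backend endpoint from .env-style settings.
# Paths and the USE_AVATAR_ENDPOINT default come from this README; the helper
# itself and the http->ws scheme swap for ZeroClaw are assumptions.
import os
from typing import Mapping

ENDPOINTS = {
    # AGENT_TYPE: (standard endpoint, avatar-patch endpoint)
    "openclaw": ("/v1/chat/completions", "/v1/chat/completions/avatar"),
    "zeroclaw": ("/ws/chat", "/ws/avatar"),
}
TRUTHY = {"1", "true", "yes", "on"}


def backend_url(env: Mapping[str, str] = os.environ) -> str:
    agent_type = env.get("AGENT_TYPE", "").strip().lower()
    if agent_type not in ENDPOINTS:
        raise ValueError(f"AGENT_TYPE must be one of {sorted(ENDPOINTS)}, got {agent_type!r}")
    base = env["BASE_URL"].rstrip("/")
    # Defaults to false: only a patched claw serves the avatar endpoint.
    use_avatar = env.get("USE_AVATAR_ENDPOINT", "false").strip().lower() in TRUTHY
    standard, avatar = ENDPOINTS[agent_type]
    if agent_type == "zeroclaw":
        # ZeroClaw's endpoints are WebSockets, so switch http(s) to ws(s).
        if base.startswith("https://"):
            base = "wss://" + base[len("https://"):]
        elif base.startswith("http://"):
            base = "ws://" + base[len("http://"):]
    return base + (avatar if use_avatar else standard)


if __name__ == "__main__":
    print(backend_url({"AGENT_TYPE": "openclaw", "BASE_URL": "http://127.0.0.1:18789",
                       "USE_AVATAR_ENDPOINT": "true"}))
    # http://127.0.0.1:18789/v1/chat/completions/avatar
```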

Supported claws

| Claw | Notes |
|---|---|
| OpenClaw | HTTP SSE backend with /v1/chat/completions/avatar |
| ZeroClaw | WebSocket backend with /ws/avatar |
| Your claw next? | Open an issue or email us |

Configuration

All settings are configured via environment variables or .env file. See .env.example for the full template.

One session at a time. NyxClaw serves a single active connection: one avatar, one audio stream. You can pair multiple devices (phone, tablet, desktop) for convenience, but only one can connect at a time. Treat the setup code like a password; anyone with it can pair a device and talk to your AI agent.
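One way to enforce a single active session is a small gate that the connection handler consults before accepting a paired device. The sketch below is hypothetical and is not NyxClaw's code; in particular, the real server might replace an old session when a new device connects rather than refusing the newcomer, as this sketch does.

```python
# Sketch: a single-session gate that admits one connection at a time.
# Hypothetical helper, not NyxClaw's code; the real server might instead
# replace the old session when a new paired device connects.
from typing import Optional


class SingleSessionGate:
    """Track the one active session; refuse others until it is released."""

    def __init__(self) -> None:
        self._active: Optional[str] = None

    def try_acquire(self, device_id: str) -> bool:
        # Within one asyncio event loop there is no await between the check
        # and the assignment, so two handlers cannot both pass this test.
        if self._active is not None:
            return False
        self._active = device_id
        return True

    def release(self, device_id: str) -> None:
        # Only the device holding the session can release it.
        if self._active == device_id:
            self._active = None

    @property
    def active(self) -> Optional[str]:
        return self._active


if __name__ == "__main__":
    gate = SingleSessionGate()
    print(gate.try_acquire("phone"))   # True
    print(gate.try_acquire("tablet"))  # False: the phone is still connected
    gate.release("phone")
    print(gate.try_acquire("tablet"))  # True
```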

Claw patches

These backend-specific patches add the avatar endpoint, with structured {speech, content} output and tool-call events. When the LLM's response includes content better seen than heard (URLs, tables, structured data), the patch splits the response:

  • speech → the avatar speaks a short phrase ("Here's the Wikipedia page, take a look.")
  • content → forwarded to the client as a rich_content message (markdown)

| Patch | Backend | Endpoint | Docs |
|---|---|---|---|
| claw_patches/openclaw/ | OpenClaw v2026.5.6 | /v1/chat/completions/avatar (HTTP SSE) | README |
| claw_patches/zeroclaw/ | ZeroClaw v0.7.4 | /ws/avatar (WebSocket) | README |

Without the patch, all LLM output is treated as speech β€” no rich_content messages.
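To make the split concrete, here is a sketch of routing one structured reply. It assumes the patched endpoint yields a JSON object with a "speech" key and an optional "content" key, and treats anything else as plain speech, matching the unpatched behavior described above. The outgoing "rich_content" type comes from the protocol table; the "markdown" field name and the helper are hypothetical. The real patches stream their output (SSE or WebSocket), which this sketch ignores.

```python
# Sketch: routing a structured {speech, content} reply.
# Assumes a JSON object with "speech" and optional "content"; the
# "markdown" field name and this helper are hypothetical.
import json
from typing import Any, Dict, List, Tuple


def route_reply(raw: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Split a reply into (text to speak, messages to send to the app)."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        # Unpatched backend (plain text): everything is speech, no rich content.
        return raw, []
    if not isinstance(reply, dict):
        return raw, []
    speech = str(reply.get("speech", ""))
    content = reply.get("content")
    messages = [{"type": "rich_content", "markdown": content}] if content else []
    return speech, messages


if __name__ == "__main__":
    raw = json.dumps({
        "speech": "Here's the Wikipedia page, take a look.",
        "content": "[Wikipedia](https://en.wikipedia.org)",
    })
    speech, messages = route_reply(raw)
    print(speech)    # Here's the Wikipedia page, take a look.
    print(messages)  # [{'type': 'rich_content', 'markdown': '[Wikipedia](https://en.wikipedia.org)'}]
```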

Bring your own tunnel

NyxClaw auto-provisions a free Cloudflare Tunnel on first boot (wss://<id>.nyxclaw.ai). This service has limited capacity. You can use any reverse proxy or tunneling solution instead; NyxClaw just needs something that terminates TLS and forwards traffic to localhost:8080:

  • Cloudflare Tunnel (your own account): run cloudflared tunnel with your own token
  • Tailscale: encrypted mesh VPN, stable DNS, zero config
  • nginx / Caddy: traditional reverse proxy with Let's Encrypt
  • ngrok: quick dev tunnels

Set AUTH_SETUP_CODE_URL=wss://your-domain/ws in .env so the QR code contains your custom URL.
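Because the QR code carries whatever URL you set here, a typo only surfaces when the app fails to connect. A hypothetical pre-flight check using only the standard library might look like this; it assumes your proxy exposes NyxClaw's /ws path unchanged, so adjust the path check if you mount it elsewhere.

```python
# Sketch: sanity-check a custom AUTH_SETUP_CODE_URL before it ends up in the QR code.
# Hypothetical helper; assumes your proxy exposes NyxClaw's /ws path unchanged.
from typing import List
from urllib.parse import urlparse


def check_setup_url(url: str) -> List[str]:
    """Return a list of problems; an empty list means the URL looks usable."""
    problems: List[str] = []
    parsed = urlparse(url)
    if parsed.scheme != "wss":
        problems.append("scheme should be wss:// so the app connects over TLS")
    if not parsed.hostname:
        problems.append("missing host name")
    if parsed.path.rstrip("/") != "/ws":
        problems.append("path should be /ws")
    return problems


if __name__ == "__main__":
    print(check_setup_url("wss://avatar.example.com/ws"))  # []
    print(check_setup_url("https://avatar.example.com"))
```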

Resource requirements

| Component | Memory | Mode |
|---|---|---|
| Python + FastAPI + ONNX Runtime | ~500 MB | Both |
| Wav2Arkit (blendshape inference) | ~200 MB | Both |
| faster-whisper small.en (speech recognition) | ~500 MB | Local only |
| Piper TTS VITS (speech synthesis) | ~100 MB | Local only |
| Silero VAD (voice activity detection) | ~10 MB | Local only |

OpenAI Voice: 1 GB RAM and 1 core minimum; 1.5 GB and 2 cores recommended.
Local Voice: 2 GB RAM and 2 cores minimum; 3–4 GB and 4 cores recommended (STT, TTS, and blendshape inference run concurrently during barge-in).
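Summing the resource table shows where those minimums come from: roughly 700 MB of baseline components for OpenAI Voice and about 1.3 GB for Local Voice. The remaining gap to the stated minimums (~0.3 GB and ~0.7 GB) is presumably headroom for audio buffers and concurrent inference. A quick check using the table's approximate figures:

```python
# Sketch: baseline memory per voice mode, summed from the resource table.
# Figures are the README's approximate per-component numbers, in MB.
COMPONENTS_MB = {
    "Python + FastAPI + ONNX Runtime": (500, "both"),
    "Wav2Arkit (blendshape inference)": (200, "both"),
    "faster-whisper small.en (speech recognition)": (500, "local"),
    "Piper TTS VITS (speech synthesis)": (100, "local"),
    "Silero VAD (voice activity detection)": (10, "local"),
}


def baseline_mb(mode: str) -> int:
    """Sum components shared by both modes plus those specific to `mode`."""
    return sum(mb for mb, where in COMPONENTS_MB.values() if where in ("both", mode))


if __name__ == "__main__":
    print(baseline_mb("openai"))  # 700
    print(baseline_mb("local"))   # 1310
```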

WebSocket protocol (/ws)

Client → Server:

| Type | Description |
|---|---|
| audio_stream_start | Start audio session |
| audio | Audio chunk (base64 PCM16 24 kHz mono) |
| text | Text message to AI |
| interrupt | Stop AI response |
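A client has to turn microphone samples into the base64 PCM16 payload the audio message expects. Below is a sketch of building these messages. Only the "type" values and the audio format come from the table; the "data" and "text" field names and the little-endian byte order are assumptions, so check the server's message schema before relying on them.

```python
# Sketch: building client -> server messages for /ws.
# "type" values and the audio format come from the table; the "data" and
# "text" field names and little-endian byte order are assumptions.
import base64
import json
import math
import struct
from typing import List

SAMPLE_RATE_HZ = 24_000


def pcm16_bytes(samples: List[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def audio_message(samples: List[float]) -> str:
    chunk = base64.b64encode(pcm16_bytes(samples)).decode("ascii")
    return json.dumps({"type": "audio", "data": chunk})


def text_message(text: str) -> str:
    return json.dumps({"type": "text", "text": text})


def control_message(kind: str) -> str:
    # "audio_stream_start" before streaming audio, "interrupt" to barge in.
    if kind not in {"audio_stream_start", "interrupt"}:
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})


if __name__ == "__main__":
    # 20 ms of a 440 Hz tone = 480 samples at 24 kHz.
    tone = [0.2 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ) for n in range(480)]
    print(control_message("audio_stream_start"))  # {"type": "audio_stream_start"}
    print(len(audio_message(tone)))
```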

Server → Client:

| Type | Description |
|---|---|
| config | Audio settings (sent on connect) |
| audio_start | AI response started |
| sync_frame | Audio + 52 ARKit blendshapes (30 FPS) |
| audio_end | AI response finished |
| transcript_delta | Streaming text fragment |
| transcript_done | Complete turn transcript |
| rich_content | Markdown content for the chat view |
| avatar_state | "Listening" or "Responding" |
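On the receiving side, a client typically dispatches on "type" and keeps a little state per turn. This sketch takes the message types from the table but uses hypothetical field names ("text", "markdown", "state"); it also ignores unknown types so that a newer server does not break an older client.

```python
# Sketch: consuming server -> client messages from /ws.
# The "type" values come from the table; the "text", "markdown", and
# "state" field names are hypothetical placeholders.
import json
from typing import List


class AvatarClient:
    """Minimal state a client might keep while reading /ws messages."""

    def __init__(self) -> None:
        self.frames_played = 0
        self.partial_transcript = ""
        self.turns: List[str] = []
        self.rich_content: List[str] = []
        self.avatar_state = "Listening"

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "sync_frame":
            # Audio plus 52 blendshape weights: a real client would queue the
            # audio and apply the weights to the face rig at 30 FPS.
            self.frames_played += 1
        elif kind == "transcript_delta":
            self.partial_transcript += msg.get("text", "")
        elif kind == "transcript_done":
            # Prefer the server's complete transcript; fall back to the deltas.
            self.turns.append(msg.get("text", self.partial_transcript))
            self.partial_transcript = ""
        elif kind == "rich_content":
            self.rich_content.append(msg.get("markdown", ""))
        elif kind == "avatar_state":
            self.avatar_state = msg.get("state", self.avatar_state)
        elif kind in {"config", "audio_start", "audio_end"}:
            pass  # set up the audio player / mark turn boundaries
        # Unknown types fall through and are ignored.


if __name__ == "__main__":
    client = AvatarClient()
    for m in ({"type": "audio_start"}, {"type": "sync_frame"},
              {"type": "transcript_delta", "text": "Hi there"},
              {"type": "transcript_done"}, {"type": "audio_end"}):
        client.handle(json.dumps(m))
    print(client.frames_played, client.turns)  # 1 ['Hi there']
```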

License

This project is licensed under the MIT License; see LICENSE for details.


Made with ♥ by Myned AI · nyxclaw.ai · Buy me a coffee

About

Real-time voice-to-avatar server with local STT/TTS pipeline for any Claw!
