- 🤖 Auto-loading: Default VRM model and animations on startup
- 🔄 Real-time Animation: Seamless idle ↔ talking state switching
- 💬 Mouth Animation: Volume-based blend shapes with fine-tuned controls
- 🎮 Interactive Controls: Full camera and avatar manipulation
- 📁 Multi-format Support: VRM, FBX, VRMA, GLB, GLTF files
- 🌐 WebSocket Communication: Low-latency animation triggers
- ⚡ Perfect Timing: Server-side synchronization for precise animation
- 🎯 Smart Detection: Deterministic triggers, not audio guessing
- 🔧 Optional Mode: Works standalone without TTS server
- Movement: WASD (avatar), Arrow Keys (camera), Mouse (orbit)
- Manipulation: Ctrl+WASD (rotation), Shift+Drag (positioning)
- UI: H (toggle interface), organized accordion panels
- Animation: Manual play/stop/reset controls
You CANNOT open `room.html` directly in the browser. Use a web server:

1. Install "Live Server" extension
2. Right-click `room.html` → "Open with Live Server"
3. Opens at http://127.0.0.1:5500/room.html

```bash
# Python (built-in)
python -m http.server 8000
# → http://localhost:8000/room.html

# Node.js
npx serve .
# → http://localhost:3000/room.html

# PHP
php -S localhost:8000
# → http://localhost:8000/room.html
```

Why? ES6 modules, CORS restrictions, and the WebSocket context all require the page to be served over HTTP rather than `file://`.
Standalone (no TTS server):
- Start web server
- Open `room.html`
- Auto-loads:
  - Default VRM avatar (`AvatarSample_H.vrm`)
  - Idle animation (`Happy Idle.fbx`)
  - Talking animation (`Talking.fbx`)
- Use manual controls to trigger animations

With TTS server:
- Run TTS server: `run_gpt_sovits.bat`
- Start web server and open `room.html`
- Check "Enable TTS WebSocket Connection"
- Animations trigger automatically with TTS audio
```
VRMViewer/
├── room.html                 # 🎯 Main application
├── api_v3.py                 # 🔌 Modified TTS server
├── run_gpt_sovits.bat        # 🚀 Server launcher
├── assets/
│   ├── models/               # 🤖 VRM avatar files
│   │   ├── AvatarSample_H.vrm
│   │   └── *.vrm
│   └── animations/           # 🎭 Animation files
│       ├── Happy Idle.fbx
│       ├── Talking.fbx
│       └── *.fbx, *.vrma
├── js/                       # 📦 JavaScript modules
│   ├── three-vrm-core.module.js
│   ├── three-vrm-animation.module.js
│   └── loadMixamoAnimation.js
└── css/                      # 🎨 Styling
    └── styles.css
```
Our modified `api_v3.py` extends GPT-SoVITS with WebSocket animation signals:

```python
import json

# Real-time VRM communication: holds the connected viewer socket, if any
vrm_websocket = None

async def notify_vrm(message_type, text=None):
    """Send an animation signal to the VRM viewer when one is connected."""
    if vrm_websocket:
        message = {"type": message_type}
        await vrm_websocket.send(json.dumps(message))

# Timing integration (these calls run inside the async TTS handler)
await notify_vrm("tts_start")  # Animation begins
await notify_vrm("tts_end")    # Return to idle
```

```json
{"type": "tts_start"}  // 🗣️ Start talking animation
{"type": "tts_end"}    // 😴 Return to idle animation
```

- Browser: Chrome/Firefox/Edge with WebGL support
- TTS Server: GPT-SoVITS v2 Pro (optional)
- Dependencies: `websockets`, `asyncio` (for TTS integration)
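
One way to keep the `tts_start` / `tts_end` pair balanced is to wrap synthesis in `try`/`finally`, so the avatar returns to idle even if synthesis raises. The sketch below is illustrative only and is not the code in `api_v3.py`: `RecordingSocket`, `speak_with_signals`, and `fake_synthesize` are hypothetical names, and the socket is passed in explicitly rather than read from a module-level variable.

```python
import asyncio
import json


class RecordingSocket:
    """Hypothetical stand-in for a connected WebSocket; it only records sends."""

    def __init__(self):
        self.sent = []

    async def send(self, payload):
        self.sent.append(payload)


async def notify_vrm(socket, message_type):
    """Send {"type": ...} to the viewer; do nothing when no viewer is connected."""
    if socket is not None:
        await socket.send(json.dumps({"type": message_type}))


async def speak_with_signals(socket, synthesize):
    """Bracket one synthesis call with tts_start and tts_end signals."""
    await notify_vrm(socket, "tts_start")
    try:
        return await synthesize()
    finally:
        # Runs on success and on error, so the avatar is not left talking.
        await notify_vrm(socket, "tts_end")


async def fake_synthesize():
    """Placeholder for the real TTS call (assumption: it is a coroutine)."""
    await asyncio.sleep(0)
    return b"audio-bytes"


demo_socket = RecordingSocket()
demo_audio = asyncio.run(speak_with_signals(demo_socket, fake_synthesize))
```

Passing `None` as the socket exercises the standalone path: synthesis still runs, and no signals are sent.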
```javascript
// Browser console commands:
startIdleAnimation();            // 😴 Start idle
startTalkingAnimationFromTTS();  // 🗣️ Start talking
stopAnimation();                 // ⏹️ Stop current
resetAnimation();                // 🔄 Reset to idle
```

- VRM Models: Drop into `assets/models/` folder
- Animations: Support FBX (Mixamo) and VRMA formats
- Environments: GLB/GLTF room files supported
- Auto-retargeting: Mixamo animations automatically fit VRM skeleton
- Mouth Gain: Adjust lip-sync intensity (0.1 - 2.0)
- Body Threshold: Set talking animation trigger sensitivity
- Blend Shapes: Utilizes VRM visemes (aa, ih, ou, ee, oh)
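
As a rough illustration of how these settings could interact, the sketch below maps a volume reading to a mouth blend-shape weight. It assumes a linear mapping and a volume normalized to the range 0 to 1; the viewer's actual curve is not documented here, and `clamp`, `mouth_weight`, and `should_play_talking` are hypothetical names. It is written in Python for brevity, although the viewer itself runs in the browser.

```python
MIN_GAIN, MAX_GAIN = 0.1, 2.0  # Mouth Gain range from the list above


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def mouth_weight(volume, gain=1.0):
    """Scale a normalized volume by the gain and clamp to a valid weight (0..1)."""
    gain = clamp(gain, MIN_GAIN, MAX_GAIN)
    return clamp(volume * gain, 0.0, 1.0)


def should_play_talking(volume, body_threshold):
    """Assumption: the body animation switches once volume reaches the threshold."""
    return volume >= body_threshold
```

With these assumptions, a volume of 0.4 at gain 2.0 gives a weight of about 0.8, and louder input saturates at 1.0.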
- Port: 8765 (configurable in `api_v3.py`)
- Auto-reconnect: 5-second intervals on connection loss
- Status Indicators: Real-time connection status display
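
The fixed-interval reconnect behaviour can be sketched as a small retry loop. This is an illustration under stated assumptions, not the viewer's code: the host `localhost` is assumed, `connect_with_retry` is a hypothetical helper, and `connect` stands for any coroutine that opens a connection and raises `OSError` on failure. The `sleep` parameter is injectable so the loop can be exercised without waiting.

```python
import asyncio

RECONNECT_INTERVAL_S = 5          # 5-second interval from the list above
WS_URL = "ws://localhost:8765"    # port from the list above; host is an assumption


async def connect_with_retry(connect, sleep=asyncio.sleep, max_attempts=None):
    """Call connect(WS_URL) until it succeeds, pausing a fixed interval between tries.

    max_attempts=None retries forever; otherwise the last error is re-raised
    once that many attempts have failed.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return await connect(WS_URL)
        except OSError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            await sleep(RECONNECT_INTERVAL_S)
```

A fixed interval keeps the behaviour predictable for a local server; exponential backoff would be the usual alternative for a remote one.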
- OBS Compatible: Optimized for broadcast software
- Performance: Hardware acceleration recommended
- Audio Routing: Support for virtual audio cables
| Problem | Solution |
|---|---|
| 🚫 Modules not loading | Use HTTP server, not file:// protocol |
| 🔇 Audio not working | Check browser permissions & device selection |
| 👻 VRM not visible | Verify valid VRM file, check console errors |
| 🎭 Animations not playing | Confirm FBX/VRMA format, check VRM compatibility |
| 🔌 TTS connection failed | Verify api_v3.py server running on port 8765 |
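
For the last row, a quick way to confirm that something is listening on the TTS port is a plain TCP connection attempt. The helper below is a minimal sketch using only the standard library; `is_port_open` is a hypothetical name, and the host `127.0.0.1` is an assumption. A successful connection only shows that the port is open, not that the WebSocket handshake will succeed.

```python
import socket


def is_port_open(host="127.0.0.1", port=8765, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("TTS port reachable:", is_port_open())
```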
- ✅ Enable hardware acceleration in browser settings
- ✅ Use Chrome/Edge for best WebGL performance
- ✅ Disable unused features (spring bones, etc.) if lag occurs
- ✅ Local server recommended over network drives
- ✅ Close unused browser tabs for optimal performance
- Three.js r169: 3D rendering engine
- VRM 3.0: Avatar standard support
- WebSocket: Real-time communication
- ES6 Modules: Clean import system
- Web Audio API: Advanced audio processing
- Deterministic Timing: TTS server knows exact audio timing
- Minimal Latency: No audio-detection delay; animation does not wait on volume analysis
- Universal Compatibility: Works regardless of user audio setup
- Reliable Synchronization: No false positives or missed triggers
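
The deterministic part of this design reduces to a two-state machine driven by the server's messages. The sketch below shows that logic in Python for illustration; the viewer itself runs in JavaScript, and `next_state`, `IDLE`, and `TALKING` are hypothetical names. Unknown or malformed messages leave the state unchanged, which is an assumption about how a tolerant client might behave.

```python
import json

IDLE, TALKING = "idle", "talking"


def next_state(current, raw_message):
    """Return the animation state after handling one raw WebSocket message."""
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError:
        return current  # not JSON: ignore
    if not isinstance(message, dict):
        return current  # JSON, but not an object: ignore
    kind = message.get("type")
    if kind == "tts_start":
        return TALKING
    if kind == "tts_end":
        return IDLE
    return current  # unknown type: ignore


state = IDLE
for raw in ['{"type": "tts_start"}', '{"type": "tts_end"}']:
    state = next_state(state, raw)
```

Because the transitions depend only on the message type, the same input sequence always produces the same animation sequence, regardless of audio levels on the user's machine.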
Built with:
- 🎯 Three.js + VRM Libraries
- 🤖 VRM Consortium sample assets
- 🎭 Mixamo animation library
- 🗣️ GPT-SoVITS TTS framework

