# Face Captions

Real-time speech captions that follow your face. A Python app using your webcam, face tracking, and streaming speech-to-text, with a Minecraft-style look and optional emotion-based styling.
- Features
- Quick Start
- Setup
- Run
- Keyboard Shortcuts
- OBS / Streaming
- Configuration
- Project Structure
- Troubleshooting
## Features

| Feature | Description |
|---|---|
| Face-following | Captions stay above your head and move with you. |
| Real-time STT | Streaming captions (Deepgram, Vosk, or faster-whisper) or fast batch (Google). |
| Minecraft style | Chat-style font and box; place `Minecraft.ttf` in the project or `fonts/` folder. |
| Emotion colors | With MediaPipe Face Mesh, caption tint reflects expression (happy, sad, etc.). |
| Face tracking | MediaPipe Face Landmarker when available; otherwise OpenCV Haar cascade. |
| OBS-ready | `--obs-mode` for green-screen overlay; use Window Capture + Chroma Key in OBS. |
## Quick Start

```sh
pip install -r requirements.txt
python face_captions.py
```

`Q` = Quit. `+` / `-` = Caption size. See Keyboard Shortcuts for more.

For streaming captions, run once: `python download_vosk_model.py` (or set `DEEPGRAM_API_KEY` in `.env` for cloud STT). For face mesh + emotion, run once: `python download_face_landmarker_model.py`.
## Setup

- Python 3.8+
- Microphone access for speech-to-text

```sh
pip install -r requirements.txt
```

On Windows, if PyAudio fails:

```sh
pip install pipwin
pipwin install pyaudio
```

### Speech-to-text backends

| Option | Command / Config | Notes |
|---|---|---|
| Deepgram (cloud) | Add `DEEPGRAM_API_KEY` to `.env` | Best quality, low latency; needs API key. |
| Vosk (offline) | `python download_vosk_model.py` | No API key; use `--large` for better accuracy. |
| faster-whisper (offline) | `pip install faster-whisper` | Good accuracy, no extra download script. |
| Google (online) | (default if none of the above) | Batch mode; no setup. |
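The priority order in the table can be sketched as a small selection helper. This is an illustrative sketch only; the actual selection logic lives in `realtime_stt.py` and may differ:

```python
def pick_stt_backend(api_key=None, has_vosk=False, has_faster_whisper=False):
    """Return the STT backend name, mirroring the table's priority order.

    Hypothetical helper -- the real check in realtime_stt.py may differ.
    """
    if api_key:
        return "deepgram"        # cloud streaming, lowest latency
    if has_vosk:
        return "vosk"            # offline streaming, needs a model download
    if has_faster_whisper:
        return "faster-whisper"  # offline, no extra download script
    return "google"              # batch fallback, no setup

print(pick_stt_backend())                # nothing configured -> google
print(pick_stt_backend(api_key="dg_x"))  # key present -> deepgram
```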
### Face mesh + emotion (optional)

```sh
python download_face_landmarker_model.py
```

Puts the model in `models/`. Without it, the app falls back to OpenCV's face detector and a neutral emotion.

### Font

Place `Minecraft.ttf` in the project folder or in `fonts/`. Otherwise a system font is used.

### Camera

Set `CAMERA_INDEX` in `face_captions.py` (e.g. `1`). On Windows, try different indices (`0`, `1`, `2`) until the right camera opens. There is no camera-listing script, but `list_mics.py` shows the pattern if you want to write one.
## Run

```sh
python face_captions.py
```

OBS mode (green screen for Window Capture + Chroma Key):

```sh
python face_captions.py --obs-mode
python face_captions.py --obs-mode --window-size 800x600
python face_captions.py --obs-mode --chroma-color blue
```

The OBS window position and caption offsets are saved to `obs_window.json` on quit and restored on the next run.

Performance overlay (FPS, face, STT on frame): pass `--show-perf` or press `D` for debug.

WebSocket control API (e.g. Stream Deck): `pip install websockets`, then `python face_captions.py --enable-api` (default port 8765). Send JSON such as `{"action": "toggle_speech_bubble"}` or `{"action": "set_caption_offset", "x": 0, "y": -10}`.

See OBS / Streaming and OBS_SETUP.md for full instructions.
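A minimal control-client sketch for the WebSocket API, assuming the `websockets` package and the default port (the action names come from the examples above):

```python
import asyncio
import json


def build_command(action, **params):
    """Serialize a control action into the JSON shape the API expects."""
    return json.dumps({"action": action, **params})


async def send_command(message, uri="ws://localhost:8765"):
    """Send one command to the running app.

    Needs `pip install websockets` and face_captions.py started
    with --enable-api.
    """
    import websockets
    async with websockets.connect(uri) as ws:
        await ws.send(message)

# Example (with the app running):
#   asyncio.run(send_command(build_command("set_caption_offset", x=0, y=-10)))
```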
## Keyboard Shortcuts

| Key | Action |
|---|---|
| Q | Quit |
| B | Toggle face bounding box |
| T | Toggle speech bubble tail |
| H | Toggle caption history (1 vs 2 lines) |
| F | Toggle fade-in |
| C | Toggle color filter (hot map) |
| M | Toggle translation (if googletrans installed) |
| + / = | Increase caption size |
| - | Decrease caption size |
| 0 | Reset caption size to 100% |
| 1 | Minimum caption size |
| 9 | Maximum caption size |
| D | Toggle debug (FPS, cache, frame time; in OBS mode shows grid) |
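The dispatch behind these shortcuts presumably looks something like the following `cv2.waitKey`-style handler. This is an illustrative sketch, not the actual code in `face_captions.py`:

```python
SCALE_STEP = 0.1
SCALE_MIN, SCALE_MAX = 0.5, 2.0  # stand-ins for CAPTION_SCALE_MIN/MAX

TOGGLES = {"b": "bounding_box", "t": "bubble_tail", "h": "history",
           "f": "fade_in", "c": "color_filter", "m": "translate", "d": "debug"}


def handle_key(key, state):
    """Apply one keypress to the settings dict; return False to quit."""
    key = key.lower()
    if key == "q":
        return False
    if key in ("+", "="):
        state["scale"] = min(SCALE_MAX, state["scale"] + SCALE_STEP)
    elif key == "-":
        state["scale"] = max(SCALE_MIN, state["scale"] - SCALE_STEP)
    elif key == "0":
        state["scale"] = 1.0          # reset to 100%
    elif key == "1":
        state["scale"] = SCALE_MIN    # minimum size
    elif key == "9":
        state["scale"] = SCALE_MAX    # maximum size
    elif key in TOGGLES:
        state[TOGGLES[key]] = not state.get(TOGGLES[key], False)
    return True
```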
## OBS / Streaming

Use the app as a caption overlay in OBS:

1. Run:

   ```sh
   python face_captions.py --obs-mode
   ```

2. In OBS: Add source → Window Capture → select the "Face captions" window.
3. Right-click the source → Filters → Chroma Key → set the color to green (or match `--chroma-color`).

The window is always-on-top and shows only the captions on a solid background so you can chroma-key it. Your camera is used only by the app; capture the app window in OBS, not the camera itself.

Full guide: OBS_SETUP.md
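Under the hood, `--chroma-color` presumably just selects a solid background fill. A sketch of that mapping in OpenCV's BGR channel order (illustrative values, not necessarily what `face_captions.py` uses):

```python
# Illustrative --chroma-color -> fill mapping. OpenCV frames are BGR,
# so "blue" is (255, 0, 0). The app's actual values may differ.
CHROMA_COLORS = {
    "green": (0, 255, 0),
    "blue": (255, 0, 0),
}


def chroma_fill(name):
    """Return the BGR tuple used to paint the key-able background."""
    try:
        return CHROMA_COLORS[name.lower()]
    except KeyError:
        raise ValueError(f"unsupported chroma color: {name!r}")
```

Whatever color the app paints, set the same color in OBS's Chroma Key filter so the background drops out cleanly.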
## Configuration

Main settings are at the top of `face_captions.py`:

- `CAMERA_INDEX` – Webcam device index
- `CAPTION_FONT_SIZE`, `CAPTION_MAX_WIDTH` – Text size and width
- `CAPTION_SCALE_MIN`, `CAPTION_SCALE_MAX` – Size limits for `+` / `-`
- `CAPTION_TIMEOUT_SEC` – How long captions stay on screen
- `DISPLAY_SIZE` – Internal display size (e.g. 1280×720)

Optional `.env` (in the project folder):

- `DEEPGRAM_API_KEY` – For Deepgram streaming STT
- `DEEPGRAM_INPUT_DEVICE_INDEX` – Microphone index for Deepgram
- `DEBUG_STT` – Extra STT logs
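`.env` is a plain KEY=VALUE file. The app presumably loads it with python-dotenv, but the format is simple enough to show with a minimal parser sketch:

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments
    (a simplified version of what python-dotenv does)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


example = "# cloud STT\nDEEPGRAM_API_KEY=your-key-here\nDEBUG_STT=1\n"
print(parse_env(example))  # {'DEEPGRAM_API_KEY': 'your-key-here', 'DEBUG_STT': '1'}
```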
## Project Structure

```
├── face_captions.py                    # Main app (webcam, face tracking, captions)
├── face_mesh.py                        # MediaPipe face landmarker + emotion
├── realtime_stt.py                     # Streaming STT (Deepgram, Vosk, faster-whisper, Google)
├── download_vosk_model.py              # One-time Vosk model download
├── download_face_landmarker_model.py   # One-time face model download
├── list_mics.py                        # List microphone devices
├── OBS_SETUP.md                        # OBS setup guide
├── requirements.txt
├── .env                                # Optional: DEEPGRAM_API_KEY, etc.
├── models/                             # Face (and optional) models
└── Minecraft.ttf                       # Optional font
```
## Troubleshooting

| Issue | What to try |
|---|---|
| No speech / wrong mic | Check OS mic permissions; set `DEEPGRAM_INPUT_DEVICE_INDEX` in `.env`; run `list_mics.py` to see device indices. |
| "Caption mode: fast batch" | Install Vosk and run `download_vosk_model.py`, or set `DEEPGRAM_API_KEY` in `.env` for streaming. |
| "Face: OpenCV Haar cascade" | Run `download_face_landmarker_model.py` and ensure the model is in `models/`. |
| Camera not opening | Change `CAMERA_INDEX` (0, 1, 2…); on Windows, close other apps using the camera. |
| Camera works here but not in OBS | Don't add the camera in OBS. Use Window Capture on the face_captions window so only the app uses the camera. |
| Low FPS | Press `D` for debug info; close other apps; in OBS mode use `--window-size 800x600`. |
## License

Use and modify as you like. If you redistribute, keep attribution.