# Face Captions

Real-time speech captions that follow your face. A Python app using your webcam, face tracking, and streaming speech-to-text, with a Minecraft-style look and optional emotion-based styling.
- Features
- Quick Start
- Setup
- Run
- Keyboard Shortcuts
- OBS / Streaming
- Configuration
- Project Structure
- Troubleshooting
## Features

| Feature | Description |
|---|---|
| Face-following | Captions stay above your head and move with you. |
| Real-time STT | Streaming captions (Deepgram, Vosk, or faster-whisper) or fast batch (Google). |
| Minecraft style | Chat-style font and box; place `Minecraft.ttf` in the project or `fonts/` folder. |
| Emotion colors | With MediaPipe Face Mesh, caption tint reflects expression (happy, sad, etc.). |
| Face tracking | MediaPipe Face Landmarker when available; otherwise OpenCV Haar cascade. |
| OBS-ready | `--obs-mode` for green-screen overlay; use Window Capture + Chroma Key in OBS. |
## Quick Start

```sh
pip install -r requirements.txt
python face_captions.py
```

`Q` = Quit. `+` / `-` = Caption size. See Keyboard Shortcuts for more.

For streaming captions, run once: `python download_vosk_model.py` (or set `DEEPGRAM_API_KEY` in `.env` for cloud STT). For face mesh + emotion, run once: `python download_face_landmarker_model.py`.
## Setup

- Python 3.8+
- Microphone access for speech-to-text

```sh
pip install -r requirements.txt
```

On Windows, if PyAudio fails:

```sh
pip install pipwin
pipwin install pyaudio
```

### Speech-to-text backends

| Option | Command / Config | Notes |
|---|---|---|
| Deepgram (cloud) | Add `DEEPGRAM_API_KEY` to `.env` | Best quality, low latency; needs API key. |
| Vosk (offline) | `python download_vosk_model.py` | No API key; use `--large` for better accuracy. |
| faster-whisper (offline) | `pip install faster-whisper` | Good accuracy, no extra download script. |
| Google (online) | (default if none of the above) | Batch mode; no setup. |
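The priority order in the table can be sketched as a small selection helper. This is an illustrative sketch only; the actual selection logic lives in `realtime_stt.py` and may differ:

```python
def pick_stt_backend(api_key=None, has_vosk=False, has_faster_whisper=False):
    """Return the STT backend name, mirroring the table's priority order.

    Hypothetical helper -- the real check in realtime_stt.py may differ.
    """
    if api_key:
        return "deepgram"        # cloud streaming, lowest latency
    if has_vosk:
        return "vosk"            # offline streaming, needs a model download
    if has_faster_whisper:
        return "faster-whisper"  # offline, no extra download script
    return "google"              # batch fallback, no setup

print(pick_stt_backend())                # nothing configured -> google
print(pick_stt_backend(api_key="dg_x"))  # key present -> deepgram
```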
### Face mesh + emotion (optional)

```sh
python download_face_landmarker_model.py
```

Puts the model in `models/`. Without it, the app falls back to OpenCV's face detector and a neutral emotion.

### Font

Place `Minecraft.ttf` in the project folder or in `fonts/`. Otherwise a system font is used.

### Camera

Set `CAMERA_INDEX` in `face_captions.py` (e.g. `1`). On Windows, try different indices (`0`, `1`, `2`) until the right camera opens. There is no camera-listing script, but `list_mics.py` shows the pattern if you want to write one.
## Run

```sh
python face_captions.py
```

OBS mode (green screen for Window Capture + Chroma Key):

```sh
python face_captions.py --obs-mode
python face_captions.py --obs-mode --window-size 800x600
python face_captions.py --obs-mode --chroma-color blue
```

The OBS window position and caption offsets are saved to `obs_window.json` on quit and restored on the next run.

Performance overlay (FPS, face, STT on frame): pass `--show-perf` or press `D` for debug.

WebSocket control API (e.g. Stream Deck): `pip install websockets`, then `python face_captions.py --enable-api` (default port 8765). Send JSON such as `{"action": "toggle_speech_bubble"}` or `{"action": "set_caption_offset", "x": 0, "y": -10}`.

See OBS / Streaming and OBS_SETUP.md for full instructions.
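A minimal control-client sketch for the WebSocket API, assuming the `websockets` package and the default port (the action names come from the examples above):

```python
import asyncio
import json


def build_command(action, **params):
    """Serialize a control action into the JSON shape the API expects."""
    return json.dumps({"action": action, **params})


async def send_command(message, uri="ws://localhost:8765"):
    """Send one command to the running app.

    Needs `pip install websockets` and face_captions.py started
    with --enable-api.
    """
    import websockets
    async with websockets.connect(uri) as ws:
        await ws.send(message)

# Example (with the app running):
#   asyncio.run(send_command(build_command("set_caption_offset", x=0, y=-10)))
```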
## Keyboard Shortcuts

| Key | Action |
|---|---|
| Q | Quit |
| B | Toggle face bounding box |
| T | Toggle speech bubble tail |
| H | Toggle caption history (1 vs 2 lines) |
| F | Toggle fade-in |
| C | Toggle color filter (hot map) |
| M | Toggle translation (if googletrans installed) |
| + / = | Increase caption size |
| - | Decrease caption size |
| 0 | Reset caption size to 100% |
| 1 | Minimum caption size |
| 9 | Maximum caption size |
| D | Toggle debug (FPS, cache, frame time; in OBS mode shows grid) |
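The dispatch behind these shortcuts presumably looks something like the following `cv2.waitKey`-style handler. This is an illustrative sketch, not the actual code in `face_captions.py`:

```python
SCALE_STEP = 0.1
SCALE_MIN, SCALE_MAX = 0.5, 2.0  # stand-ins for CAPTION_SCALE_MIN/MAX

TOGGLES = {"b": "bounding_box", "t": "bubble_tail", "h": "history",
           "f": "fade_in", "c": "color_filter", "m": "translate", "d": "debug"}


def handle_key(key, state):
    """Apply one keypress to the settings dict; return False to quit."""
    key = key.lower()
    if key == "q":
        return False
    if key in ("+", "="):
        state["scale"] = min(SCALE_MAX, state["scale"] + SCALE_STEP)
    elif key == "-":
        state["scale"] = max(SCALE_MIN, state["scale"] - SCALE_STEP)
    elif key == "0":
        state["scale"] = 1.0          # reset to 100%
    elif key == "1":
        state["scale"] = SCALE_MIN    # minimum size
    elif key == "9":
        state["scale"] = SCALE_MAX    # maximum size
    elif key in TOGGLES:
        state[TOGGLES[key]] = not state.get(TOGGLES[key], False)
    return True
```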
## OBS / Streaming

Use the app as a caption overlay in OBS:

1. Run:

   ```sh
   python face_captions.py --obs-mode
   ```

2. In OBS: Add source → Window Capture → select the "Face captions" window.
3. Right-click the source → Filters → Chroma Key → set the color to green (or match `--chroma-color`).

The window is always-on-top and shows only the captions on a solid background so you can chroma-key it. Your camera is used only by the app; capture the app window in OBS, not the camera itself.

Full guide: OBS_SETUP.md
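Under the hood, `--chroma-color` presumably just selects a solid background fill. A sketch of that mapping in OpenCV's BGR channel order (illustrative values, not necessarily what `face_captions.py` uses):

```python
# Illustrative --chroma-color -> fill mapping. OpenCV frames are BGR,
# so "blue" is (255, 0, 0). The app's actual values may differ.
CHROMA_COLORS = {
    "green": (0, 255, 0),
    "blue": (255, 0, 0),
}


def chroma_fill(name):
    """Return the BGR tuple used to paint the key-able background."""
    try:
        return CHROMA_COLORS[name.lower()]
    except KeyError:
        raise ValueError(f"unsupported chroma color: {name!r}")
```

Whatever color the app paints, set the same color in OBS's Chroma Key filter so the background drops out cleanly.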
## Configuration

Main settings are at the top of `face_captions.py`:

- `CAMERA_INDEX` – Webcam device index
- `CAPTION_FONT_SIZE`, `CAPTION_MAX_WIDTH` – Text size and width
- `CAPTION_SCALE_MIN`, `CAPTION_SCALE_MAX` – Size limits for `+` / `-`
- `CAPTION_TIMEOUT_SEC` – How long captions stay on screen
- `DISPLAY_SIZE` – Internal display size (e.g. 1280×720)

Optional `.env` (in the project folder):

- `DEEPGRAM_API_KEY` – For Deepgram streaming STT
- `DEEPGRAM_INPUT_DEVICE_INDEX` – Microphone index for Deepgram
- `DEBUG_STT` – Extra STT logs
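`.env` is a plain KEY=VALUE file. The app presumably loads it with python-dotenv, but the format is simple enough to show with a minimal parser sketch:

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments
    (a simplified version of what python-dotenv does)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


example = "# cloud STT\nDEEPGRAM_API_KEY=your-key-here\nDEBUG_STT=1\n"
print(parse_env(example))  # {'DEEPGRAM_API_KEY': 'your-key-here', 'DEBUG_STT': '1'}
```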
## Project Structure

```
├── face_captions.py                    # Main app (webcam, face tracking, captions)
├── face_mesh.py                        # MediaPipe face landmarker + emotion
├── realtime_stt.py                     # Streaming STT (Deepgram, Vosk, faster-whisper, Google)
├── download_vosk_model.py              # One-time Vosk model download
├── download_face_landmarker_model.py   # One-time face model download
├── list_mics.py                        # List microphone devices
├── OBS_SETUP.md                        # OBS setup guide
├── requirements.txt
├── .env                                # Optional: DEEPGRAM_API_KEY, etc.
├── models/                             # Face (and optional) models
└── Minecraft.ttf                       # Optional font
```
## Troubleshooting

| Issue | What to try |
|---|---|
| No speech / wrong mic | Check OS mic permissions; set `DEEPGRAM_INPUT_DEVICE_INDEX` in `.env`; run `list_mics.py` to see device indices. |
| "Caption mode: fast batch" | Install Vosk and run `download_vosk_model.py`, or set `DEEPGRAM_API_KEY` in `.env` for streaming. |
| "Face: OpenCV Haar cascade" | Run `download_face_landmarker_model.py` and ensure the model is in `models/`. |
| Camera not opening | Change `CAMERA_INDEX` (0, 1, 2…); on Windows, close other apps using the camera. |
| Camera works here but not in OBS | Don't add the camera in OBS. Use Window Capture on the face_captions window so only the app uses the camera. |
| Low FPS | Press `D` for debug info; close other apps; in OBS mode use `--window-size 800x600`. |
## License

Use and modify as you like. If you redistribute, keep attribution.