O.D.I.N. - Object Detection-Interpretation Narrator — AI Video-To-Speech Algorithm for the Visually Impaired
O.D.I.N. captures your webcam feed, runs Ultralytics YOLO object detection, estimates rough distance from a single camera, infers motion and pose-related cues (e.g. possible waving, hands near a surface), and speaks a natural-language summary with gTTS + pygame. An optional Purdue GenAI Studio integration can polish the same structured context over an OpenAI-compatible HTTP API.
Narration is deduplicated: it only speaks again when the scene “fingerprint” changes (objects, position, distance bucket, motion/pose hints, or lighting)—not on a fixed repeat loop.
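A minimal sketch of that deduplication idea, assuming the fingerprint is a small immutable value compared against the previous one (the names `SceneFingerprint` and `should_narrate` are illustrative, not the actual `odin/` API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneFingerprint:
    objects: tuple[str, ...]   # e.g. ("person", "chair")
    distance_bucket: str       # e.g. "about three feet"
    hints: tuple[str, ...]     # e.g. ("possible waving",)
    lighting: str              # e.g. "dim"

_last: SceneFingerprint | None = None

def should_narrate(fp: SceneFingerprint) -> bool:
    """True only when the scene fingerprint differs from the previous one."""
    global _last
    if fp == _last:
        return False  # same scene: stay quiet instead of repeating
    _last = fp
    return True
```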
| Area | What it does |
|---|---|
| Detection | `yolo26n.pt` (auto-downloaded on first run). Configurable in `main.py` as `YOLO_MODEL`. |
| Depth (approximate) | Maps bounding-box size to spoken ranges like “about three feet away.” Not metric/LiDAR accuracy—tune constants in `odin/depth.py` for your webcam if needed (see the depth sketch below the table). |
| Motion | Frame differencing on a downscaled gray stream; upper vs. lower ROI on person boxes for cues like possible arm movement / waving (see the motion sketch below the table). |
| Pose | Optional `yolo11n-pose.pt` (throttled) for wrists/shoulders/hips; combines with table detections for “hands may be resting on a surface” style hints. |
| Speech | gTTS generates MP3; pygame plays it. Utterances are queued so rapid updates are not dropped (see the speech sketch below the table). |
| Purdue GenAI | If `PURDUE_GENAI_API_KEY` is set, the app sends structured sensor lines plus a local draft; the model returns polished, TTS-friendly text. Without a key, only the local narrator runs. |
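The depth row maps bounding-box size to a spoken range. A minimal sketch of such a heuristic, with made-up thresholds (the real constants live in `odin/depth.py` and should be tuned per webcam):

```python
def distance_phrase(box_height_px: int, frame_height_px: int) -> str:
    """Map relative bounding-box height to a coarse spoken distance bucket.

    Thresholds below are illustrative guesses, not the shipped constants.
    """
    frac = box_height_px / frame_height_px
    if frac > 0.75:
        return "very close, within arm's reach"
    if frac > 0.45:
        return "about three feet away"
    if frac > 0.25:
        return "about six feet away"
    return "more than ten feet away"
```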
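The motion row describes frame differencing on a downscaled gray stream. A self-contained sketch of that technique (`odin/motion.py` may score and threshold differently):

```python
import cv2
import numpy as np

_prev = None  # previous downscaled grayscale frame

def motion_score(frame_bgr: np.ndarray, scale: float = 0.25) -> float:
    """Mean absolute pixel change between consecutive downscaled frames."""
    global _prev
    small = cv2.resize(frame_bgr, None, fx=scale, fy=scale)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    if _prev is None:
        _prev = gray
        return 0.0  # no baseline yet
    diff = cv2.absdiff(gray, _prev)
    _prev = gray
    return float(diff.mean())
```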
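The speech row describes a gTTS-to-pygame pipeline with a queue. A minimal sketch, assuming a background worker drains a `queue.Queue` (the app's actual queueing code may differ):

```python
import os
import queue
import tempfile
import threading

import pygame
from gtts import gTTS

utterances = queue.Queue()  # narration lines waiting to be spoken

def tts_worker(lang: str = "en") -> None:
    pygame.mixer.init()
    while True:
        text = utterances.get()  # blocks: rapid updates queue up, not dropped
        fd, path = tempfile.mkstemp(suffix=".mp3")
        os.close(fd)
        gTTS(text=text, lang=lang).save(path)  # network call to Google TTS
        pygame.mixer.music.load(path)
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():   # wait for playback to finish
            pygame.time.wait(100)
        pygame.mixer.music.unload()            # release the file handle
        os.remove(path)

# In the real app the camera loop keeps the process alive; the worker
# runs as a daemon thread alongside it.
threading.Thread(target=tts_worker, daemon=True).start()
utterances.put("One person about three feet away, possibly waving.")
```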
- `main.py` — camera loop, YOLO, TTS, optional GenAI, wiring.
- `odin/` — `depth.py`, `motion.py`, `pose_hints.py`, `narration.py`, `narrative_builder.py`.
- `tests/` — unit tests (e.g. depth heuristics). Run with `python -m unittest discover -s tests -v`.
- Python 3.10+ recommended (3.13 works on the maintained stack).
- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

  Weights (`yolo26n.pt`, `yolo11n-pose.pt`) download automatically when first used.
- Optional — Purdue GenAI Studio:
  - Create an API key in the GenAI Studio UI (avatar → Settings → Account → API Keys).
  - Either set an environment variable:

    ```powershell
    # PowerShell
    $env:PURDUE_GENAI_API_KEY = "your-key"
    ```

    or add a `.env` file next to `main.py` (loaded via `python-dotenv`):

    ```
    PURDUE_GENAI_API_KEY=your-key-here
    ```

  - Optional overrides:
    - `PURDUE_GENAI_URL` — default `https://genai.rcac.purdue.edu/api/chat/completions`
    - `PURDUE_GENAI_MODEL` — default `llama3.1:latest` (use a model your project exposes)
  If the key is missing, O.D.I.N. still runs: camera + YOLO + local narration only.
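Since the endpoint is OpenAI-compatible, the call can be sketched as below. The message content is a stand-in for the structured sensor lines and local draft the app actually sends, and the response shape is assumed to follow the standard chat-completions format:

```python
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # picks up PURDUE_GENAI_API_KEY from a .env next to main.py

URL = os.getenv("PURDUE_GENAI_URL",
                "https://genai.rcac.purdue.edu/api/chat/completions")
MODEL = os.getenv("PURDUE_GENAI_MODEL", "llama3.1:latest")

resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {os.environ['PURDUE_GENAI_API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [
            # Stand-in for the app's structured sensor lines + local draft.
            {"role": "user",
             "content": "objects: person ~3 ft, table ~6 ft\n"
                        "draft: A person is about three feet ahead."},
        ],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```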
Run:

```
python main.py
```

Keys:

- q — Quit
- l — Change TTS language (BCP-47 code, e.g. `en`, `es`)
Adjust behavior at the top of `main.py` (e.g. `DESCRIBE_EVERY_N_SEC`, `YOLO_CONFIDENCE`, `CAMERA_INDEX`, `POSE_EVERY_N_FRAMES`).
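For orientation, that block of tunables might look like this; the constant names come from this README, but the values shown are illustrative, not the shipped defaults:

```python
# Tunables at the top of main.py (values here are illustrative).
YOLO_MODEL = "yolo26n.pt"   # detection weights, auto-downloaded on first run
DESCRIBE_EVERY_N_SEC = 5    # minimum seconds between narration attempts
YOLO_CONFIDENCE = 0.5       # detection confidence threshold
CAMERA_INDEX = 0            # which webcam OpenCV opens
POSE_EVERY_N_FRAMES = 10    # throttle for the pose model
```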
- Distance is a heuristic from monocular video, not a depth sensor.
- “Waving” / “hands on table” are best-effort combinations of motion, pose, and object labels—false positives happen in clutter or bad lighting.
- gTTS requires network access to synthesize speech.
See `requirements.txt` — notably: `ultralytics`, `opencv-python`, `numpy`, `scikit-image`, `requests`, `python-dotenv`, `gTTS`, `pygame`.