O.D.I.N. - Object Detection-Interpretation Narrator — AI Video-To-Speech Algorithm for the Visually Impaired
O.D.I.N. captures your webcam feed, runs Ultralytics YOLO object detection, estimates rough distance from a single camera, infers motion and pose-related cues (e.g. possible waving, hands near a surface), and speaks a natural-language summary with gTTS + pygame. An optional Purdue GenAI Studio integration can polish the same structured context over an OpenAI-compatible HTTP API.
Narration is deduplicated: it only speaks again when the scene “fingerprint” changes (objects, position, distance bucket, motion/pose hints, or lighting)—not on a fixed repeat loop.
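A minimal sketch of that deduplication idea, assuming the fingerprint is a small immutable value compared against the previous one (the names `SceneFingerprint` and `should_narrate` are illustrative, not the actual `odin/` API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneFingerprint:
    objects: tuple[str, ...]   # e.g. ("person", "chair")
    distance_bucket: str       # e.g. "about three feet"
    hints: tuple[str, ...]     # e.g. ("possible waving",)
    lighting: str              # e.g. "dim"

_last: SceneFingerprint | None = None

def should_narrate(fp: SceneFingerprint) -> bool:
    """True only when the scene fingerprint differs from the previous one."""
    global _last
    if fp == _last:
        return False  # same scene: stay quiet instead of repeating
    _last = fp
    return True
```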
| Area | What it does |
|---|---|
| Detection | `yolo26n.pt` (auto-downloaded on first run). Configurable in `main.py` as `YOLO_MODEL`. |
| Depth (approximate) | Maps bounding-box size to spoken ranges like “about three feet away.” Not metric/LiDAR accuracy—tune constants in `odin/depth.py` for your webcam if needed (see the depth sketch below the table). |
| Motion | Frame differencing on a downscaled gray stream; upper vs. lower ROI on person boxes for cues like possible arm movement / waving (see the motion sketch below the table). |
| Pose | Optional `yolo11n-pose.pt` (throttled) for wrists/shoulders/hips; combines with table detections for “hands may be resting on a surface” style hints. |
| Speech | gTTS generates MP3; pygame plays it. Utterances are queued so rapid updates are not dropped (see the speech sketch below the table). |
| Purdue GenAI | If `PURDUE_GENAI_API_KEY` is set, the app sends structured sensor lines plus a local draft; the model returns polished, TTS-friendly text. Without a key, only the local narrator runs. |
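The depth row maps bounding-box size to a spoken range. A minimal sketch of such a heuristic, with made-up thresholds (the real constants live in `odin/depth.py` and should be tuned per webcam):

```python
def distance_phrase(box_height_px: int, frame_height_px: int) -> str:
    """Map relative bounding-box height to a coarse spoken distance bucket.

    Thresholds below are illustrative guesses, not the shipped constants.
    """
    frac = box_height_px / frame_height_px
    if frac > 0.75:
        return "very close, within arm's reach"
    if frac > 0.45:
        return "about three feet away"
    if frac > 0.25:
        return "about six feet away"
    return "more than ten feet away"
```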
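The motion row describes frame differencing on a downscaled gray stream. A self-contained sketch of that technique (`odin/motion.py` may score and threshold differently):

```python
import cv2
import numpy as np

_prev = None  # previous downscaled grayscale frame

def motion_score(frame_bgr: np.ndarray, scale: float = 0.25) -> float:
    """Mean absolute pixel change between consecutive downscaled frames."""
    global _prev
    small = cv2.resize(frame_bgr, None, fx=scale, fy=scale)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    if _prev is None:
        _prev = gray
        return 0.0  # no baseline yet
    diff = cv2.absdiff(gray, _prev)
    _prev = gray
    return float(diff.mean())
```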
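The speech row describes a gTTS-to-pygame pipeline with a queue. A minimal sketch, assuming a background worker drains a `queue.Queue` (the app's actual queueing code may differ):

```python
import os
import queue
import tempfile
import threading

import pygame
from gtts import gTTS

utterances = queue.Queue()  # narration lines waiting to be spoken

def tts_worker(lang: str = "en") -> None:
    pygame.mixer.init()
    while True:
        text = utterances.get()  # blocks: rapid updates queue up, not dropped
        fd, path = tempfile.mkstemp(suffix=".mp3")
        os.close(fd)
        gTTS(text=text, lang=lang).save(path)  # network call to Google TTS
        pygame.mixer.music.load(path)
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():   # wait for playback to finish
            pygame.time.wait(100)
        pygame.mixer.music.unload()            # release the file handle
        os.remove(path)

# In the real app the camera loop keeps the process alive; the worker
# runs as a daemon thread alongside it.
threading.Thread(target=tts_worker, daemon=True).start()
utterances.put("One person about three feet away, possibly waving.")
```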
- `main.py` — camera loop, YOLO, TTS, optional GenAI, wiring.
- `odin/` — `depth.py`, `motion.py`, `pose_hints.py`, `narration.py`, `narrative_builder.py`.
- `tests/` — unit tests (e.g. depth heuristics). Run with `python -m unittest discover -s tests -v`.
- Python 3.10+ recommended (3.13 works on the maintained stack).
- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

  Weights (`yolo26n.pt`, `yolo11n-pose.pt`) download automatically when first used.
- Optional — Purdue GenAI Studio:
  - Create an API key in the GenAI Studio UI (avatar → Settings → Account → API Keys).
  - Either set an environment variable:

    ```powershell
    # PowerShell
    $env:PURDUE_GENAI_API_KEY = "your-key"
    ```

    or add a `.env` file next to `main.py` (loaded via `python-dotenv`):

    ```
    PURDUE_GENAI_API_KEY=your-key-here
    ```

  - Optional overrides:
    - `PURDUE_GENAI_URL` — default `https://genai.rcac.purdue.edu/api/chat/completions`
    - `PURDUE_GENAI_MODEL` — default `llama3.1:latest` (use a model your project exposes)
  If the key is missing, O.D.I.N. still runs: camera + YOLO + local narration only.
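Since the endpoint is OpenAI-compatible, the call can be sketched as below. The message content is a stand-in for the structured sensor lines and local draft the app actually sends, and the response shape is assumed to follow the standard chat-completions format:

```python
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # picks up PURDUE_GENAI_API_KEY from a .env next to main.py

URL = os.getenv("PURDUE_GENAI_URL",
                "https://genai.rcac.purdue.edu/api/chat/completions")
MODEL = os.getenv("PURDUE_GENAI_MODEL", "llama3.1:latest")

resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {os.environ['PURDUE_GENAI_API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [
            # Stand-in for the app's structured sensor lines + local draft.
            {"role": "user",
             "content": "objects: person ~3 ft, table ~6 ft\n"
                        "draft: A person is about three feet ahead."},
        ],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```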
Run:

```
python main.py
```

Keys:

- q — Quit
- l — Change TTS language (BCP-47 code, e.g. `en`, `es`)
Adjust behavior at the top of `main.py` (e.g. `DESCRIBE_EVERY_N_SEC`, `YOLO_CONFIDENCE`, `CAMERA_INDEX`, `POSE_EVERY_N_FRAMES`).
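For orientation, that block of tunables might look like this; the constant names come from this README, but the values shown are illustrative, not the shipped defaults:

```python
# Tunables at the top of main.py (values here are illustrative).
YOLO_MODEL = "yolo26n.pt"   # detection weights, auto-downloaded on first run
DESCRIBE_EVERY_N_SEC = 5    # minimum seconds between narration attempts
YOLO_CONFIDENCE = 0.5       # detection confidence threshold
CAMERA_INDEX = 0            # which webcam OpenCV opens
POSE_EVERY_N_FRAMES = 10    # throttle for the pose model
```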
- Distance is a heuristic from monocular video, not a depth sensor.
- “Waving” / “hands on table” are best-effort combinations of motion, pose, and object labels—false positives happen in clutter or bad lighting.
- gTTS requires network access to synthesize speech.
See `requirements.txt` — notably: `ultralytics`, `opencv-python`, `numpy`, `scikit-image`, `requests`, `python-dotenv`, `gTTS`, `pygame`.