O.D.I.N. - Object Detection-Interpretation Narrator — AI Video-To-Speech Algorithm for the Visually Impaired

O.D.I.N. captures your webcam, runs Ultralytics YOLO object detection, estimates rough distance from a single camera, infers motion and pose-related cues (e.g. possible waving, hands near a surface), and speaks a natural-language summary with gTTS + pygame. Optional Purdue GenAI Studio can polish the same structured context over an OpenAI-compatible HTTP API.
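
In outline, the capture-and-detect loop looks like this (a minimal sketch using the real OpenCV and Ultralytics APIs; the constants mirror names from main.py, everything else is illustrative):

    import cv2
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")        # YOLO_MODEL; weights download on first use
    cap = cv2.VideoCapture(0)         # CAMERA_INDEX

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = model(frame, conf=0.5, verbose=False)   # YOLO_CONFIDENCE
        for box in results[0].boxes:
            label = results[0].names[int(box.cls)]
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            # box height feeds the depth heuristic; label/position feed narration
        cv2.imshow("O.D.I.N.", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):             # q quits, as below
            break
    cap.release()
    cv2.destroyAllWindows()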

Narration is deduplicated: it only speaks again when the scene “fingerprint” changes (objects, position, distance bucket, motion/pose hints, or lighting)—not on a fixed repeat loop.
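
One way to picture the fingerprint (the field names here are hypothetical; the real logic lives in the odin package):

    def scene_fingerprint(detections, motion_hint, lighting):
        # Order-independent scene summary: object labels plus coarse
        # position and distance buckets, with motion/pose and lighting hints.
        items = frozenset(
            (d["label"], d["position_bucket"], d["distance_bucket"])
            for d in detections
        )
        return (items, motion_hint, lighting)

    _last_fp = None

    def should_narrate(detections, motion_hint, lighting):
        # Speak only when the fingerprint changes, never on a fixed timer.
        global _last_fp
        fp = scene_fingerprint(detections, motion_hint, lighting)
        changed = fp != _last_fp
        _last_fp = fp
        return changed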

Features

  • Detection: yolo26n.pt (auto-downloaded on first run). Configurable in main.py as YOLO_MODEL.
  • Depth (approximate): maps bounding-box size to spoken ranges like “about three feet away.” Not metric/LiDAR accuracy; tune the constants in odin/depth.py for your webcam if needed (see the depth sketch after this list).
  • Motion: frame differencing on a downscaled grayscale stream; upper vs. lower ROI on person boxes yields cues like possible arm movement / waving (also sketched below).
  • Pose: optional yolo11n-pose.pt (throttled) for wrists/shoulders/hips; combined with table detections for “hands may be resting on a surface” style hints.
  • Speech: gTTS generates an MP3; pygame plays it. Utterances are queued so rapid updates are not dropped.
  • Purdue GenAI: if PURDUE_GENAI_API_KEY is set, the app sends structured sensor lines plus a local draft, and the model returns polished, TTS-friendly text. Without a key, only the local narrator runs.
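
The depth heuristic rests on the usual monocular approximation: under a pinhole camera model, an object's apparent pixel height shrinks roughly in inverse proportion to its distance. A minimal sketch of that idea, with made-up calibration constants and function names (not the actual odin/depth.py API):

    # Hypothetical calibration: pixel height of an average-height person
    # standing REF_DISTANCE_FT from this particular webcam.
    REF_PIXEL_HEIGHT = 420.0
    REF_DISTANCE_FT = 5.0

    def estimate_distance_ft(box_pixel_height: float) -> float:
        # Pinhole approximation: distance scales inversely with apparent size.
        return REF_DISTANCE_FT * (REF_PIXEL_HEIGHT / max(box_pixel_height, 1.0))

    def distance_bucket(feet: float) -> str:
        # Collapse to coarse spoken ranges; precise numbers would overstate accuracy.
        if feet < 4:
            return "about three feet away"
        if feet < 8:
            return "a few feet away"
        return "across the room"

The motion cue is plain frame differencing; in outline (the OpenCV calls are real, the resolution and threshold are illustrative):

    import cv2

    def motion_score(prev_gray, gray):
        # Downscale, diff, threshold: the fraction of changed pixels is the score.
        small_prev = cv2.resize(prev_gray, (160, 120))
        small_curr = cv2.resize(gray, (160, 120))
        diff = cv2.absdiff(small_prev, small_curr)
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        return cv2.countNonZero(mask) / mask.size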

Project layout

  • main.py — Camera loop, YOLO, TTS, optional GenAI, wiring.
  • odin/ package: depth.py, motion.py, pose_hints.py, narration.py, narrative_builder.py (the depth, motion, pose, and narration helpers).
  • tests/ — Unit tests (e.g. depth heuristics). Run with python -m unittest discover -s tests -v.
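
A test in that suite might look roughly like the following (estimate_distance_ft is the hypothetical helper from the depth sketch above, not necessarily the real odin.depth interface):

    import unittest

    from odin.depth import estimate_distance_ft  # hypothetical import; match the real API

    class DepthHeuristicTest(unittest.TestCase):
        def test_larger_boxes_read_as_closer(self):
            # Monotonicity check: a taller bounding box must map to a shorter distance.
            near = estimate_distance_ft(box_pixel_height=400.0)
            far = estimate_distance_ft(box_pixel_height=100.0)
            self.assertLess(near, far)

    if __name__ == "__main__":
        unittest.main()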

Setup

  1. Python 3.10 or newer is recommended (3.13 is known to work with the current dependency stack).

  2. Install dependencies:

    pip install -r requirements.txt

    Weights (yolo26n.pt, yolo11n-pose.pt) download automatically when first used.

  3. Optional — Purdue GenAI Studio

    Create an API key in the GenAI Studio UI (avatar → Settings → Account → API Keys).

    Either set an environment variable:

    # PowerShell
    $env:PURDUE_GENAI_API_KEY = "your-key"

    Or add a .env file next to main.py (loaded via python-dotenv):

    PURDUE_GENAI_API_KEY=your-key-here
    

    Optional overrides:

    • PURDUE_GENAI_URL — default https://genai.rcac.purdue.edu/api/chat/completions
    • PURDUE_GENAI_MODEL — default llama3.1:latest (use a model your project exposes)

    If the key is missing, O.D.I.N. still runs: camera + YOLO + local narration only.
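
The GenAI call itself is a standard OpenAI-compatible chat completion over HTTPS. Roughly (the payload follows the OpenAI chat schema; the prompt text here is illustrative):

    import os
    import requests

    url = os.getenv("PURDUE_GENAI_URL",
                    "https://genai.rcac.purdue.edu/api/chat/completions")
    payload = {
        "model": os.getenv("PURDUE_GENAI_MODEL", "llama3.1:latest"),
        "messages": [
            {"role": "system",
             "content": "Rewrite the draft as short, TTS-friendly narration."},
            {"role": "user",
             "content": "SENSORS:\nperson, center, ~3 ft\nDRAFT:\nA person is just ahead."},
        ],
    }
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {os.environ['PURDUE_GENAI_API_KEY']}"},
        json=payload,
        timeout=15,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])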

Usage

python main.py

Keys while running:
  • q — Quit
  • l — Change TTS language (a BCP-47 code, e.g. en, es)

Adjust behavior at the top of main.py (e.g. DESCRIBE_EVERY_N_SEC, YOLO_CONFIDENCE, CAMERA_INDEX, POSE_EVERY_N_FRAMES).
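
The speech path in isolation uses the real gTTS and pygame APIs (the utterance queue in main.py is omitted here):

    import pygame
    from gtts import gTTS

    def speak(text: str, lang: str = "en") -> None:
        # gTTS synthesizes via Google's endpoint, so this needs network access.
        gTTS(text=text, lang=lang).save("utterance.mp3")
        pygame.mixer.init()
        pygame.mixer.music.load("utterance.mp3")
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():   # block until playback finishes
            pygame.time.wait(100)

    speak("A person is about three feet away.", lang="en")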

Limitations (read this)

  • Distance is a heuristic from monocular video, not a depth sensor.
  • “Waving” / “hands on table” are best-effort combinations of motion, pose, and object labels—false positives happen in clutter or bad lighting.
  • gTTS requires network access to synthesize speech.

Dependencies

See requirements.txt — notably: ultralytics, opencv-python, numpy, scikit-image, requests, python-dotenv, gTTS, pygame.
