A React app that reads text aloud from a camera feed, controlled by hand gestures and designed for legally blind users.
Built on MediaPipe, OCR, and custom gesture detection, this tool lets visually impaired users access printed text in their environment.
✨ Features
Real-time camera capture and processing
Hand gesture recognition (e.g. “point left”, “O”, open palm)
OCR (text recognition) on camera frames
Text-to-speech output to read recognized words aloud
Lightweight fallback and buffering to avoid flicker errors
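The buffering idea above can be sketched as a small class that only emits a gesture after it has been predicted for several consecutive frames, suppressing single-frame flicker. This is an illustrative sketch, not the project's actual implementation; the class and parameter names are hypothetical.

```python
from collections import deque

class GestureBuffer:
    """Emit a gesture only after `required` consecutive identical
    predictions (hypothetical sketch of the flicker-avoidance idea)."""

    def __init__(self, required=5):
        self.required = required
        # Keep only the most recent `required` predictions.
        self.history = deque(maxlen=required)

    def update(self, prediction):
        """Feed one per-frame prediction; return the gesture once stable,
        otherwise None."""
        self.history.append(prediction)
        if len(self.history) == self.required and len(set(self.history)) == 1:
            return prediction  # same gesture for every buffered frame
        return None
```

A smaller `required` reacts faster but flickers more; a larger one is steadier but adds latency.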
🏗 Architecture & Components
| Component | Responsibility |
| --- | --- |
| Frontend (React / Next.js / “use client”) | Captures video, draws landmarks, sends gestures |
| Gesture Recognizer (MediaPipe Tasks–Vision) | Detects hand landmarks & base gesture categories |
| Custom Gesture Overrides | Rules-based detection for “O”, “point left”, etc. |
| Stable Gesture Buffering | Avoids flicker by requiring consistent predictions |
| Keypress Simulation | Emits synthetic key events mapped to gestures |
| Backend / OCR / TTS (Flask or similar) | Processes camera frames, runs OCR, reads text aloud |
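A rules-based override like the “O” gesture can be expressed as a distance check between MediaPipe hand landmarks (index 4 is the thumb tip, index 8 the index fingertip in MediaPipe's 21-point hand model). The threshold and the exact rule below are assumptions for illustration, not the project's actual logic.

```python
import math

# MediaPipe hand-landmark indices: 4 = thumb tip, 8 = index fingertip.
THUMB_TIP, INDEX_TIP = 4, 8

def is_o_gesture(landmarks, threshold=0.05):
    """Hypothetical rules-based check: if the thumb tip and index
    fingertip nearly touch, the hand is forming an 'O'.
    `landmarks` is a list of 21 (x, y) points in normalized image
    coordinates; `threshold` is an assumed tuning constant."""
    tx, ty = landmarks[THUMB_TIP]
    ix, iy = landmarks[INDEX_TIP]
    return math.hypot(tx - ix, ty - iy) < threshold
```

Similar per-landmark rules (e.g. comparing fingertip x-coordinates for “point left”) can layer on top of the base categories MediaPipe reports.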
🛠️ Setup & Run
1. Clone the repo

```bash
git clone https://github.com/groffbo/sight-to-speech.git
cd sight-to-speech
```

2. Create and activate a virtual environment

```bash
python3 -m venv venv
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate      # On Windows PowerShell
```