A real-time voice-controlled camera system for smart glasses featuring computer vision, speech recognition, and gesture detection with multilingual support.
- Hands-free operation with natural language commands
- Multilingual support: English and Ukrainian with dynamic language switching
- Offline speech recognition using Vosk models
- Full Unicode/Cyrillic text rendering via PyQt5
- Voice-activated photo capture: "Take a picture"
- Video recording with audio: "Start recording" / "Stop recording"
- Automatic audio/video muxing using FFmpeg
- Organized file management (separate folders for photos/videos)
- Real-time face detection using OpenCV Haar Cascades or dlib
- Hand tracking with MediaPipe Hands
- Sign language recognition (11+ ASL gestures, including thumbs up, peace, and OK)
- 30+ FPS performance with optimized video processing
- Modern PyQt5 GUI with real-time video display
- Live transcription overlay with automatic text wrapping
- Status indicators: FPS counter, recording status, current language
- Responsive design with proper Unicode support for international text
- Python 3.11 or higher
- Webcam
- Microphone
- FFmpeg (optional, for audio/video muxing)
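When a recording stops, the separately captured video and audio files need to be muxed with FFmpeg. A minimal sketch of how this could be driven from Python via `subprocess` (function names are illustrative, not the project's actual code):

```python
import subprocess

def build_mux_command(video_path: str, audio_path: str, out_path: str) -> list[str]:
    # Copy the video stream unchanged and encode audio to AAC; "-shortest"
    # ends the output when the shorter of the two input streams ends.
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",
        out_path,
    ]

def mux(video_path: str, audio_path: str, out_path: str) -> None:
    subprocess.run(build_mux_command(video_path, audio_path, out_path), check=True)
```

Because FFmpeg is optional here, a real implementation would catch `FileNotFoundError` and fall back to keeping the raw video-only file.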
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/rayband-voice-camera.git
  cd rayband-voice-camera
  ```

- Create a virtual environment

  ```bash
  python -m venv venv311
  venv311\Scripts\activate       # Windows
  # source venv311/bin/activate  # Linux/macOS
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Download Vosk models

  Download the speech recognition models and extract them into the `models/` directory:

  ```
  models/
  ├── vosk-model-en-us-0.22/
  └── vosk-model-uk-v3/
  ```
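A quick startup check that both model folders are in place can save a confusing Vosk load error later. This is a hedged sketch using the folder names from the tree above; the function name is an assumption:

```python
from pathlib import Path

# Expected model folders (from the directory layout above).
MODEL_DIRS = {
    "en": "vosk-model-en-us-0.22",
    "uk": "vosk-model-uk-v3",
}

def missing_models(models_root: str = "models") -> list[str]:
    """Return the language codes whose model folder is absent."""
    root = Path(models_root)
    return [lang for lang, name in MODEL_DIRS.items()
            if not (root / name).is_dir()]
```

An application could call this at launch and print which download is missing instead of crashing on model load.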
- Run the application

  ```bash
  python
  ```
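The "organized file management" feature (separate folders for photos and videos) can be sketched as a small path helper that builds timestamped output paths. The folder names, extensions, and function name below are assumptions for illustration, not the project's actual layout:

```python
from datetime import datetime
from pathlib import Path

def output_path(kind: str, root: str = "captures") -> Path:
    """Return a timestamped path under <root>/photos or <root>/videos,
    creating the folder if needed."""
    folders = {"photo": ("photos", ".jpg"), "video": ("videos", ".mp4")}
    folder, ext = folders[kind]
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(root) / folder / f"{kind}_{stamp}{ext}"
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Timestamped names avoid collisions without any bookkeeping, and `mkdir(parents=True, exist_ok=True)` makes the helper safe to call on first run.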