This repository contains the software side of a gesture-recognition pipeline that targets a custom SoC on the Arty S7-25 FPGA. A webcam on my MacBook Pro collects hand landmarks with MediaPipe, a lightweight MLP classifies gestures, and the quantized model is exported so the FPGA can drive a BiStable robot via an ESP32 WiFi link. The long-term goal is a secure, low-latency loop where the FPGA validates a four-gesture passcode through an enclave block, updates a host-facing display, and streams motion commands to the robot.
- Capture – `gesture-pipelines/gesture-webcam.py` streams frames, extracts 21-point landmarks, and soft-validates predictions with the quantized weights.
- Train – `training/mlp-training.py` fits the MLP on curated datasets (`data/landmarks_filtered.csv`) to refresh `models/gesture_mlp_model.h5`.
- Quantize/Export – `scripts/quantizing.py` converts per-layer CSV weights into `models/quantized_weights.bin` for FPGA consumption.
- Deploy – FPGA logic consumes the binary weights, runs inference, evaluates the gesture passcode in a secure enclave, and relays unlock + control signals to the robot through the ESP32 peripheral.
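Each captured hand yields 21 MediaPipe landmarks, each an (x, y, z) triple, so the classifier sees a 63-element feature vector. A minimal sketch of the flattening step (the exact ordering used by `gesture-webcam.py` is an assumption; verify before reusing):

```python
def flatten_landmarks(landmarks):
    """Flatten 21 (x, y, z) landmark triples into a 63-element feature vector.

    `landmarks` is any iterable of 21 (x, y, z) tuples. The interleaved
    ordering (x0, y0, z0, x1, ...) is an assumption -- match whatever the
    capture script actually emits before training on it.
    """
    flat = [coord for point in landmarks for coord in point]
    if len(flat) != 63:
        raise ValueError(f"expected 21 landmarks x 3 coords, got {len(flat)} values")
    return flat
```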
- `data/` – Landmark CSV datasets (`landmarks.csv`, `landmarks_filtered.csv`).
- `data-modifications/` – CSV utilities for combining, pruning, and labeling landmark data.
- `training/` – Model training and evaluation scripts (`mlp-training.py`, `gesture-logger.py`).
- `gesture-pipelines/` – MediaPipe-based webcam and still-image prototypes.
- `models/` – Trained assets (`gesture_mlp_model.h5`, `gesture_recognizer.task`, `quantized_weights.bin`, `layer_*_{weights,bias}.csv`).
- `scripts/` – Utility scripts (`quantizing.py`).
- `images/` – Sample gesture reference images.
- Use Python 3.10+ and create a virtual environment (`python -m venv gesture-env`).
- Activate the environment (`source gesture-env/bin/activate` on macOS/Linux).
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  # or install mediapipe, tensorflow, scikit-learn, opencv-python, matplotlib, pandas, numpy
  ```

- Plug in a webcam and verify MediaPipe access before running the pipelines.
Run the MLP training loop from the repository root:
```bash
python training/mlp-training.py
```

The script stratifies the dataset, computes class weights, and reports validation accuracy. It also writes `models/scaler_params.npz`, which captures the normalization statistics used during training; keep this file with the exported weights so inference on the FPGA or host matches your preprocessing. Keep accuracy above 0.90 to maintain reliable unlock sequences; adjust preprocessing or class weights if performance drops. Use `training/gesture-logger.py` to collect new samples and extend the dataset when onboarding new gestures.
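When running inference outside the training script, the same normalization must be reapplied to each landmark vector. A hedged sketch, assuming `scaler_params.npz` stores the scaler's statistics under the keys `mean` and `scale` (check what `mlp-training.py` actually saves and adjust the keys):

```python
import numpy as np

def normalize_landmarks(landmarks, params_path="models/scaler_params.npz"):
    """Apply the training-time standardization to a flat landmark vector.

    Assumes the .npz archive holds "mean" and "scale" arrays, mirroring a
    scikit-learn StandardScaler's mean_ and scale_ attributes. These key
    names are an assumption, not confirmed against the training script.
    """
    params = np.load(params_path)
    x = np.asarray(landmarks, dtype=np.float32)
    return (x - params["mean"]) / params["scale"]
```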
After training, regenerate the FPGA-ready weights:
```bash
python scripts/quantizing.py
```

This script expects the latest `layer_*` CSV exports in the `models/` directory and rewrites `models/quantized_weights.bin`. Consume this binary inside your HDL/SoC project to initialize BRAM or ROM blocks. Track the checksum or git hash of each binary when flashing the Arty S7-25 to keep the hardware configuration auditable.
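The actual binary layout is defined by `scripts/quantizing.py`. As an illustration only, a symmetric per-tensor int8 scheme (one common choice for FPGA-friendly weights, not necessarily the one used here) looks like:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    Maps the tensor's largest magnitude to 127 and rounds everything else
    to the nearest int8 step. The real script may use a different scheme
    (per-channel scales, fixed-point shifts, etc.).
    """
    scale = float(np.abs(weights).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for all-zero tensors
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale
```

On the FPGA side, the dequantized value is simply `q * scale`, so only the int8 tensor and one scale per layer need to live in BRAM.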
Use the webcam prototype to verify predictions end-to-end:
```bash
python gesture-pipelines/gesture-webcam.py
```

The script loads `models/gesture_recognizer.task`, applies the saved scaler parameters, feeds frames through MediaPipe, and evaluates the quantized weights in Python. Confirm latency and class stability here before synthesizing FPGA builds. When experimenting with new display peripherals, mirror FPGA output in the console to speed up debugging.
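Before committing the math to HDL, it helps to keep a small software reference of the forward pass to diff against hardware outputs. A minimal NumPy sketch, assuming dense layers with ReLU on the hidden layers and raw logits at the output (verify the activation choices against `mlp-training.py`):

```python
import numpy as np

def mlp_forward(x, layers):
    """Dense MLP forward pass: ReLU on hidden layers, raw logits at the end.

    `layers` is a list of (weights, bias) pairs with weights shaped
    (in_features, out_features); this layout is an assumption and should
    match how the layer_* CSVs are exported.
    """
    h = np.asarray(x, dtype=np.float32)
    for i, (w, b) in enumerate(layers):
        h = h @ w + b
        if i < len(layers) - 1:  # activation on hidden layers only
            h = np.maximum(h, 0.0)
    return h
```

The predicted gesture is then `np.argmax(mlp_forward(features, layers))`, which is the value the FPGA core must reproduce.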
- Reserve BRAM for the three dense layers and plan a streaming interface for 21 landmark triplets produced by the host.
- The unlock flow requires buffering four gestures; implement a state machine that mirrors the secure enclave logic.
- Use the ESP32 WROOM module as a WiFi co-processor: define a narrow command protocol (`FORWARD`, `LEFT`, etc.) and expose diagnostics on UART for bring-up.
- Plan to surface FPGA status (locked/unlocked, last gesture, radio link health) back to the Mac for operator visibility.
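The four-gesture unlock buffering above can be prototyped in software before writing the HDL state machine. A minimal sketch using hypothetical gesture labels and a reset-on-mismatch policy (the enclave's real policy, including prefix handling and lockout timing, may differ):

```python
class PasscodeFSM:
    """Software mirror of the four-gesture unlock state machine.

    Advances one state per matching gesture and resets on a mismatch
    (re-entering the first gesture restarts the sequence). Gesture names
    and the reset policy are illustrative assumptions.
    """

    def __init__(self, passcode=("fist", "open", "peace", "point")):
        self.passcode = passcode
        self.index = 0  # how many gestures of the code have matched so far

    def feed(self, gesture):
        """Consume one classified gesture; return True when the code completes."""
        if gesture == self.passcode[self.index]:
            self.index += 1
        elif gesture == self.passcode[0]:
            self.index = 1  # mismatch, but it restarts the sequence
        else:
            self.index = 0
        if self.index == len(self.passcode):
            self.index = 0  # relock for the next attempt
            return True
        return False
```

Because the state is just an index saturating at the code length, the HDL version reduces to a small counter plus a comparator per stored gesture.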
Short-term objectives include documenting the passcode enclave interface, scripting an automated export flow (train → quantize → package), and measuring end-to-end latency. Mid-term, add HDL testbenches for the MLP core and integrate display peripherals. Long-term, secure the communication path (host ↔ FPGA ↔ ESP32) and finalize robot control behaviors.