This project demonstrates a method to estimate real-world object dimensions from a first-person camera perspective using ray-plane intersections and camera modeling. It features:
- Utility functions for estimating real-world dimensions from screen-space coordinates
- A visual proof of concept for intuitive understanding
- A command-line interface for estimating object sizes from user input
- Unit tests to verify the implementation
The core folder contains all the logic related to the geometric size estimation,
while the rest of the project is dedicated to providing utility/test scripts to use it.
Most users should only need the cli folder to estimate object sizes from the command line,
or the fps_demo folder to visualize and interact in an FPS-like environment (instructions below).
Advanced users might create custom scripts that leverage functions from the core folder.
Run with:
python -m fps_demo.viewer

Controls:
- QZSD — Move the box (left / up / down / right)
- WXCV — Change the box size
- RF — Change the camera height
- TGYH — Change the camera field of view (T/G for horizontal FOV, Y/H for vertical FOV)
- Mouse — Control camera yaw and pitch
Example usage:
python -m cli.estimate --image-width 768 --image-height 432 --bbox-x 300 --bbox-y 100 --bbox-width 150 --bbox-height 100 --focal-length-mm 4.6 --sensor-width-mm 5.6 --sensor-height-mm 3.2 --camera-height 2.12 --yaw 0.0 --pitch -25.5

Result:
Estimated physical size of object:
Width = 1.493 m
Height = 1.009 m
Assumptions:
- Camera position: [0. 2.12 0. ]
- Yaw: 0.00 deg, Pitch: -25.50 deg
- Resolution: 768 x 432
- Focal length: 4.6 mm
- Sensor size: 5.6 mm x 3.2 mm
- Box coordinates: (300.0, 100.0), Width: 150.0, Height: 100.0
- Box spanning from (300.0, 100.0) to (450.0, 200.0)
- Camera FOV: 60.0 deg horizontal, 33.75 deg vertical
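To make the camera model concrete, here is a minimal sketch of how a pixel coordinate can be mapped to a viewing ray under a pinhole model. The function name, axis conventions (+z forward, +x right, +y up, origin at the top-left pixel), and FOV-based parameterization are illustrative assumptions, not the project's actual core API.

```python
import math

def pixel_to_ray(px, py, width, height, hfov_deg, vfov_deg):
    """Map a pixel (px, py) to a normalized camera-space ray direction.

    Illustrative pinhole model (no lens distortion): +z is the viewing
    direction, +x points right, +y points up, and (0, 0) is the
    top-left pixel. Conventions are assumptions, not the core module's.
    """
    # Half-extents of the image plane at unit focal distance.
    half_w = math.tan(math.radians(hfov_deg) / 2.0)
    half_h = math.tan(math.radians(vfov_deg) / 2.0)
    # Normalized device coordinates in [-1, 1], then scaled to the image plane.
    x = (2.0 * px / width - 1.0) * half_w
    y = (1.0 - 2.0 * py / height) * half_h
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)

# The image center maps to the optical axis (0, 0, 1).
print(pixel_to_ray(384, 216, 768, 432, 60.0, 33.75))
```

Rays produced this way are expressed in camera space; applying the camera's yaw and pitch rotations yields world-space rays that can be intersected with the ground plane.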
- cli/estimate.py: Estimates the size of an object given user input (command-line arguments).
- fps_demo/viewer.py: Manual visual test to verify whether the results make intuitive sense.
- tests/test_plane.py: Verify the Plane class.
- tests/test_camera.py: Verify the Camera class.
- tests/test_intersection.py: Verify the intersection logic.
- tests/test_largest_diameters.py: Verify the largest-diameter computations.
- tests/test_against_YOLO.py: Compare the estimated sizes against the data/yolo_groundtruth_data.csv dataset and verify the results do not diverge too much.
Test ideas (TODO):
- Quantify the error when lens distortion is added
- Assumes a flat ground plane at y = 0.
- Uses a simplified pinhole camera model (no lens distortion).
- The object size is estimated from the intersection of the box's center ray with the ground plane and the four planes defined by the bounding-box edges and the camera parameters. The estimated width is the sum of the distances from the intersection point to the left and right planes; the estimated height is the sum of the distances to the top and bottom planes. The accuracy of this method depends on the real-world context: for a floating object, it is reasonable to assume the object's center lies at ground level.
- Camera roll is not yet supported, but could be implemented with minimal changes.
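The geometry above can be sketched in a few lines. The helper names and conventions below (y-up world, camera looking along +z before pitching) are illustrative assumptions and not the project's actual core functions; the corner rays used to build the side plane are likewise made-up values for demonstration.

```python
import numpy as np

def intersect_ground(origin, direction, ground_y=0.0):
    """Intersect a ray with the horizontal plane y = ground_y.

    Returns the intersection point, or None if the ray is parallel to
    the plane or the hit lies behind the origin.
    """
    if abs(direction[1]) < 1e-12:
        return None
    t = (ground_y - origin[1]) / direction[1]
    return origin + t * direction if t > 0 else None

def point_to_plane_distance(point, plane_point, plane_normal):
    """Unsigned distance from a point to the plane given by a point and normal."""
    n = plane_normal / np.linalg.norm(plane_normal)
    return abs(np.dot(point - plane_point, n))

# Camera 2.12 m above the ground, pitched down 25.5 degrees, looking along +z.
cam = np.array([0.0, 2.12, 0.0])
pitch = np.radians(-25.5)
center_ray = np.array([0.0, np.sin(pitch), np.cos(pitch)])
hit = intersect_ground(cam, center_ray)  # roughly 4.44 m in front of the camera

# Side-plane sketch: the "left" plane passes through the camera and contains
# the rays through the left bounding-box corners (hypothetical directions here).
left_top = np.array([-0.3, np.sin(pitch), np.cos(pitch)])
left_bottom = np.array([-0.3, np.sin(pitch) - 0.1, np.cos(pitch)])
normal = np.cross(left_top, left_bottom)  # normal of a plane through the camera
half_width = point_to_plane_distance(hit, cam, normal)
```

In the full method, the estimated width would be the sum of the distances to the left and right planes, and the height the sum of the distances to the top and bottom planes.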
If you use this code, please cite:
@article{grimmer2025debris,
title = {A geometric and deep learning reproducible pipeline for monitoring floating anthropogenic debris in urban rivers using in situ camera},
journal = {Submitted},
volume = {},
pages = {},
year = {2025},
issn = {},
doi = {},
url = {},
author = {Gauthier Grimmer and Romain Wenger and Clément Flint and Germain Forestier and Gilles Rixhon and Valentin Chardon}
}