
Commit 26e78fc

Merge pull request #785 from luxonis/performant_pocs/extended_3d_measurement
Object Volume Measurement 3D Hub App
2 parents 178fc47 + 5337787 commit 26e78fc

39 files changed

Lines changed: 19793 additions & 0 deletions

apps/README.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ This section contains ready-to-use applications that demonstrate the capabilitie
| [data-collection](data-collection/) |||| | Demo showcasing how to use YOLOE for automatic data capture with an interactive UI for configuration. |
| [dino-tracking](dino-tracking/) |||| | Demo showcasing interactive, similarity-based object tracking using FastSAM segmentation and DINO embeddings, enabling click-to-select tracking without predefined classes. |
| [people-demographics-and-sentiment-analysis](people-demographics-and-sentiment-analysis/) |||| | Detects people and faces, tracks individuals over time, estimates age, gender, emotion and performs re-identification |
+| [object-volume-measurement-3d](object-volume-measurement-3d) |||| | Demonstrates a practical approach for measuring objects in 3D using DepthAI |
| [p2p-measurement](p2p-measurement) |||| | Real-time 3D distance measurement between two points using DepthAI |
| [qr-tiling](qr-tiling/) |||| | High-resolution QR code detection using dynamic image tiling with adaptive FPS control and an interactive UI for configuring the tiling grid. |
| [ros-driver-basic](ros/ros-driver-basic/) |||| | Demo showcasing how ROS driver can be run as an APP on RVC4 device. |
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Python virtual environments
venv/
.venv/

# Node.js
# ignore node_modules, it will be reinstalled in the container
node_modules/

# Multimedia files
media/

# Local models
*.onnx

# Documentation
README.md

# VCS
.gitignore
.git/
.github/
.gitlab/

# The following files are ignored by default
# uncomment a line if you explicitly need it

# !*.oakapp

# Python
# !**/.mypy_cache/
# !**/.ruff_cache/

# IDE files
# !**/.idea
# !**/.vscode
# !**/.zed
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@

# Object Volume Measurement 3D

This example demonstrates a practical approach for measuring objects in 3D using DepthAI.\
On the DepthAI backend, it runs the **YOLOE** model on-device, with configurable class labels and confidence threshold, both controllable via the frontend.
The custom frontend lets you click a detected object in the Video stream; the backend then segments that instance, builds a segmented point cloud, and computes dimensions and volume in real time. Users can switch between two measurement methods: Object-Oriented Bounding Box and Ground-plane Height Grid.\
The frontend is built with the `@luxonis/depthai-viewer-common` package and, combined with the [default oakapp docker image](https://hub.docker.com/r/luxonis/oakapp-base), enables remote access via WebRTC.

> **Note:** This example works only on RVC4 in standalone mode.

## Demo

![extended-3d-measurement](media/demo.gif)

## Usage

Running this example requires a **Luxonis device** connected to your computer. Refer to the [documentation](https://docs.luxonis.com/software-v3/) to set up your device if you haven't done so already.

### Model Options

This example currently uses **YOLOE**, a fast and efficient object detection model that outputs bounding boxes and segmentation masks.
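
YOLOE is promptable: the set of classes to detect is defined by text embeddings computed from the class names and sent to the model's `texts` input at runtime. This is exactly how the backend handles it in `main.py` (shown in full further below):

```python
# Encode the class names and push the embeddings to the model's "texts" input
text_features = extract_text_embeddings(
    class_names=CLASS_NAMES, max_num_classes=MAX_NUM_CLASSES
)
inputNNData = dai.NNData()
inputNNData.addTensor("texts", text_features, dataType=dai.TensorInfo.DataType.FP16)
textInputQueue.send(inputNNData)
```

This is what lets the frontend's class-update service swap detection classes without reloading the model.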

### Measurement methods

The app provides two ways to measure objects from the segmented point clouds:

#### 1. Object-Oriented Bounding Box (OBB)

This method uses Open3D's `get_minimal_oriented_bounding_box()`, which computes the minimal 3D box that encloses the segmented point cloud.\
The resulting box provides the object's dimensions (L, W, H), and the volume is computed as V = L × W × H.\
Temporal smoothing is applied to keep the box stable and prevent sudden flips. It combines a low-pass filter (EMA) for center and size with spherical linear interpolation (SLERP) for rotations.\
This method is fast but may overestimate volume for objects with irregular shapes.
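
A minimal sketch of this smoothing scheme, assuming SciPy for the rotation interpolation (the `ALPHA` weight and the `smooth_obb` helper are illustrative, not the app's actual implementation):

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

ALPHA = 0.3  # hypothetical EMA weight; higher reacts faster, lower is smoother


def smooth_obb(prev: dict, new: dict) -> dict:
    """Blend consecutive OBB estimates: EMA on center/size, SLERP on rotation."""
    center = (1 - ALPHA) * prev["center"] + ALPHA * new["center"]
    size = (1 - ALPHA) * prev["size"] + ALPHA * new["size"]
    # Interpolate between the two orientations along the geodesic
    rots = Rotation.from_matrix(np.stack([prev["rot"], new["rot"]]))
    rot = Slerp([0.0, 1.0], rots)(ALPHA).as_matrix()
    return {"center": center, "size": size, "rot": rot}
```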

#### 2. Ground-plane Height Grid (HG)

This method requires the objects to rest on a flat surface (e.g. a desk or floor). It uses the flat surface as a reference support plane, then estimates the footprint and the height by grid-based slicing of the object's top surface.

**How it works:**

1. Plane capture: we run RANSAC on the scene point cloud and validate with the IMU that the plane is ground-like (plane normal parallel to gravity); see the plane-fit sketch below the list.
   The app shows a Calculating / OK / Failed status in the overlay of the Video Stream and re-requests capture if the camera has been moved or the plane becomes invalid.
2. Transform the object point cloud into the ground/table frame.
3. Compute a minimum-area rectangle for the footprint of the object. From this we get L, W, and yaw (rotation about the z-axis).
4. Volume calculation: the footprint polygon is divided into a 2D grid of square cells (default 5 mm each). For every cell inside the footprint, the algorithm estimates a height value by looking at the object points that fall into that cell. Each cell then contributes a volume of (cell size)² × height, where height is the cell's height above the ground plane.\
   The total object volume is obtained by summing the volumes of all cells across the grid; see the grid-integration sketch below. The object's height H is also computed from this height grid.
5. Temporal smoothing is applied to the footprint, yaw, height, and dimensions (EMA-based), with rejection of sudden jumps.
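
A minimal sketch of the plane capture in step 1, assuming Open3D's RANSAC `segment_plane` and a gravity vector from the accelerometer (the threshold values are assumptions, not the app's actual parameters):

```python
import numpy as np
import open3d as o3d


def fit_ground_plane(pcd: o3d.geometry.PointCloud, gravity: np.ndarray,
                     max_tilt_deg: float = 10.0):
    """Fit a plane with RANSAC and check that it is ground-like via the IMU."""
    (a, b, c, d), inliers = pcd.segment_plane(
        distance_threshold=0.01, ransac_n=3, num_iterations=500
    )
    normal = np.array([a, b, c])
    normal /= np.linalg.norm(normal)
    g = gravity / np.linalg.norm(gravity)
    # Ground-like means the plane normal is (anti)parallel to gravity
    tilt_deg = np.degrees(np.arccos(np.clip(abs(np.dot(normal, g)), 0.0, 1.0)))
    return tilt_deg < max_tilt_deg, (a, b, c, d), inliers
```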

This grid-integration method makes the volume estimation more robust to irregular and uneven object surfaces than simply taking the bounding box. However, it is sensitive to plane-fitting errors.
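
The grid integration in step 4 can be sketched as follows, with object points already transformed into the ground-plane frame so that z is the height above the plane (this simplified version bins points directly and takes the top surface per cell; the footprint test and smoothing are omitted):

```python
import numpy as np

CELL = 0.005  # 5 mm grid cells, matching the stated default


def height_grid_volume(points: np.ndarray, cell: float = CELL):
    """Integrate object volume over a 2D height grid.

    points: (N, 3) array in the ground-plane frame, z = height above plane.
    Returns (volume, height) in the units of the input (e.g. meters).
    """
    xy = points[:, :2]
    z = np.clip(points[:, 2], 0.0, None)
    # Bin each point into a square cell over the footprint
    idx = np.floor((xy - xy.min(axis=0)) / cell).astype(int)
    heights: dict[tuple[int, int], float] = {}
    for (i, j), h in zip(map(tuple, idx), z):
        # Keep the top surface: the highest point observed in each cell
        heights[(i, j)] = max(heights.get((i, j), 0.0), float(h))
    volume = cell * cell * sum(heights.values())  # sum of cell area x cell height
    height = max(heights.values(), default=0.0)  # object height H
    return volume, height
```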

> **Note:** The object dimensions are still represented as a box, even for irregular objects.

### Outputs

The backend publishes:

- Video Stream
- Detections Overlay with segmentation masks and bounding boxes
- Pointclouds Stream (whole scene, and segmented when measuring an object)
- Measurements Overlay (OBB / HG wireframe from the object dimensions on the Video Stream)
- Plane status (HG only)
- Dimensions and volume measurements with the Detections Overlay

## Standalone Mode (RVC4 only)

In standalone mode, the app runs entirely on the device.
To run the example in this mode, first install the `oakctl` tool using the installation instructions [here](https://docs.luxonis.com/software-v3/oak-apps/oakctl).

The app can then be run with:

```bash
oakctl connect <DEVICE_IP>
oakctl app run .
```

Once the app is built and running, you can access the DepthAI Viewer locally by opening `https://<OAK4_IP>:9000/` in your browser (the exact URL will be shown in the terminal output).

### Remote access

1. Upload the oakapp to Luxonis Hub via `oakctl`.
2. Then open the App UI remotely from the App detail page.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
#!/bin/sh
echo "Starting Backend"
exec python3.11 /app/backend/src/main.py
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
model: yoloe-v8-l:640x640
platform: RVC4
Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
import depthai as dai

from depthai_nodes.node import ParsingNeuralNetwork, ImgDetectionsFilter

from utils.helper_functions import extract_text_embeddings, read_intrinsics

from utils.arguments import initialize_argparser
from utils.annotation_node import AnnotationNode
from utils.measurement_node import MeasurementNode

_, args = initialize_argparser()

IP = args.ip or "localhost"
PORT = args.port or 8080

CLASS_NAMES = ["person", "chair", "TV"]
MAX_NUM_CLASSES = 80
CONFIDENCE_THRESHOLD = 0.15

visualizer = dai.RemoteConnection(serveFrontend=False)
device = dai.Device(dai.DeviceInfo(args.device)) if args.device else dai.Device()

platform = device.getPlatformAsString()

if platform != "RVC4":
    raise ValueError("This example is supported only on RVC4 platform")

device.setIrLaserDotProjectorIntensity(1.0)
device.setIrFloodLightIntensity(1)

frame_type = dai.ImgFrame.Type.BGR888i
text_features = extract_text_embeddings(
    class_names=CLASS_NAMES, max_num_classes=MAX_NUM_CLASSES
)

if args.fps_limit is None:
    args.fps_limit = 8
    print(
        f"\nFPS limit set to {args.fps_limit} for {platform} platform. If you want to set a custom FPS limit, use the --fps_limit flag.\n"
    )

with dai.Pipeline(device) as pipeline:
    print("Creating pipeline...")

    model_description = dai.NNModelDescription.fromYamlFile(
        f"yoloe_v8_l.{platform}.yaml"
    )
    model_description.platform = platform
    model_nn_archive = dai.NNArchive(dai.getModelFromZoo(model_description))
    model_w, model_h = model_nn_archive.getInputSize()

    cam = pipeline.create(dai.node.Camera).build(
        boardSocket=dai.CameraBoardSocket.CAM_A
    )
    cam_out = cam.requestOutput(
        size=(640, 400), type=dai.ImgFrame.Type.RGB888i, fps=args.fps_limit
    )

    left = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_B)
    right = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_C)
    left_out = left.requestOutput(
        (640, 400), type=dai.ImgFrame.Type.NV12, fps=args.fps_limit
    )
    right_out = right.requestOutput(
        (640, 400), type=dai.ImgFrame.Type.NV12, fps=args.fps_limit
    )

    stereo = pipeline.create(dai.node.StereoDepth).build(
        left=left_out,
        right=right_out,
        presetMode=dai.node.StereoDepth.PresetMode.DEFAULT,
    )

    # IMU provides the gravity direction used to validate the ground plane
    imu = pipeline.create(dai.node.IMU)
    imu.enableIMUSensor(dai.IMUSensor.ACCELEROMETER_RAW, 100)
    imu.setBatchReportThreshold(10)
    imu.setMaxBatchReports(10)

    # Letterbox the color stream to the model's input resolution
    manip = pipeline.create(dai.node.ImageManip)
    manip.initialConfig.setOutputSize(
        model_w, model_h, dai.ImageManipConfig.ResizeMode.LETTERBOX
    )
    manip.initialConfig.setFrameType(frame_type)
    manip.setMaxOutputFrameSize(model_w * model_h * 3)

    # Align depth to the color camera
    align = pipeline.create(dai.node.ImageAlign)

    stereo.depth.link(align.input)
    cam_out.link(align.inputAlignTo)
    cam_out.link(manip.inputImage)

    input_node = manip.out

    nn_with_parser = pipeline.create(ParsingNeuralNetwork)
    nn_with_parser.setNNArchive(model_nn_archive)
    nn_with_parser.setBackend("snpe")
    nn_with_parser.setBackendProperties(
        {"runtime": "dsp", "performance_profile": "default"}
    )
    nn_with_parser.setNumInferenceThreads(1)
    nn_with_parser.getParser(0).setConfidenceThreshold(CONFIDENCE_THRESHOLD)

    input_node.link(nn_with_parser.inputs["images"])

    textInputQueue = nn_with_parser.inputs["texts"].createInputQueue()
    nn_with_parser.inputs["texts"].setReusePreviousMessage(True)

    det_process_filter = pipeline.create(ImgDetectionsFilter).build(nn_with_parser.out)
    det_process_filter.setLabels(labels=[i for i in range(len(CLASS_NAMES))], keep=True)

    # Annotation node
    annotation_node = pipeline.create(AnnotationNode).build(
        det_process_filter.out,
        cam_out,
        align.outputAligned,
        label_encoding={k: v for k, v in enumerate(CLASS_NAMES)},
    )

    # RGBD node for the segmented PCL
    rgbd_seg = pipeline.create(dai.node.RGBD).build()
    annotation_node.out_segm.link(rgbd_seg.inColor)
    annotation_node.out_segm_depth.link(rgbd_seg.inDepth)

    # Measurement node
    measurement_node = pipeline.create(MeasurementNode).build(
        rgbd_seg.pcl, annotation_node.out_selection, imu.out
    )
    measurement_node.out_result.link(annotation_node.in_meas_result)

    fx, fy, cx, cy = read_intrinsics(device, 640, 400)
    measurement_node.setIntrinsics(fx, fy, cx, cy, imgW=640, imgH=400)
    measurement_node.an_node = annotation_node

    # Service functions for all functionalities of the frontend
    def class_update_service(new_classes: list[str]):
        """Changes classes to detect based on the user input"""
        global CLASS_NAMES  # persist the update beyond this handler
        if len(new_classes) == 0:
            print("List of new classes empty, skipping.")
            return
        if len(new_classes) > MAX_NUM_CLASSES:
            print(
                f"Number of new classes ({len(new_classes)}) exceeds maximum number of classes ({MAX_NUM_CLASSES}), skipping."
            )
            return
        CLASS_NAMES = new_classes

        # Re-encode the class names and resend the embeddings to the model
        text_features = extract_text_embeddings(
            class_names=CLASS_NAMES,
            max_num_classes=MAX_NUM_CLASSES,
        )
        inputNNData = dai.NNData()
        inputNNData.addTensor(
            "texts", text_features, dataType=dai.TensorInfo.DataType.FP16
        )
        textInputQueue.send(inputNNData)

        det_process_filter.setLabels(
            labels=[i for i in range(len(CLASS_NAMES))], keep=True
        )
        annotation_node.setLabelEncoding({k: v for k, v in enumerate(CLASS_NAMES)})
        print(f"Classes set to: {CLASS_NAMES}")

    def conf_threshold_update_service(new_conf_threshold: float):
        """Changes confidence threshold based on the user input"""
        global CONFIDENCE_THRESHOLD  # persist the update beyond this handler
        CONFIDENCE_THRESHOLD = max(0, min(1, new_conf_threshold))
        nn_with_parser.getParser(0).setConfidenceThreshold(CONFIDENCE_THRESHOLD)
        print(f"Confidence threshold set to: {CONFIDENCE_THRESHOLD}")

    def selection_service(clicks: dict):
        """Changes selected object based on the user click"""
        if clicks.get("clear"):
            annotation_node.clearSelection()
            return {"ok": True, "cleared": True}
        try:
            x = float(clicks["x"])
            y = float(clicks["y"])
        except Exception as e:
            return {"ok": False, "error": f"bad payload: {e}"}

        annotation_node.setSelectionPoint(x, y)
        annotation_node.setKeepTopOnly(True)

        measurement_node.reset_measurements()
        annotation_node.clearCachedMeasurements()
        print(f"Selection point set to ({x:.3f}, {y:.3f})")
        return {"ok": True}

    def measurement_method_service(payload: dict):
        """
        Changes measurement method based on the user input
        Expects: {"method": "obb"|"heightgrid"}
        """
        method = str(payload.get("method", "")).lower()
        if method not in ("obb", "heightgrid"):
            return {"ok": False, "error": f"unknown method '{method}'"}
        measurement_node.measurement_mode = method
        if method == "heightgrid":
            annotation_node.requestPlaneCapture(True)
        else:
            annotation_node.requestPlaneCapture(False)
        measurement_node.reset_measurements()
        print("Selected method: ", method)
        return {"ok": True, "method": method, "have_plane": measurement_node.have_plane}

    # Connect the services in the frontend to functions in the backend
    visualizer.registerService("Selection Service", selection_service)
    visualizer.registerService("Class Update Service", class_update_service)
    visualizer.registerService(
        "Threshold Update Service", conf_threshold_update_service
    )
    visualizer.registerService("Measurement Method Service", measurement_method_service)

    visualizer.addTopic("Video", cam_out, "images")
    visualizer.addTopic("Detections", annotation_node.out_ann, "images")
    visualizer.addTopic("Pointclouds", rgbd_seg.pcl, "point_clouds")
    visualizer.addTopic("Measurement Overlay", measurement_node.out_ann, "images")
    visualizer.addTopic("Plane Status", measurement_node.out_plane_status, "images")

    print("Pipeline created.")

    pipeline.start()
    visualizer.registerPipeline(pipeline)

    # Send the initial text embeddings before processing starts
    inputNNData = dai.NNData()
    inputNNData.addTensor("texts", text_features, dataType=dai.TensorInfo.DataType.FP16)
    textInputQueue.send(inputNNData)

    print("Press 'q' to stop")

    while pipeline.isRunning():
        pipeline.processTasks()
        key = visualizer.waitKey(1)
        if key == ord("q"):
            print("Got q key. Exiting...")
            break
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
depthai==3.2.1
depthai-nodes==0.3.7
opencv-python-headless~=4.10.0
numpy>=1.22
tokenizers~=0.21.0
onnxruntime
open3d~=0.18
scipy==1.11.4
# onnxruntime-gpu # if you want to use CUDAExecutionProvider
requests
tqdm
