Merged
Commits
35 commits
5bd7191
Add extended 3D measurement PoC
bblazeva Aug 20, 2025
cd04618
Merge branch 'main' into performant_pocs/extended_3d_measurement
bblazeva Aug 20, 2025
9098fa8
Extended 3D measurement v1
bblazeva Sep 22, 2025
104dcbd
added IMU check for support plane
bblazeva Sep 24, 2025
ed89992
added display of plane capture status in FE + request for new plane c…
bblazeva Sep 25, 2025
b79c4ce
UX improvements + removed node_modules added by mistake
bblazeva Sep 29, 2025
8216051
removed unwanted files
bblazeva Sep 29, 2025
c6a48fe
changed detection confidence threshold in FE
bblazeva Sep 29, 2025
14e6a04
change the app version to `0.9.0`
Sep 29, 2025
f5d09b3
change app identifier to `Extended-3D-Measurement`
Sep 29, 2025
79018cc
change app identifier to `luxonis.com.extended-3d-measurement`
Sep 29, 2025
2cb2a2a
update README
bblazeva Oct 7, 2025
bd0c45b
update README
bblazeva Oct 7, 2025
f04b3cc
add box pose smoothing/filtering + clean up
bblazeva Oct 13, 2025
789743b
cahnge app indentifier to `com.luxonis.extended-3d-measurement`
Oct 14, 2025
bff104b
update app version to `0.9.1`
Oct 14, 2025
233ad34
move to `apps` folder
Nov 3, 2025
71db671
- update app version to 0.9.2
Nov 27, 2025
fa82408
pre-comit run
Nov 28, 2025
44371d9
fix tests
Dec 1, 2025
11d3ac1
Merge branch 'main' into performant_pocs/extended_3d_measurement
PetrNovota Dec 1, 2025
03a4e14
fix tests
bblazeva Dec 4, 2025
07c6743
Merge branch 'main' into performant_pocs/extended_3d_measurement
klemen1999 Dec 8, 2025
02a5951
added to main readme and tests
klemen1999 Dec 8, 2025
882bc9e
fix: PC not rendering
DavidFencl Dec 9, 2025
4589a3d
remove yolo-world, update requirements and app version to 0.9.3
bblazeva Dec 9, 2025
3da73b0
ruff-format
bblazeva Dec 9, 2025
79fbdf8
rename to object measurement 3d
bblazeva Dec 11, 2025
3a4f3bd
Merge branch 'main' into performant_pocs/extended_3d_measurement
bblazeva Dec 11, 2025
ed20d47
change app identifier
bblazeva Dec 11, 2025
899a8da
pull encoders from hub, FE - bump common package and react versions -…
bblazeva Mar 5, 2026
eade5be
Merge branch 'main' into performant_pocs/extended_3d_measurement
bblazeva Mar 5, 2026
f351e30
update requirements
bblazeva Mar 5, 2026
71ddb86
app version 0.9.4, update depthai-viewer-common - annotations bug fix
bblazeva Mar 6, 2026
5337787
Merge remote-tracking branch 'origin/main' into performant_pocs/exten…
bblazeva Mar 9, 2026
1 change: 1 addition & 0 deletions apps/README.md
Original file line number Diff line number Diff line change
@@ -12,6 +12,7 @@ This section contains ready-to-use applications that demonstrate the capabilitie
| [data-collection](data-collection/) | ❌ | ❌ | ✅ | | Demo showcasing how to use YOLOE for automatic data capture with an interactive UI for configuration. |
| [dino-tracking](dino-tracking/) | ❌ | ❌ | ✅ | | Demo showcasing interactive, similarity-based object tracking using FastSAM segmentation and DINO embeddings, enabling click-to-select tracking without predefined classes. |
| [people-demographics-and-sentiment-analysis](people-demographics-and-sentiment-analysis/) | ❌ | ❌ | ✅ | | Detects people and faces, tracks individuals over time, estimates age, gender, emotion and performs re-identification |
| [object-volume-measurement-3d](object-volume-measurement-3d) | ❌ | ❌ | ✅ | | Demonstrates a practical approach for measuring objects in 3D using DepthAI |
| [p2p-measurement](p2p-measurement) | ❌ | ❌ | ✅ | | Real-time 3D distance measurement between two points using DepthAI |
| [qr-tiling](qr-tiling/) | ❌ | ❌ | ✅ | | High-resolution QR code detection using dynamic image tiling with adaptive FPS control and an interactive UI for configuring the tiling grid. |
| [ros-driver-basic](ros/ros-driver-basic/) | ❌ | ❌ | ✅ | | Demo showcasing how ROS driver can be run as an APP on RVC4 device. |
37 changes: 37 additions & 0 deletions apps/object-volume-measurement-3d/.oakappignore
@@ -0,0 +1,37 @@
# Python virtual environments
venv/
.venv/

# Node.js
# ignore node_modules, it will be reinstalled in the container
node_modules/

# Multimedia files
media/

# Local models
*.onnx

# Documentation
README.md

# VCS
.gitignore
.git/
.github/
.gitlab/

# The following files are ignored by default
# uncomment a line if you explicitly need it

# !*.oakapp

# Python
# !**/.mypy_cache/
# !**/.ruff_cache/

# IDE files
# !**/.idea
# !**/.vscode
# !**/.zed

80 changes: 80 additions & 0 deletions apps/object-volume-measurement-3d/README.md
@@ -0,0 +1,80 @@
# Object Volume Measurement 3D

This example demonstrates a practical approach for measuring objects in 3D using DepthAI.\
On the DepthAI backend, it runs the **YOLOE** model on-device, with class labels and a confidence threshold that are both configurable from the frontend.
The custom frontend lets you click a detected object in the Video stream; the backend then segments that instance, builds a segmented point cloud, and computes its dimensions and volume in real time. Users can switch between two measurement methods: Object-Oriented Bounding Box and Ground-plane Height Grid.\
The frontend is built with the `@luxonis/depthai-viewer-common` package and, combined with the [default oakapp docker image](https://hub.docker.com/r/luxonis/oakapp-base), enables remote access via WebRTC.

> **Note:** This example works only on RVC4 in standalone mode.

## Demo

![extended-3d-measurement](media/demo.gif)

## Usage

Running this example requires a **Luxonis device** connected to your computer. Refer to the [documentation](https://docs.luxonis.com/software-v3/) to set up your device if you haven't done so already.

### Model Options

This example currently uses **YOLOE**, a fast and efficient object detection model that outputs bounding boxes and segmentation masks.

### Measurement methods

The app provides two ways to measure objects from the segmented point clouds:

#### 1. Object-Oriented Bounding Box (OBB)

This method uses Open3D's `get_minimal_oriented_bounding_box()`, which computes the minimal 3D box that encloses the segmented point cloud.\
The resulting box provides the object's dimensions (L, W, H), and the volume is computed as V = L × W × H.\
Temporal smoothing is applied to keep the box stable and prevent sudden flips; it combines a low-pass filter (an exponential moving average, EMA) for the center and size with spherical linear interpolation (SLERP) for rotation.\
This method is fast but may overestimate the volume of objects with irregular shapes.
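The EMA + SLERP smoothing described above can be sketched as follows. This is a minimal illustration under assumed names and defaults (`OBBSmoother`, `alpha`), not the app's actual implementation:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

class OBBSmoother:
    """Illustrative EMA + SLERP smoother for an oriented bounding box."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha   # weight of the newest observation
        self.center = None   # smoothed center (3,)
        self.size = None     # smoothed extents L, W, H (3,)
        self.rot = None      # smoothed orientation

    def update(self, center, size, rot: Rotation):
        if self.center is None:  # first frame initializes the state
            self.center = np.asarray(center, dtype=float)
            self.size = np.asarray(size, dtype=float)
            self.rot = rot
        else:
            a = self.alpha
            # low-pass (EMA) on center and size
            self.center = (1 - a) * self.center + a * np.asarray(center, dtype=float)
            self.size = (1 - a) * self.size + a * np.asarray(size, dtype=float)
            # SLERP a fraction `a` of the way from the old to the new rotation
            slerp = Slerp([0.0, 1.0], Rotation.concatenate([self.rot, rot]))
            self.rot = slerp(a)
        return self.center, self.size, self.rot
```

Flip rejection (picking the box-axis permutation closest to the previous frame) would sit before the SLERP step; it is omitted here.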

#### 2. Ground-plane Height Grid (HG)

This method requires the object to rest on a flat surface (e.g., a desk or floor). It uses that surface as a reference support plane, then estimates the footprint and height by grid-based slicing of the object's top surface.

**How it works:**

1. Plane capture: we run RANSAC on the scene point cloud and validate with the IMU that the plane is ground-like (plane normal parallel to gravity).
   The app shows a Calculating / OK / Failed status in the Video Stream overlay and re-requests capture if the camera has moved or the plane becomes invalid.
2. Transform the object point cloud into the ground/table frame.
3. Compute a minimum-area rectangle for the object's footprint. This yields L, W, and the yaw (rotation about the z-axis).
4. Volume calculation: the footprint polygon is divided into a 2D grid of square cells (default 5 mm each). For every cell inside the footprint, the algorithm estimates a height value from the object points that fall into that cell. Each cell contributes a volume of (cell size)² × (cell height above the ground plane).\
   The total object volume is the sum of the cell volumes across the grid. The object's height H is also computed from this height grid.
5. Temporal smoothing is applied to the footprint, yaw, height, and dimensions (EMA-based), with rejection of sudden jumps.
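The IMU check in step 1 amounts to comparing the fitted plane normal with the gravity direction from the accelerometer. A minimal sketch, assuming an illustrative threshold and function name (not the app's API):

```python
import numpy as np

def plane_is_ground_like(plane_normal, accel, max_angle_deg: float = 10.0) -> bool:
    """Accept a RANSAC plane as ground-like if its normal is (anti)parallel
    to the gravity vector measured by the accelerometer."""
    n = np.asarray(plane_normal, dtype=float)
    g = np.asarray(accel, dtype=float)  # at rest, the accelerometer reads ~9.81 m/s^2 along gravity
    n = n / np.linalg.norm(n)
    g = g / np.linalg.norm(g)
    # |cos(angle)| near 1 -> normal parallel or anti-parallel to gravity
    return abs(float(np.dot(n, g))) >= np.cos(np.radians(max_angle_deg))
```

A wall, for example, would fail this check (its normal is roughly perpendicular to gravity), triggering the Failed status and a re-capture request.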

This grid-integration method makes the volume estimation more robust to irregular and uneven object surfaces compared to just taking the bounding box. However, it is sensitive to plane fitting errors.

> **Note:** The object dimensions are still represented as a box, even for irregular objects.

### Outputs

The backend publishes:

- Video Stream
- Detections Overlay with segmentation masks and bounding boxes
- Pointclouds Stream (the whole scene, plus the segmented object while measuring)
- Measurements Overlay (an OBB / HG wireframe built from the object dimensions, drawn on the Video Stream)
- Plane status (HG only)
- Dimensions and volume measurements shown with the Detections Overlay

## Standalone Mode (RVC4 only)

In standalone mode, the app runs entirely on the device.
To run the example in this mode, first install the `oakctl` tool using the installation instructions [here](https://docs.luxonis.com/software-v3/oak-apps/oakctl).

The app can then be run with:

```bash
oakctl connect <DEVICE_IP>
oakctl app run .
```

Once the app is built and running, you can access the DepthAI Viewer by opening `https://<OAK4_IP>:9000/` in your browser (the exact URL is shown in the terminal output).

### Remote access

1. Upload the oakapp to Luxonis Hub via `oakctl`.
2. Then open the App UI remotely from the App detail page.
3 changes: 3 additions & 0 deletions apps/object-volume-measurement-3d/backend-run.sh
@@ -0,0 +1,3 @@
#!/bin/sh
echo "Starting Backend"
exec python3.11 /app/backend/src/main.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
model: yoloe-v8-l:640x640
platform: RVC4
235 changes: 235 additions & 0 deletions apps/object-volume-measurement-3d/backend/src/main.py
@@ -0,0 +1,235 @@
import depthai as dai

from depthai_nodes.node import ParsingNeuralNetwork, ImgDetectionsFilter

from utils.helper_functions import extract_text_embeddings, read_intrinsics

from utils.arguments import initialize_argparser
from utils.annotation_node import AnnotationNode
from utils.measurement_node import MeasurementNode

_, args = initialize_argparser()

IP = args.ip or "localhost"
PORT = args.port or 8080

CLASS_NAMES = ["person", "chair", "TV"]
MAX_NUM_CLASSES = 80
CONFIDENCE_THRESHOLD = 0.15

visualizer = dai.RemoteConnection(serveFrontend=False)
device = dai.Device(dai.DeviceInfo(args.device)) if args.device else dai.Device()

platform = device.getPlatformAsString()

if platform != "RVC4":
    raise ValueError("This example is supported only on RVC4 platform")

device.setIrLaserDotProjectorIntensity(1.0)
device.setIrFloodLightIntensity(1)

frame_type = dai.ImgFrame.Type.BGR888i
text_features = extract_text_embeddings(
    class_names=CLASS_NAMES, max_num_classes=MAX_NUM_CLASSES
)

if args.fps_limit is None:
    args.fps_limit = 8
    print(
        f"\nFPS limit set to {args.fps_limit} for {platform} platform. If you want to set a custom FPS limit, use the --fps_limit flag.\n"
    )

with dai.Pipeline(device) as pipeline:
    print("Creating pipeline...")

    model_description = dai.NNModelDescription.fromYamlFile(
        f"yoloe_v8_l.{platform}.yaml"
    )
    model_description.platform = platform
    model_nn_archive = dai.NNArchive(dai.getModelFromZoo(model_description))
    model_w, model_h = model_nn_archive.getInputSize()

    cam = pipeline.create(dai.node.Camera).build(
        boardSocket=dai.CameraBoardSocket.CAM_A
    )
    cam_out = cam.requestOutput(
        size=(640, 400), type=dai.ImgFrame.Type.RGB888i, fps=args.fps_limit
    )

    left = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_B)
    right = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_C)
    left_out = left.requestOutput(
        (640, 400), type=dai.ImgFrame.Type.NV12, fps=args.fps_limit
    )
    right_out = right.requestOutput(
        (640, 400), type=dai.ImgFrame.Type.NV12, fps=args.fps_limit
    )

    stereo = pipeline.create(dai.node.StereoDepth).build(
        left=left_out,
        right=right_out,
        presetMode=dai.node.StereoDepth.PresetMode.DEFAULT,
    )

    imu = pipeline.create(dai.node.IMU)
    imu.enableIMUSensor(dai.IMUSensor.ACCELEROMETER_RAW, 100)
    imu.setBatchReportThreshold(10)
    imu.setMaxBatchReports(10)

    manip = pipeline.create(dai.node.ImageManip)
    manip.initialConfig.setOutputSize(
        model_w, model_h, dai.ImageManipConfig.ResizeMode.LETTERBOX
    )
    manip.initialConfig.setFrameType(frame_type)
    manip.setMaxOutputFrameSize(model_w * model_h * 3)

    align = pipeline.create(dai.node.ImageAlign)

    stereo.depth.link(align.input)
    cam_out.link(align.inputAlignTo)
    cam_out.link(manip.inputImage)

    input_node = manip.out

    nn_with_parser = pipeline.create(ParsingNeuralNetwork)
    nn_with_parser.setNNArchive(model_nn_archive)
    nn_with_parser.setBackend("snpe")
    nn_with_parser.setBackendProperties(
        {"runtime": "dsp", "performance_profile": "default"}
    )
    nn_with_parser.setNumInferenceThreads(1)
    nn_with_parser.getParser(0).setConfidenceThreshold(CONFIDENCE_THRESHOLD)

    input_node.link(nn_with_parser.inputs["images"])

    textInputQueue = nn_with_parser.inputs["texts"].createInputQueue()
    nn_with_parser.inputs["texts"].setReusePreviousMessage(True)

    det_process_filter = pipeline.create(ImgDetectionsFilter).build(nn_with_parser.out)
    det_process_filter.setLabels(labels=[i for i in range(len(CLASS_NAMES))], keep=True)

    # Annotation node
    annotation_node = pipeline.create(AnnotationNode).build(
        det_process_filter.out,
        cam_out,
        align.outputAligned,
        label_encoding={k: v for k, v in enumerate(CLASS_NAMES)},
    )

    # RGBD node for the segmented PCL
    rgbd_seg = pipeline.create(dai.node.RGBD).build()
    annotation_node.out_segm.link(rgbd_seg.inColor)
    annotation_node.out_segm_depth.link(rgbd_seg.inDepth)

    # Measurement node
    measurement_node = pipeline.create(MeasurementNode).build(
        rgbd_seg.pcl, annotation_node.out_selection, imu.out
    )
    measurement_node.out_result.link(annotation_node.in_meas_result)

    fx, fy, cx, cy = read_intrinsics(device, 640, 400)
    measurement_node.setIntrinsics(fx, fy, cx, cy, imgW=640, imgH=400)
    measurement_node.an_node = annotation_node

    # Service functions for all functionalities of the frontend
    def class_update_service(new_classes: list[str]):
        """Changes classes to detect based on the user input"""
        global CLASS_NAMES  # without this, the assignment below creates a local
        if len(new_classes) == 0:
            print("List of new classes empty, skipping.")
            return
        if len(new_classes) > MAX_NUM_CLASSES:
            print(
                f"Number of new classes ({len(new_classes)}) exceeds maximum number of classes ({MAX_NUM_CLASSES}), skipping."
            )
            return
        CLASS_NAMES = new_classes

        text_features = extract_text_embeddings(
            class_names=CLASS_NAMES,
            max_num_classes=MAX_NUM_CLASSES,
        )
        inputNNData = dai.NNData()
        inputNNData.addTensor(
            "texts", text_features, dataType=dai.TensorInfo.DataType.FP16
        )
        textInputQueue.send(inputNNData)

        det_process_filter.setLabels(
            labels=[i for i in range(len(CLASS_NAMES))], keep=True
        )
        annotation_node.setLabelEncoding({k: v for k, v in enumerate(CLASS_NAMES)})
        print(f"Classes set to: {CLASS_NAMES}")

    def conf_threshold_update_service(new_conf_threshold: float):
        """Changes confidence threshold based on the user input"""
        global CONFIDENCE_THRESHOLD  # without this, the assignment creates a local
        CONFIDENCE_THRESHOLD = max(0, min(1, new_conf_threshold))
        nn_with_parser.getParser(0).setConfidenceThreshold(CONFIDENCE_THRESHOLD)
        print(f"Confidence threshold set to: {CONFIDENCE_THRESHOLD}")

    def selection_service(clicks: dict):
        """Changes selected object based on the user click"""
        if clicks.get("clear"):
            annotation_node.clearSelection()
            return {"ok": True, "cleared": True}
        try:
            x = float(clicks["x"])
            y = float(clicks["y"])
        except Exception as e:
            return {"ok": False, "error": f"bad payload: {e}"}

        annotation_node.setSelectionPoint(x, y)
        annotation_node.setKeepTopOnly(True)

        measurement_node.reset_measurements()
        annotation_node.clearCachedMeasurements()
        print(f"Selection point set to ({x:.3f}, {y:.3f})")
        return {"ok": True}

    def measurement_method_service(payload: dict):
        """
        Changes measurement method based on the user input
        Expects: {"method": "obb"|"heightgrid"}
        """
        method = str(payload.get("method", "")).lower()
        if method not in ("obb", "heightgrid"):
            return {"ok": False, "error": f"unknown method '{method}'"}
        measurement_node.measurement_mode = method
        if method == "heightgrid":
            annotation_node.requestPlaneCapture(True)
        else:
            annotation_node.requestPlaneCapture(False)
        measurement_node.reset_measurements()
        print("Selected method: ", method)
        return {"ok": True, "method": method, "have_plane": measurement_node.have_plane}

    # Connect the services in the frontend to functions in the backend
    visualizer.registerService("Selection Service", selection_service)
    visualizer.registerService("Class Update Service", class_update_service)
    visualizer.registerService(
        "Threshold Update Service", conf_threshold_update_service
    )
    visualizer.registerService("Measurement Method Service", measurement_method_service)

    visualizer.addTopic("Video", cam_out, "images")
    visualizer.addTopic("Detections", annotation_node.out_ann, "images")
    visualizer.addTopic("Pointclouds", rgbd_seg.pcl, "point_clouds")
    visualizer.addTopic("Measurement Overlay", measurement_node.out_ann, "images")
    visualizer.addTopic("Plane Status", measurement_node.out_plane_status, "images")

    print("Pipeline created.")

    pipeline.start()
    visualizer.registerPipeline(pipeline)

    inputNNData = dai.NNData()
    inputNNData.addTensor("texts", text_features, dataType=dai.TensorInfo.DataType.FP16)
    textInputQueue.send(inputNNData)

    print("Press 'q' to stop")

    while pipeline.isRunning():
        pipeline.processTasks()
        key = visualizer.waitKey(1)
        if key == ord("q"):
            print("Got q key. Exiting...")
            break
11 changes: 11 additions & 0 deletions apps/object-volume-measurement-3d/backend/src/requirements.txt
@@ -0,0 +1,11 @@
depthai==3.2.1
depthai-nodes==0.3.7
opencv-python-headless~=4.10.0
numpy>=1.22
tokenizers~=0.21.0
onnxruntime
open3d~=0.18
scipy==1.11.4
# onnxruntime-gpu # if you want to use CUDAExecutionProvider
requests
tqdm