Merged
Commits
35 commits
5bd7191
Add extended 3D measurement PoC
bblazeva Aug 20, 2025
cd04618
Merge branch 'main' into performant_pocs/extended_3d_measurement
bblazeva Aug 20, 2025
9098fa8
Extended 3D measurement v1
bblazeva Sep 22, 2025
104dcbd
added IMU check for support plane
bblazeva Sep 24, 2025
ed89992
added display of plane capture status in FE + request for new plane c…
bblazeva Sep 25, 2025
b79c4ce
UX improvements + removed node_modules added by mistake
bblazeva Sep 29, 2025
8216051
removed unwanted files
bblazeva Sep 29, 2025
c6a48fe
changed detection confidence threshold in FE
bblazeva Sep 29, 2025
14e6a04
change the app version to `0.9.0`
Sep 29, 2025
f5d09b3
change app identifier to `Extended-3D-Measurement`
Sep 29, 2025
79018cc
change app identifier to `luxonis.com.extended-3d-measurement`
Sep 29, 2025
2cb2a2a
update README
bblazeva Oct 7, 2025
bd0c45b
update README
bblazeva Oct 7, 2025
f04b3cc
add box pose smoothing/filtering + clean up
bblazeva Oct 13, 2025
789743b
cahnge app indentifier to `com.luxonis.extended-3d-measurement`
Oct 14, 2025
bff104b
update app version to `0.9.1`
Oct 14, 2025
233ad34
move to `apps` folder
Nov 3, 2025
71db671
- update app version to 0.9.2
Nov 27, 2025
fa82408
pre-comit run
Nov 28, 2025
44371d9
fix tests
Dec 1, 2025
11d3ac1
Merge branch 'main' into performant_pocs/extended_3d_measurement
PetrNovota Dec 1, 2025
03a4e14
fix tests
bblazeva Dec 4, 2025
07c6743
Merge branch 'main' into performant_pocs/extended_3d_measurement
klemen1999 Dec 8, 2025
02a5951
added to main readme and tests
klemen1999 Dec 8, 2025
882bc9e
fix: PC not rendering
DavidFencl Dec 9, 2025
4589a3d
remove yolo-world, update requirements and app version to 0.9.3
bblazeva Dec 9, 2025
3da73b0
ruff-format
bblazeva Dec 9, 2025
79fbdf8
rename to object measurement 3d
bblazeva Dec 11, 2025
3a4f3bd
Merge branch 'main' into performant_pocs/extended_3d_measurement
bblazeva Dec 11, 2025
ed20d47
change app identifier
bblazeva Dec 11, 2025
899a8da
pull encoders from hub, FE - bump common package and react versions -…
bblazeva Mar 5, 2026
eade5be
Merge branch 'main' into performant_pocs/extended_3d_measurement
bblazeva Mar 5, 2026
f351e30
update requirements
bblazeva Mar 5, 2026
71ddb86
app version 0.9.4, update depthai-viewer-common - annotations bug fix
bblazeva Mar 6, 2026
5337787
Merge remote-tracking branch 'origin/main' into performant_pocs/exten…
bblazeva Mar 9, 2026
1 change: 1 addition & 0 deletions apps/README.md
Original file line number Diff line number Diff line change
@@ -12,6 +12,7 @@ This section contains ready-to-use applications that demonstrate the capabilitie
| [data-collection](data-collection/) | ❌ | ❌ | ✅ | | Demo showcasing how to use YOLOE for automatic data capture with an interactive UI for configuration. |
| [dino-tracking](dino-tracking/) | ❌ | ❌ | ✅ | | Demo showcasing interactive, similarity-based object tracking using FastSAM segmentation and DINO embeddings, enabling click-to-select tracking without predefined classes. |
| [people-demographics-and-sentiment-analysis](people-demographics-and-sentiment-analysis/) | ❌ | ❌ | ✅ | | Detects people and faces, tracks individuals over time, estimates age, gender, emotion and performs re-identification |
| [object-volume-measurement-3d](object-volume-measurement-3d) | ❌ | ❌ | ✅ | | Demonstrates a practical approach for measuring objects in 3D using DepthAI |
| [p2p-measurement](p2p-measurement) | ❌ | ❌ | ✅ | | Real-time 3D distance measurement between two points using DepthAI |
| [qr-tiling](qr-tiling/) | ❌ | ❌ | ✅ | | High-resolution QR code detection using dynamic image tiling with adaptive FPS control and an interactive UI for configuring the tiling grid. |
| [ros-driver-basic](ros/ros-driver-basic/) | ❌ | ❌ | ✅ | | Demo showcasing how ROS driver can be run as an APP on RVC4 device. |
37 changes: 37 additions & 0 deletions apps/object-volume-measurement-3d/.oakappignore
@@ -0,0 +1,37 @@
# Python virtual environments
venv/
.venv/

# Node.js
# ignore node_modules, it will be reinstalled in the container
node_modules/

# Multimedia files
media/

# Local models
*.onnx

# Documentation
README.md

# VCS
.gitignore
.git/
.github/
.gitlab/

# The following files are ignored by default
# uncomment a line if you explicitly need it

# !*.oakapp

# Python
# !**/.mypy_cache/
# !**/.ruff_cache/

# IDE files
# !**/.idea
# !**/.vscode
# !**/.zed

80 changes: 80 additions & 0 deletions apps/object-volume-measurement-3d/README.md
@@ -0,0 +1,80 @@
# Object Volume Measurement 3D

This example demonstrates a practical approach for measuring objects in 3D using DepthAI.\
On the DepthAI backend, it runs the **YOLOE** model on-device, with class labels and a confidence threshold that are both configurable from the frontend.
The custom frontend lets you click a detected object in the Video stream; the backend then segments that instance, builds a segmented point cloud, and computes its dimensions and volume in real time. Users can switch between two measurement methods: Object-Oriented Bounding Box and Ground-plane Height Grid.\
The frontend is built with the `@luxonis/depthai-viewer-common` package and, combined with the [default oakapp docker image](https://hub.docker.com/r/luxonis/oakapp-base), enables remote access via WebRTC.

> **Note:** This example works only on RVC4 in standalone mode.

## Demo

![extended-3d-measurement](media/demo.gif)

## Usage

Running this example requires a **Luxonis device** connected to your computer. Refer to the [documentation](https://docs.luxonis.com/software-v3/) to set up your device if you haven't done so already.

### Model Options

This example currently uses **YOLOE**, a fast and efficient object detection model that outputs bounding boxes and segmentation masks.

### Measurement methods

The app provides two ways to measure objects from the segmented point clouds:

#### 1. Object-Oriented Bounding Box (OBB)

This method uses Open3D's `get_minimal_oriented_bounding_box()`, which computes the minimal 3D box that encloses the segmented point cloud.\
The resulting box provides the object's dimensions (L, W, H), and the volume is computed as V = L × W × H.\
Temporal smoothing is applied to keep the box stable and prevent sudden flips; it combines a low-pass filter (an exponential moving average, EMA) for the center and size with spherical linear interpolation (SLERP) for rotation.\
This method is fast but may overestimate the volume of objects with irregular shapes.
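The EMA + SLERP smoothing described above can be sketched as follows. This is a minimal illustration under assumed names and defaults (`OBBSmoother`, `alpha`), not the app's actual implementation:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

class OBBSmoother:
    """Illustrative EMA + SLERP smoother for an oriented bounding box."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha   # weight of the newest observation
        self.center = None   # smoothed center (3,)
        self.size = None     # smoothed extents L, W, H (3,)
        self.rot = None      # smoothed orientation

    def update(self, center, size, rot: Rotation):
        if self.center is None:  # first frame initializes the state
            self.center = np.asarray(center, dtype=float)
            self.size = np.asarray(size, dtype=float)
            self.rot = rot
        else:
            a = self.alpha
            # low-pass (EMA) on center and size
            self.center = (1 - a) * self.center + a * np.asarray(center, dtype=float)
            self.size = (1 - a) * self.size + a * np.asarray(size, dtype=float)
            # SLERP a fraction `a` of the way from the old to the new rotation
            slerp = Slerp([0.0, 1.0], Rotation.concatenate([self.rot, rot]))
            self.rot = slerp(a)
        return self.center, self.size, self.rot
```

Flip rejection (picking the box-axis permutation closest to the previous frame) would sit before the SLERP step; it is omitted here.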

#### 2. Ground-plane Height Grid (HG)

This method requires the object to rest on a flat surface (e.g., a desk or floor). It uses that surface as a reference support plane, then estimates the footprint and height by grid-based slicing of the object's top surface.

**How it works:**

1. Plane capture: we run RANSAC on the scene point cloud and validate with the IMU that the plane is ground-like (plane normal parallel to gravity).
   The app shows a Calculating / OK / Failed status in the Video Stream overlay and re-requests capture if the camera has moved or the plane becomes invalid.
2. Transform the object point cloud into the ground/table frame.
3. Compute a minimum-area rectangle for the object's footprint. This yields L, W, and the yaw (rotation about the z-axis).
4. Volume calculation: the footprint polygon is divided into a 2D grid of square cells (default 5 mm each). For every cell inside the footprint, the algorithm estimates a height value from the object points that fall into that cell. Each cell contributes a volume of (cell size)² × (cell height above the ground plane).\
   The total object volume is the sum of the cell volumes across the grid. The object's height H is also computed from this height grid.
5. Temporal smoothing is applied to the footprint, yaw, height, and dimensions (EMA-based), with rejection of sudden jumps.
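The IMU check in step 1 amounts to comparing the fitted plane normal with the gravity direction from the accelerometer. A minimal sketch, assuming an illustrative threshold and function name (not the app's API):

```python
import numpy as np

def plane_is_ground_like(plane_normal, accel, max_angle_deg: float = 10.0) -> bool:
    """Accept a RANSAC plane as ground-like if its normal is (anti)parallel
    to the gravity vector measured by the accelerometer."""
    n = np.asarray(plane_normal, dtype=float)
    g = np.asarray(accel, dtype=float)  # at rest, the accelerometer reads ~9.81 m/s^2 along gravity
    n = n / np.linalg.norm(n)
    g = g / np.linalg.norm(g)
    # |cos(angle)| near 1 -> normal parallel or anti-parallel to gravity
    return abs(float(np.dot(n, g))) >= np.cos(np.radians(max_angle_deg))
```

A wall, for example, would fail this check (its normal is roughly perpendicular to gravity), triggering the Failed status and a re-capture request.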

This grid-integration method makes the volume estimation more robust to irregular and uneven object surfaces compared to just taking the bounding box. However, it is sensitive to plane fitting errors.

> **Note:** The object dimensions are still represented as a box, even for irregular objects.

### Outputs

The backend publishes:

- Video Stream
- Detections Overlay with segmentation masks and bounding boxes
- Pointclouds Stream (the whole scene, plus the segmented object while measuring)
- Measurements Overlay (an OBB / HG wireframe built from the object dimensions, drawn on the Video Stream)
- Plane status (HG only)
- Dimensions and volume measurements shown with the Detections Overlay

## Standalone Mode (RVC4 only)

In standalone mode, the app runs entirely on the device.
To run the example in this mode, first install the `oakctl` tool using the installation instructions [here](https://docs.luxonis.com/software-v3/oak-apps/oakctl).

The app can then be run with:

```bash
oakctl connect <DEVICE_IP>
oakctl app run .
```

Once the app is built and running, you can access the DepthAI Viewer by opening `https://<OAK4_IP>:9000/` in your browser (the exact URL is shown in the terminal output).

### Remote access

1. Upload the oakapp to Luxonis Hub via `oakctl`.
2. Then open the App UI remotely from the App detail page.
3 changes: 3 additions & 0 deletions apps/object-volume-measurement-3d/backend-run.sh
@@ -0,0 +1,3 @@
#!/bin/sh
echo "Starting Backend"
exec python3.11 /app/backend/src/main.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
model: yoloe-v8-l:640x640
platform: RVC4
235 changes: 235 additions & 0 deletions apps/object-volume-measurement-3d/backend/src/main.py
@@ -0,0 +1,235 @@
import depthai as dai

from depthai_nodes.node import ParsingNeuralNetwork, ImgDetectionsFilter

from utils.helper_functions import extract_text_embeddings, read_intrinsics

from utils.arguments import initialize_argparser
from utils.annotation_node import AnnotationNode
from utils.measurement_node import MeasurementNode

_, args = initialize_argparser()

IP = args.ip or "localhost"
PORT = args.port or 8080

CLASS_NAMES = ["person", "chair", "TV"]
MAX_NUM_CLASSES = 80
CONFIDENCE_THRESHOLD = 0.15

visualizer = dai.RemoteConnection(serveFrontend=False)
device = dai.Device(dai.DeviceInfo(args.device)) if args.device else dai.Device()

platform = device.getPlatformAsString()

if platform != "RVC4":
    raise ValueError("This example is supported only on RVC4 platform")

device.setIrLaserDotProjectorIntensity(1.0)
device.setIrFloodLightIntensity(1)

frame_type = dai.ImgFrame.Type.BGR888i
text_features = extract_text_embeddings(
    class_names=CLASS_NAMES, max_num_classes=MAX_NUM_CLASSES
)

if args.fps_limit is None:
    args.fps_limit = 8
    print(
        f"\nFPS limit set to {args.fps_limit} for {platform} platform. If you want to set a custom FPS limit, use the --fps_limit flag.\n"
    )

with dai.Pipeline(device) as pipeline:
    print("Creating pipeline...")

    model_description = dai.NNModelDescription.fromYamlFile(
        f"yoloe_v8_l.{platform}.yaml"
    )
    model_description.platform = platform
    model_nn_archive = dai.NNArchive(dai.getModelFromZoo(model_description))
    model_w, model_h = model_nn_archive.getInputSize()

    cam = pipeline.create(dai.node.Camera).build(
        boardSocket=dai.CameraBoardSocket.CAM_A
    )
    cam_out = cam.requestOutput(
        size=(640, 400), type=dai.ImgFrame.Type.RGB888i, fps=args.fps_limit
    )

    left = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_B)
    right = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_C)
    left_out = left.requestOutput(
        (640, 400), type=dai.ImgFrame.Type.NV12, fps=args.fps_limit
    )
    right_out = right.requestOutput(
        (640, 400), type=dai.ImgFrame.Type.NV12, fps=args.fps_limit
    )

    stereo = pipeline.create(dai.node.StereoDepth).build(
        left=left_out,
        right=right_out,
        presetMode=dai.node.StereoDepth.PresetMode.DEFAULT,
    )

    imu = pipeline.create(dai.node.IMU)
    imu.enableIMUSensor(dai.IMUSensor.ACCELEROMETER_RAW, 100)
    imu.setBatchReportThreshold(10)
    imu.setMaxBatchReports(10)

    manip = pipeline.create(dai.node.ImageManip)
    manip.initialConfig.setOutputSize(
        model_w, model_h, dai.ImageManipConfig.ResizeMode.LETTERBOX
    )
    manip.initialConfig.setFrameType(frame_type)
    manip.setMaxOutputFrameSize(model_w * model_h * 3)

    align = pipeline.create(dai.node.ImageAlign)

    stereo.depth.link(align.input)
    cam_out.link(align.inputAlignTo)
    cam_out.link(manip.inputImage)

    input_node = manip.out

    nn_with_parser = pipeline.create(ParsingNeuralNetwork)
    nn_with_parser.setNNArchive(model_nn_archive)
    nn_with_parser.setBackend("snpe")
    nn_with_parser.setBackendProperties(
        {"runtime": "dsp", "performance_profile": "default"}
    )
    nn_with_parser.setNumInferenceThreads(1)
    nn_with_parser.getParser(0).setConfidenceThreshold(CONFIDENCE_THRESHOLD)

    input_node.link(nn_with_parser.inputs["images"])

    textInputQueue = nn_with_parser.inputs["texts"].createInputQueue()
    nn_with_parser.inputs["texts"].setReusePreviousMessage(True)

    det_process_filter = pipeline.create(ImgDetectionsFilter).build(nn_with_parser.out)
    det_process_filter.setLabels(labels=[i for i in range(len(CLASS_NAMES))], keep=True)

    # Annotation node
    annotation_node = pipeline.create(AnnotationNode).build(
        det_process_filter.out,
        cam_out,
        align.outputAligned,
        label_encoding={k: v for k, v in enumerate(CLASS_NAMES)},
    )

    # RGBD node for the segmented PCL
    rgbd_seg = pipeline.create(dai.node.RGBD).build()
    annotation_node.out_segm.link(rgbd_seg.inColor)
    annotation_node.out_segm_depth.link(rgbd_seg.inDepth)

    # Measurement node
    measurement_node = pipeline.create(MeasurementNode).build(
        rgbd_seg.pcl, annotation_node.out_selection, imu.out
    )
    measurement_node.out_result.link(annotation_node.in_meas_result)

    fx, fy, cx, cy = read_intrinsics(device, 640, 400)
    measurement_node.setIntrinsics(fx, fy, cx, cy, imgW=640, imgH=400)
    measurement_node.an_node = annotation_node

    # Service functions for all functionalities of the frontend
    def class_update_service(new_classes: list[str]):
        """Changes classes to detect based on the user input"""
        global CLASS_NAMES  # without this, the assignment below creates a local
        if len(new_classes) == 0:
            print("List of new classes empty, skipping.")
            return
        if len(new_classes) > MAX_NUM_CLASSES:
            print(
                f"Number of new classes ({len(new_classes)}) exceeds maximum number of classes ({MAX_NUM_CLASSES}), skipping."
            )
            return
        CLASS_NAMES = new_classes

        text_features = extract_text_embeddings(
            class_names=CLASS_NAMES,
            max_num_classes=MAX_NUM_CLASSES,
        )
        inputNNData = dai.NNData()
        inputNNData.addTensor(
            "texts", text_features, dataType=dai.TensorInfo.DataType.FP16
        )
        textInputQueue.send(inputNNData)

        det_process_filter.setLabels(
            labels=[i for i in range(len(CLASS_NAMES))], keep=True
        )
        annotation_node.setLabelEncoding({k: v for k, v in enumerate(CLASS_NAMES)})
        print(f"Classes set to: {CLASS_NAMES}")

    def conf_threshold_update_service(new_conf_threshold: float):
        """Changes confidence threshold based on the user input"""
        global CONFIDENCE_THRESHOLD  # without this, the assignment creates a local
        CONFIDENCE_THRESHOLD = max(0, min(1, new_conf_threshold))
        nn_with_parser.getParser(0).setConfidenceThreshold(CONFIDENCE_THRESHOLD)
        print(f"Confidence threshold set to: {CONFIDENCE_THRESHOLD}")

    def selection_service(clicks: dict):
        """Changes selected object based on the user click"""
        if clicks.get("clear"):
            annotation_node.clearSelection()
            return {"ok": True, "cleared": True}
        try:
            x = float(clicks["x"])
            y = float(clicks["y"])
        except Exception as e:
            return {"ok": False, "error": f"bad payload: {e}"}

        annotation_node.setSelectionPoint(x, y)
        annotation_node.setKeepTopOnly(True)

        measurement_node.reset_measurements()
        annotation_node.clearCachedMeasurements()
        print(f"Selection point set to ({x:.3f}, {y:.3f})")
        return {"ok": True}

    def measurement_method_service(payload: dict):
        """
        Changes measurement method based on the user input
        Expects: {"method": "obb"|"heightgrid"}
        """
        method = str(payload.get("method", "")).lower()
        if method not in ("obb", "heightgrid"):
            return {"ok": False, "error": f"unknown method '{method}'"}
        measurement_node.measurement_mode = method
        if method == "heightgrid":
            annotation_node.requestPlaneCapture(True)
        else:
            annotation_node.requestPlaneCapture(False)
        measurement_node.reset_measurements()
        print("Selected method: ", method)
        return {"ok": True, "method": method, "have_plane": measurement_node.have_plane}

    # Connect the services in the frontend to functions in the backend
    visualizer.registerService("Selection Service", selection_service)
    visualizer.registerService("Class Update Service", class_update_service)
    visualizer.registerService(
        "Threshold Update Service", conf_threshold_update_service
    )
    visualizer.registerService("Measurement Method Service", measurement_method_service)

    visualizer.addTopic("Video", cam_out, "images")
    visualizer.addTopic("Detections", annotation_node.out_ann, "images")
    visualizer.addTopic("Pointclouds", rgbd_seg.pcl, "point_clouds")
    visualizer.addTopic("Measurement Overlay", measurement_node.out_ann, "images")
    visualizer.addTopic("Plane Status", measurement_node.out_plane_status, "images")

    print("Pipeline created.")

    pipeline.start()
    visualizer.registerPipeline(pipeline)

    inputNNData = dai.NNData()
    inputNNData.addTensor("texts", text_features, dataType=dai.TensorInfo.DataType.FP16)
    textInputQueue.send(inputNNData)

    print("Press 'q' to stop")

    while pipeline.isRunning():
        pipeline.processTasks()
        key = visualizer.waitKey(1)
        if key == ord("q"):
            print("Got q key. Exiting...")
            break
11 changes: 11 additions & 0 deletions apps/object-volume-measurement-3d/backend/src/requirements.txt
@@ -0,0 +1,11 @@
depthai==3.2.1
depthai-nodes==0.3.7
opencv-python-headless~=4.10.0
numpy>=1.22
tokenizers~=0.21.0
onnxruntime
open3d~=0.18
scipy==1.11.4
# onnxruntime-gpu # if you want to use CUDAExecutionProvider
requests
tqdm