
Conversation

@MultyXu commented Nov 5, 2025

Adding instance segmentation functionality in response to this PR.

@nathanhhughes (Collaborator) left a comment:

Some cleanup that we should think about doing, but overall this makes sense / looks good. The most critical things to address are where code lives and how the launch files work; the stylistic comments aren't super important.

"""Main config for instance segmenter."""

instance_model: Any = config_field("instance_model", default="yolov11")
# relevant configs (model path, model weights) for the model
@nathanhhughes (Collaborator):

(minor) not needed at the moment?

@nathanhhughes (Collaborator):

Also, looking at the yolov11 wrapper, I'd consider adding a minimum confidence score for the detected objects; I'd assume most instance segmenters return some sort of 0-1 confidence score per mask.
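
Something like the following inside the wrapper would do it (a sketch only; min_confidence and the result field names are assumptions about the wrapper, not its actual API):

import numpy as np

def filter_by_confidence(masks, category_ids, confidences, min_confidence=0.5):
    """Drop detections whose confidence falls below the threshold.

    Hypothetical helper; assumes masks is (N, H, W) and that
    category_ids/confidences are length-N arrays aligned with it.
    """
    keep = np.asarray(confidences) >= min_confidence
    return masks[keep], np.asarray(category_ids)[keep], np.asarray(confidences)[keep]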

Comment on lines +60 to +63
)  # publish segmented image
self._worker = semantic_inference_ros.ImageWorker(
    self, self.config.worker, "color/image_raw", self._spin_once
)  # put image in queue for processing
@nathanhhughes (Collaborator):

(minor) The comments at the end are causing line breaks that I'd rather avoid (though the image worker init probably would have the line break anyway). I'd probably just drop the comments, but if there's helpful context you'd want to convey, you could write a block comment above the initialization of the publisher and the image worker.
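
For illustration, the block-comment version might look like this (the publisher call and topic name are stand-ins, not the actual code):

# Publish the segmented instance image, and queue incoming color images
# so segmentation runs off the subscriber callback thread.
self._pub = self.create_publisher(Image, "semantic/image_raw", 1)  # hypothetical args
self._worker = semantic_inference_ros.ImageWorker(
    self, self.config.worker, "color/image_raw", self._spin_once
)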

Comment on lines +65 to +67
self._visualizer = "place_holder"
# self._visualizer = self.config.visualizer.create()
if self._visualizer is not None:
@nathanhhughes (Collaborator):

It's probably better to use a boolean flag in the config to toggle the visualization on and off
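
A sketch of what that might look like, assuming a dataclass-style config object (the field name enable_visualization is made up):

from dataclasses import dataclass

@dataclass
class InstanceSegmentationNodeConfig:
    """Hypothetical config excerpt; the flag replaces the string placeholder."""
    enable_visualization: bool = True

# in the node __init__:
# self._visualizer = self.config.visualizer.create() if self.config.enable_visualization else None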

Comment on lines 99 to 116
def convert_to_instance_seg_img(self, ret):
    """
    Convert segmentation results to instance segmentation image.
    Each pixel value encodes both category id and instance id.
    First 16 bits are category id, last 16 bits are instance id.
    """
    masks = ret.masks.cpu().numpy()
    category_ids = ret.categories.cpu().numpy()
    img = np.zeros(masks[0].shape, dtype=np.uint32)
    for i in range(masks.shape[0]):
        category_id = int(category_ids[i])  # category ids are 0-indexed
        instance_id = i + 1  # instance ids are 1-indexed
        combined_id = (
            category_id << 16
        ) | instance_id  # combine into single uint32
        img[masks[i, ...] > 0] = combined_id

    return img
@nathanhhughes (Collaborator):

Couple things:

  • I'd move this to the non-ros side (and maybe write a quick unit test just to double-check the implementation; see the sketch after this list)
  • Where the comments sit is important; my preference is usually a block comment before the actual code so that the formatting is clearer
  • Category IDs being 0-indexed makes it a little tricky on the Hydra / Khronos side
  • Technically this can be a static method
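
As a concrete version of the first point, a torch-free unit test could exercise the bit-packing directly (the encode_instance_image helper below is hypothetical; it just mirrors the loop body above with plain numpy arrays):

import numpy as np

def encode_instance_image(masks, category_ids):
    """Pack category id into the high 16 bits and a 1-indexed instance id into the low 16."""
    img = np.zeros(masks[0].shape, dtype=np.uint32)
    for i, (mask, category_id) in enumerate(zip(masks, category_ids)):
        img[mask > 0] = (int(category_id) << 16) | (i + 1)
    return img

def test_encode_instance_image():
    masks = np.zeros((2, 4, 4), dtype=np.uint8)
    masks[0, 0, :] = 1   # first instance: top row
    masks[1, 2:, :] = 1  # second instance: bottom half
    img = encode_instance_image(masks, category_ids=[3, 3])
    assert img[0, 0] == (3 << 16) | 1
    assert img[3, 0] == (3 << 16) | 2
    assert img[1, 0] == 0  # pixels with no instance stay 0
    # decoding recovers both ids
    assert img[0, 0] >> 16 == 3 and img[0, 0] & 0xFFFF == 1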

@MultyXu (Author):

  • In that sense, do we want to switch to 1-indexing and leave 0 as "unknown"?
  • I moved this function inside the result class as a property.

@nathanhhughes (Collaborator):

Let's actually keep it as 0-indexed for now; I think the instance IDs being 1-indexed makes this okay, since a packed value of 0 can then only mean "no detection" (even category 0 packs to (0 << 16) | 1 == 1 for its first instance). This is a pretty easy change to make later if we decide differently.

Comment on lines 126 to 215
def visualizer_call(self, ret, img):
    """
    Process the result from the yolo instance segmenter and generate a color image.
    The returned color image contains bounding boxes, masks, and category labels.
    """

    categories = ret.categories
    masks = ret.masks
    boxes = ret.boxes
    confidences = ret.confidences

    # TODO: placeholder taken directly from the model; needs to be replaced by a proper yaml
    category_names = self._model.segmenter.model.names

    # Convert RGB to BGR for OpenCV
    vis_img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

    # Generate random colors for each class
    np.random.seed(42)  # for consistent colors
    colors = np.random.randint(
        0, 255, size=(len(category_names), 3), dtype=np.uint8
    )

    # Overlay segmentation masks
    if masks is not None:
        for i, mask_tensor in enumerate(masks.data):
            box = boxes[i]
            cls = int(categories[i].cpu().numpy())

            # Get color for the class
            color = colors[cls].tolist()

            # Get mask and resize it to the image dimensions
            mask_np = mask_tensor.cpu().numpy().astype(np.uint8)
            mask_resized = cv2.resize(
                mask_np,
                (vis_img_bgr.shape[1], vis_img_bgr.shape[0]),
                interpolation=cv2.INTER_NEAREST,
            )

            # Find contours to create a mask overlay
            contours, _ = cv2.findContours(
                mask_resized, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
            )

            # Create a transparent overlay
            overlay = vis_img_bgr.copy()
            cv2.drawContours(overlay, contours, -1, color, -1)

            # Blend the overlay with the original image
            alpha = 0.5
            vis_img_bgr = cv2.addWeighted(overlay, alpha, vis_img_bgr, 1 - alpha, 0)

    # Draw bounding boxes and labels
    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = map(int, box.cpu().numpy())
        conf = confidences[i].cpu().numpy()
        cls = int(categories[i].cpu().numpy())
        label = f"{category_names[cls]} {conf:.2f}"

        color = colors[cls].tolist()

        # Draw bounding box
        cv2.rectangle(vis_img_bgr, (x1, y1), (x2, y2), color, 2)

        # Put label above the bounding box
        (label_width, label_height), baseline = cv2.getTextSize(
            label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2
        )
        cv2.rectangle(
            vis_img_bgr,
            (x1, y1 - label_height - 10),
            (x1 + label_width, y1),
            color,
            -1,
        )
        cv2.putText(
            vis_img_bgr,
            label,
            (x1, y1 - 5),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            (255, 255, 255),
            1,
        )

    # Convert BGR back to RGB for displaying with matplotlib
    vis_img_rgb = cv2.cvtColor(vis_img_bgr, cv2.COLOR_BGR2RGB)

    return vis_img_rgb
@nathanhhughes (Collaborator):

It'd be great to factor this out to the non-ros side / make a standard interface to get the model category names (we can talk about how to do this at some point)
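
One possible shape for that interface (purely a sketch; the protocol and property names are invented here, and the ultralytics names attribute is the only part taken from the existing code):

from typing import List, Protocol

class InstanceSegmenter(Protocol):
    """Hypothetical shared interface for closed-set instance segmenters."""

    @property
    def category_names(self) -> List[str]:
        """Ordered so that category id i maps to category_names[i]."""
        ...

class Yolov11Segmenter:
    """Sketch of the yolo wrapper exposing names through the interface."""

    def __init__(self, model):
        self.model = model

    @property
    def category_names(self) -> List[str]:
        names = self.model.names  # ultralytics stores an {id: name} dict
        return [names[i] for i in range(len(names))]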

@MultyXu (Author):

Should I follow

self._visualizer = self.config.visualizer.create()

and create a visualizer class?

@nathanhhughes (Collaborator):

I think it's probably fine as a standalone function for now, but there's not really a downside to it being a class. My comment was more that it might be useful to be able to call this on the non-ros side (e.g., having a notebook that visualizes the results of various models on a test image) without requiring the full ros node.

The reason for the virtual config on the open-set side is that there are multiple valid visualization options you might want to choose between at runtime (coloring by segment, coloring by nearest text embedding, coloring by embedding values), but the visualization you have is probably the only one I would use for closed-set instances.
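
For what it's worth, the notebook use case might then look roughly like this (every name below is hypothetical; it just illustrates calling the visualizer without the ros node):

import cv2

# hypothetical non-ros imports
from semantic_inference import Yolov11Segmenter, visualize_instances

img = cv2.cvtColor(cv2.imread("test_image.png"), cv2.COLOR_BGR2RGB)
model = Yolov11Segmenter("yolo11n-seg.pt")  # assumed constructor
ret = model.segment(img)                    # assumed method
vis = visualize_instances(ret, img, model.category_names)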

@MultyXu changed the title from "yolo close set instance segmentation" to "instance segmentation [yolo11 and gdsam2]" on Dec 3, 2025