
Conversation

@MultyXu commented Nov 5, 2025

Adding instance segmentation functionality in response to this PR.

@nathanhhughes (Collaborator) left a comment:

Some cleanup that we should think about doing, but overall this makes sense / looks good. The most critical things to address are where code lives and how the launch files work; the stylistic comments aren't super important.

"""Main config for instance segmenter."""

instance_model: Any = config_field("instance_model", default="yolov11")
# relevant configs (model path, model weights) for the model
@nathanhhughes (Collaborator):

(minor) not needed at the moment?

@nathanhhughes (Collaborator):

Also, looking at the yolov11 wrapper, I'd consider adding a minimum confidence score for the detected objects; I'd assume most instance segmenters return some sort of 0-1 confidence score per mask.
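
Something like the following inside the wrapper would do it (a sketch only; min_confidence and the result field names are assumptions about the wrapper, not its actual API):

import numpy as np

def filter_by_confidence(masks, category_ids, confidences, min_confidence=0.5):
    """Drop detections whose confidence falls below the threshold.

    Hypothetical helper; assumes masks is (N, H, W) and that
    category_ids/confidences are length-N arrays aligned with it.
    """
    keep = np.asarray(confidences) >= min_confidence
    return masks[keep], np.asarray(category_ids)[keep], np.asarray(confidences)[keep]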

Comment on lines +60 to +63
)  # publish segmented image
self._worker = semantic_inference_ros.ImageWorker(
    self, self.config.worker, "color/image_raw", self._spin_once
)  # put image in queue for processing
@nathanhhughes (Collaborator):

(minor) The comments at the end are causing line breaks that I'd rather avoid (though the image worker init probably would have the line break anyway). I'd probably just drop the comments, but if there's helpful context you'd want to convey, you could write a block comment above the initialization of the publisher and the image worker.
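
For illustration, the block-comment version might look like this (the publisher call and topic name are stand-ins, not the actual code):

# Publish the segmented instance image, and queue incoming color images
# so segmentation runs off the subscriber callback thread.
self._pub = self.create_publisher(Image, "semantic/image_raw", 1)  # hypothetical args
self._worker = semantic_inference_ros.ImageWorker(
    self, self.config.worker, "color/image_raw", self._spin_once
)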

Comment on lines +65 to +67
self._visualizer = "place_holder"
# self._visualizer = self.config.visualizer.create()
if self._visualizer is not None:
@nathanhhughes (Collaborator):

It's probably better to use a boolean flag in the config to toggle the visualization on and off
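
A sketch of what that might look like, assuming a dataclass-style config object (the field name enable_visualization is made up):

from dataclasses import dataclass

@dataclass
class InstanceSegmentationNodeConfig:
    """Hypothetical config excerpt; the flag replaces the string placeholder."""
    enable_visualization: bool = True

# in the node __init__:
# self._visualizer = self.config.visualizer.create() if self.config.enable_visualization else None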

Comment on lines 99 to 116
def convert_to_instance_seg_img(self, ret):
    """
    Convert segmentation results to instance segmentation image.
    Each pixel value encodes both category id and instance id.
    First 16 bits are category id, last 16 bits are instance id.
    """
    masks = ret.masks.cpu().numpy()
    category_ids = ret.categories.cpu().numpy()
    img = np.zeros(masks[0].shape, dtype=np.uint32)
    for i in range(masks.shape[0]):
        category_id = int(category_ids[i])  # category ids are 0-indexed
        instance_id = i + 1  # instance ids are 1-indexed
        combined_id = (
            category_id << 16
        ) | instance_id  # combine into single uint32
        img[masks[i, ...] > 0] = combined_id

    return img
@nathanhhughes (Collaborator):

Couple things:

  • I'd move this to the non-ros side (and maybe write a quick unit test just to double-check the implementation; see the sketch after this list)
  • Where the comments sit is important; my preference is usually a block comment before the actual code so that the formatting is clearer
  • Category IDs being 0-indexed makes it a little tricky on the Hydra / Khronos side
  • Technically this can be a static method
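
As a concrete version of the first point, a torch-free unit test could exercise the bit-packing directly (the encode_instance_image helper below is hypothetical; it just mirrors the loop body above with plain numpy arrays):

import numpy as np

def encode_instance_image(masks, category_ids):
    """Pack category id into the high 16 bits and a 1-indexed instance id into the low 16."""
    img = np.zeros(masks[0].shape, dtype=np.uint32)
    for i, (mask, category_id) in enumerate(zip(masks, category_ids)):
        img[mask > 0] = (int(category_id) << 16) | (i + 1)
    return img

def test_encode_instance_image():
    masks = np.zeros((2, 4, 4), dtype=np.uint8)
    masks[0, 0, :] = 1   # first instance: top row
    masks[1, 2:, :] = 1  # second instance: bottom half
    img = encode_instance_image(masks, category_ids=[3, 3])
    assert img[0, 0] == (3 << 16) | 1
    assert img[3, 0] == (3 << 16) | 2
    assert img[1, 0] == 0  # pixels with no instance stay 0
    # decoding recovers both ids
    assert img[0, 0] >> 16 == 3 and img[0, 0] & 0xFFFF == 1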

@MultyXu (Author):

  • In that sense, do we want to switch to 1-indexing and leave 0 as "unknown"?
  • I moved this function inside the result class as a property.

@nathanhhughes (Collaborator):

Let's actually keep it as 0-indexed for now; I think the instance IDs being 1-indexed makes this okay, since a packed value of 0 can then only mean "no detection" (even category 0 packs to (0 << 16) | 1 == 1 for its first instance). This is a pretty easy change to make later if we decide differently.

Comment on lines 126 to 215
def visualizer_call(self, ret, img):
    """
    Process the result from the yolo instance segmenter and generate a color image.
    The returned color image contains bounding boxes, masks, and category labels.
    """

    categories = ret.categories
    masks = ret.masks
    boxes = ret.boxes
    confidences = ret.confidences

    # TODO: placeholder taken directly from the model; needs to be replaced by a proper yaml
    category_names = self._model.segmenter.model.names

    # Convert RGB to BGR for OpenCV
    vis_img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

    # Generate random colors for each class
    np.random.seed(42)  # for consistent colors
    colors = np.random.randint(
        0, 255, size=(len(category_names), 3), dtype=np.uint8
    )

    # Overlay segmentation masks
    if masks is not None:
        for i, mask_tensor in enumerate(masks.data):
            box = boxes[i]
            cls = int(categories[i].cpu().numpy())

            # Get color for the class
            color = colors[cls].tolist()

            # Get mask and resize it to the image dimensions
            mask_np = mask_tensor.cpu().numpy().astype(np.uint8)
            mask_resized = cv2.resize(
                mask_np,
                (vis_img_bgr.shape[1], vis_img_bgr.shape[0]),
                interpolation=cv2.INTER_NEAREST,
            )

            # Find contours to create a mask overlay
            contours, _ = cv2.findContours(
                mask_resized, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
            )

            # Create a transparent overlay
            overlay = vis_img_bgr.copy()
            cv2.drawContours(overlay, contours, -1, color, -1)

            # Blend the overlay with the original image
            alpha = 0.5
            vis_img_bgr = cv2.addWeighted(overlay, alpha, vis_img_bgr, 1 - alpha, 0)

    # Draw bounding boxes and labels
    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = map(int, box.cpu().numpy())
        conf = confidences[i].cpu().numpy()
        cls = int(categories[i].cpu().numpy())
        label = f"{category_names[cls]} {conf:.2f}"

        color = colors[cls].tolist()

        # Draw bounding box
        cv2.rectangle(vis_img_bgr, (x1, y1), (x2, y2), color, 2)

        # Put label above the bounding box
        (label_width, label_height), baseline = cv2.getTextSize(
            label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2
        )
        cv2.rectangle(
            vis_img_bgr,
            (x1, y1 - label_height - 10),
            (x1 + label_width, y1),
            color,
            -1,
        )
        cv2.putText(
            vis_img_bgr,
            label,
            (x1, y1 - 5),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            (255, 255, 255),
            1,
        )

    # Convert BGR back to RGB for displaying with matplotlib
    vis_img_rgb = cv2.cvtColor(vis_img_bgr, cv2.COLOR_BGR2RGB)

    return vis_img_rgb
@nathanhhughes (Collaborator):

It'd be great to factor this out to the non-ros side / make a standard interface to get the model category names (we can talk about how to do this at some point)
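
One possible shape for that interface (purely a sketch; the protocol and property names are invented here, and the ultralytics names attribute is the only part taken from the existing code):

from typing import List, Protocol

class InstanceSegmenter(Protocol):
    """Hypothetical shared interface for closed-set instance segmenters."""

    @property
    def category_names(self) -> List[str]:
        """Ordered so that category id i maps to category_names[i]."""
        ...

class Yolov11Segmenter:
    """Sketch of the yolo wrapper exposing names through the interface."""

    def __init__(self, model):
        self.model = model

    @property
    def category_names(self) -> List[str]:
        names = self.model.names  # ultralytics stores an {id: name} dict
        return [names[i] for i in range(len(names))]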

@MultyXu (Author):

Should I follow

self._visualizer = self.config.visualizer.create()

and create a visualizer class?

@nathanhhughes (Collaborator):

I think it's probably fine as a standalone function for now, but there's not really a downside to it being a class. My comment was more that it might be useful to be able to call this on the non-ros side (e.g., having a notebook that visualizes the results of various models on a test image) without requiring the full ros node.

The reason for the virtual config on the open-set side is that there are multiple valid visualization options you might want to choose between at runtime (coloring by segment, coloring by nearest text embedding, coloring by embedding values), but the visualization you have is probably the only one I would use for closed-set instances.
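
For what it's worth, the notebook use case might then look roughly like this (every name below is hypothetical; it just illustrates calling the visualizer without the ros node):

import cv2

# hypothetical non-ros imports
from semantic_inference import Yolov11Segmenter, visualize_instances

img = cv2.cvtColor(cv2.imread("test_image.png"), cv2.COLOR_BGR2RGB)
model = Yolov11Segmenter("yolo11n-seg.pt")  # assumed constructor
ret = model.segment(img)                    # assumed method
vis = visualize_instances(ret, img, model.category_names)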

@MultyXu changed the title from "yolo close set instance segmentation" to "instance segmentation [yolo11 and gdsam2]" on Dec 3, 2025