instance segmentation [yolo11 and gdsam2] #23
base: main
Conversation
nathanhhughes left a comment
Some cleanup that we should think about doing, but overall this makes sense / looks good. The most critical things to address are where code lives and how the launch files work; the stylistic comments aren't super important.
semantic_inference/python/semantic_inference/models/instance_segmenter.py
```python
"""Main config for instance segmenter."""

instance_model: Any = config_field("instance_model", default="yolov11")
# relevant configs (model path, model weights) for the model
```
(minor) not needed at the moment?
Also, looking at the yolov11 wrapper, I'd consider adding a minimum confidence score for the detected objects; I'd assume most instance segmenters return some sort of 0-1 confidence score per mask.
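For reference, a minimal sketch of the kind of confidence filter being suggested, assuming the wrapper exposes per-detection masks, categories, and confidences as arrays (`filter_by_confidence` and `min_confidence` are hypothetical names, not part of this PR):

```python
import numpy as np


def filter_by_confidence(masks, categories, confidences, min_confidence=0.5):
    """Keep only detections whose confidence meets the threshold.

    Hypothetical helper: assumes masks is (N, H, W) and categories /
    confidences are length-N arrays aligned with the masks.
    """
    keep = np.asarray(confidences) >= min_confidence
    return (
        np.asarray(masks)[keep],
        np.asarray(categories)[keep],
        np.asarray(confidences)[keep],
    )


masks = np.ones((3, 4, 4))
categories = np.array([0, 1, 2])
confidences = np.array([0.9, 0.3, 0.6])
masks, categories, confidences = filter_by_confidence(
    masks, categories, confidences, min_confidence=0.5
)
```

The threshold would presumably live in the segmenter config so it can be tuned per model.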
semantic_inference/python/semantic_inference/models/instance_segmenter.py
semantic_inference/python/semantic_inference/models/wrappers.py
```python
)  # publish segmented image
self._worker = semantic_inference_ros.ImageWorker(
    self, self.config.worker, "color/image_raw", self._spin_once
)  # put image in queue for processing
```
(minor) The comments at the end are causing line breaks that I'd rather avoid (though the image worker init probably would have the line break anyway). I'd probably just drop the comments; if there's helpful context you'd want to convey, you could write a block comment above the initialization of the publisher and the image worker.
```python
self._visualizer = "place_holder"
# self._visualizer = self.config.visualizer.create()
if self._visualizer is not None:
```
It's probably better to use a boolean flag in the config to toggle the visualization on and off
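One way to read that suggestion, as a sketch (`NodeConfig`, `enable_visualizer`, and `make_visualizer` are hypothetical names, not from the PR):

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    """Hypothetical node config with an explicit visualization toggle."""

    enable_visualizer: bool = False


def make_visualizer(config):
    """Construct the visualizer only when the flag is set; None disables it."""
    if not config.enable_visualizer:
        return None
    # stand-in for the real factory, e.g. config.visualizer.create()
    return object()


viz_off = make_visualizer(NodeConfig(enable_visualizer=False))
viz_on = make_visualizer(NodeConfig(enable_visualizer=True))
```

This keeps the `if self._visualizer is not None:` guard working unchanged while making the on/off choice explicit in the config rather than via a placeholder string.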
```python
def convert_to_instance_seg_img(self, ret):
    """
    Convert segmentation results to an instance segmentation image.

    Each pixel value encodes both category ID and instance ID:
    the first 16 bits are the category ID, the last 16 bits the instance ID.
    """
    masks = ret.masks.cpu().numpy()
    category_ids = ret.categories.cpu().numpy()
    img = np.zeros(masks[0].shape, dtype=np.uint32)
    for i in range(masks.shape[0]):
        category_id = int(category_ids[i])  # category IDs are 0-indexed
        instance_id = i + 1  # instance IDs are 1-indexed
        # combine into a single uint32
        combined_id = (category_id << 16) | instance_id
        img[masks[i, ...] > 0] = combined_id

    return img
```
Couple things:
- I'd move this to the non-ros side (and maybe write a quick unit test just to double check the implementation)
- Where the comments are important, my preference is usually a block comment before the actual code so that the formatting is clearer
- Category IDs being 0-indexed makes it a little tricky on the Hydra / Khronos side
- Technically this can be a static method
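A minimal sketch of the kind of unit test suggested above, written against a free-function version of the packing logic (`pack_instance_ids` is a hypothetical name; the real method operates on torch tensors inside the result object):

```python
import numpy as np


def pack_instance_ids(masks, category_ids):
    """Pack category and instance IDs into a single uint32 label image.

    Category ID occupies the upper 16 bits, instance ID the lower 16;
    instance IDs are 1-indexed so 0 stays reserved for background.
    """
    img = np.zeros(masks[0].shape, dtype=np.uint32)
    for i, mask in enumerate(masks):
        combined = (int(category_ids[i]) << 16) | (i + 1)
        img[mask > 0] = combined
    return img


# two non-overlapping instances of categories 3 and 7
masks = np.zeros((2, 4, 4), dtype=np.uint8)
masks[0, :2, :2] = 1
masks[1, 2:, 2:] = 1
img = pack_instance_ids(masks, [3, 7])
```

Checking a few pixels against the expected bit patterns (`(3 << 16) | 1`, `(7 << 16) | 2`, and 0 for background) is enough to pin down the encoding.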
- In that sense, do we want to switch to 1-index? And leave 0 as "unknown"?
- I moved this function inside the result class as a property.
Let's actually keep it as 0-indexed for now; I think the instance IDs being 1-indexed makes this okay (this is a pretty easy change to make later if we decide differently).
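To illustrate why the 1-indexed instance IDs make 0-indexed categories okay: a packed value of 0 can never be produced for a real detection, so background pixels stay unambiguous. A quick check (`decode_label` is a hypothetical helper):

```python
def decode_label(value):
    """Split a packed uint32 label into (category_id, instance_id)."""
    return value >> 16, value & 0xFFFF


# category 0, first instance: still distinguishable from background (0)
packed = (0 << 16) | 1
category_id, instance_id = decode_label(packed)
```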
```python
def visualizer_call(self, ret, img):
    """
    Process the result from the yolo instance segmenter and generate a color image.

    The returned color image contains bounding boxes, masks, and category labels.
    """
    categories = ret.categories
    masks = ret.masks
    boxes = ret.boxes
    confidences = ret.confidences

    # TODO: placeholder taken directly from the model; needs to be replaced
    # by a proper yaml
    category_names = self._model.segmenter.model.names

    # Convert RGB to BGR for OpenCV
    vis_img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

    # Generate random colors for each class
    np.random.seed(42)  # for consistent colors
    colors = np.random.randint(
        0, 255, size=(len(category_names), 3), dtype=np.uint8
    )

    # Overlay segmentation masks
    if masks is not None:
        for i, mask_tensor in enumerate(masks.data):
            box = boxes[i]
            cls = int(categories[i].cpu().numpy())

            # Get color for the class
            color = colors[cls].tolist()

            # Get mask and resize it to the image dimensions
            mask_np = mask_tensor.cpu().numpy().astype(np.uint8)
            mask_resized = cv2.resize(
                mask_np,
                (vis_img_bgr.shape[1], vis_img_bgr.shape[0]),
                interpolation=cv2.INTER_NEAREST,
            )

            # Find contours to create a mask overlay
            contours, _ = cv2.findContours(
                mask_resized, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
            )

            # Create a transparent overlay
            overlay = vis_img_bgr.copy()
            cv2.drawContours(overlay, contours, -1, color, -1)

            # Blend the overlay with the original image
            alpha = 0.5
            vis_img_bgr = cv2.addWeighted(overlay, alpha, vis_img_bgr, 1 - alpha, 0)

    # Draw bounding boxes and labels
    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = map(int, box.cpu().numpy())
        conf = confidences[i].cpu().numpy()
        cls = int(categories[i].cpu().numpy())
        label = f"{category_names[cls]} {conf:.2f}"

        color = colors[cls].tolist()

        # Draw bounding box
        cv2.rectangle(vis_img_bgr, (x1, y1), (x2, y2), color, 2)

        # Put label above the bounding box
        (label_width, label_height), baseline = cv2.getTextSize(
            label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2
        )
        cv2.rectangle(
            vis_img_bgr,
            (x1, y1 - label_height - 10),
            (x1 + label_width, y1),
            color,
            -1,
        )
        cv2.putText(
            vis_img_bgr,
            label,
            (x1, y1 - 5),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            (255, 255, 255),
            1,
        )

    # Convert BGR back to RGB for displaying with matplotlib
    vis_img_rgb = cv2.cvtColor(vis_img_bgr, cv2.COLOR_BGR2RGB)

    return vis_img_rgb
```
It'd be great to factor this out to the non-ros side / make a standard interface to get the model category names (we can talk about how to do this at some point)
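One possible shape for the standard interface mentioned above, as a sketch (the base class and names are hypothetical; the only assumption carried over from the PR is that ultralytics-style models expose `names` as an id-to-name dict):

```python
class InstanceSegmenterBase:
    """Hypothetical common interface for closed-set instance segmenters."""

    @property
    def category_names(self):
        """Return a list mapping category ID -> human-readable name."""
        raise NotImplementedError


class YoloSegmenter(InstanceSegmenterBase):
    """Hypothetical wrapper exposing the model's class names uniformly."""

    def __init__(self, model):
        self._model = model

    @property
    def category_names(self):
        # ultralytics-style models expose a dict {id: name}
        names = self._model.names
        return [names[i] for i in range(len(names))]


class _FakeModel:
    names = {0: "person", 1: "car"}


segmenter = YoloSegmenter(_FakeModel())
```

With something like this, the visualizer could take `category_names` from any segmenter instead of reaching into `self._model.segmenter.model.names`, which would also make it callable from the non-ros side.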
Should I follow
```python
self._visualizer = self.config.visualizer.create()
```
I think it's probably fine as a standalone function for now, but there's not really a downside to it being a class. My comment was more that it might be useful to be able to call this on the non-ros side (e.g., having a notebook that visualizes the results of various models on a test image) without requiring the full ros node.

The reason for the virtual config on the open-set side is that there are multiple valid visualization options you might want to choose between at runtime (coloring by segment, coloring by nearest text embedding, coloring by embedding values), but the visualization you have is probably the only one I would use for closed-set instances.
Adding instance segmentation functionality in response to this PR.