feat(transformers/models): add models of eomt and timesfm #1403

Conversation
Code Review
This pull request introduces the EoMT model, including its modeling files and image processors. The implementation appears to be a port from the Hugging Face Transformers library. While the overall structure is good, there are several critical issues that need to be addressed. These include incorrect API usage when porting from PyTorch to MindSpore, such as using .copy_() and calling linear_sum_assignment with a tensor instead of a NumPy array, which will lead to runtime errors. I've also identified several incorrect type hints and unsafe dictionary access patterns. My review provides specific suggestions to fix these bugs and improve the code's correctness and robustness.
```python
masks, classes = convert_segmentation_map_to_binary_masks(
    segmentation_map,
    instance_id,
    ignore_index=ignore_index,
)
```
The convert_segmentation_map_to_binary_masks function expects a NumPy array for the segmentation_map argument, but it is being passed a MindSpore tensor. This will cause a runtime error. You need to convert the tensor to a NumPy array using .asnumpy() before passing it to the function.
Suggested change:

```diff
 masks, classes = convert_segmentation_map_to_binary_masks(
-    segmentation_map,
+    segmentation_map.asnumpy(),
     instance_id,
     ignore_index=ignore_index,
 )
```
```python
cost_matrix = mint.maximum(cost_matrix, ms.tensor(-1e10))
cost_matrix = mint.nan_to_num(cost_matrix, 0)
# do the assignment using the hungarian algorithm in scipy
assigned_indices: tuple[np.array] = linear_sum_assignment(cost_matrix)
```
The scipy.optimize.linear_sum_assignment function requires a NumPy array as input, but cost_matrix is a MindSpore tensor. This will cause a runtime error. You must convert cost_matrix to a NumPy array using .asnumpy() before the function call.
Suggested change:

```diff
-assigned_indices: tuple[np.array] = linear_sum_assignment(cost_matrix)
+assigned_indices: tuple[np.array] = linear_sum_assignment(cost_matrix.asnumpy())
```
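Once the cost matrix is a NumPy array, the call works exactly as documented in SciPy; a minimal sketch with an illustrative cost matrix (not values from this PR):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# linear_sum_assignment solves the Hungarian assignment problem:
# it picks one column per row so the total cost is minimized.
cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])

row_ind, col_ind = linear_sum_assignment(cost)
print(col_ind.tolist())                   # [1, 0, 2]
print(int(cost[row_ind, col_ind].sum()))  # 5
```

Passing a MindSpore tensor here fails because SciPy expects an array-like it can convert with NumPy semantics, which is why the `.asnumpy()` conversion above is required.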
```python
padding_masks = mint.ones((batch_size, height, width), dtype=ms.bool_)
# pad the tensors to the size of the biggest one
for tensor, padded_tensor, padding_mask in zip(tensors, padded_tensors, padding_masks):
    padded_tensor[: tensor.shape[0], : tensor.shape[1], : tensor.shape[2]].copy_(tensor)
```
The .copy_() method is an in-place operation from PyTorch and is not available for MindSpore tensors. The equivalent operation in MindSpore can be achieved using slice assignment.
Suggested change:

```diff
-padded_tensor[: tensor.shape[0], : tensor.shape[1], : tensor.shape[2]].copy_(tensor)
+padded_tensor[: tensor.shape[0], : tensor.shape[1], : tensor.shape[2]] = tensor
```
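The slice-assignment pattern can be checked with NumPy, whose arrays share this semantics (shapes below are illustrative only):

```python
import numpy as np

# Two arrays of different spatial sizes, to be padded to a common shape.
tensors = [np.ones((1, 2, 3)), np.ones((1, 4, 2))]
max_shape = tuple(max(t.shape[d] for t in tensors) for d in range(3))

padded = np.zeros((len(tensors),) + max_shape)
for i, t in enumerate(tensors):
    # Slice assignment writes t into the corner of its padded slot,
    # replacing PyTorch's in-place .copy_().
    padded[i, : t.shape[0], : t.shape[1], : t.shape[2]] = t

print(padded.shape)  # (2, 1, 4, 3)
```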
```python
def get_target_size(size_dict: dict[str, int]) -> tuple[int, int]:
    """Returns the height and width from a size dict."""
    target_height = size_dict["shortest_edge"]
    target_width = size_dict["longest_edge"] or target_height
```
Directly accessing size_dict["longest_edge"] is unsafe and will raise a KeyError if the key is not present. It's better to use the .get() method with a fallback value to handle this case gracefully, similar to the implementation in image_processing_eomt.py.
Suggested change:

```diff
-target_width = size_dict["longest_edge"] or target_height
+target_width = size_dict.get("longest_edge") or target_height
```
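The difference between the two access patterns is easy to demonstrate; the size dicts below are illustrative:

```python
size_with_both = {"shortest_edge": 640, "longest_edge": 800}
size_short_only = {"shortest_edge": 640}

def get_target_size(size_dict):
    target_height = size_dict["shortest_edge"]
    # .get() returns None instead of raising KeyError, and `or`
    # falls back to the height when "longest_edge" is absent or None.
    target_width = size_dict.get("longest_edge") or target_height
    return target_height, target_width

print(get_target_size(size_with_both))   # (640, 800)
print(get_target_size(size_short_only))  # (640, 640)
```

With the original `size_dict["longest_edge"]`, the second call would raise `KeyError` instead of falling back.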
```python
from .image_processing_eomt import *
from .image_processing_eomt_fast import *
from .modeling_eomt import *
```
Using wildcard imports (*) is generally discouraged as it can lead to namespace pollution and makes it difficult to trace where symbols are defined. It's better to explicitly import the names that are intended to be part of the module's public API. This also aligns with the convention used in other model directories within this project.
Suggested change:

```diff
-from .image_processing_eomt import *
-from .image_processing_eomt_fast import *
-from .modeling_eomt import *
+from .image_processing_eomt import EomtImageProcessor
+from .image_processing_eomt_fast import EomtImageProcessorFast
+from .modeling_eomt import EomtForUniversalSegmentation, EomtPreTrainedModel
+
+__all__ = ["EomtImageProcessor", "EomtImageProcessorFast", "EomtForUniversalSegmentation", "EomtPreTrainedModel"]
```
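The effect of `__all__` on star-imports can be sketched with a throwaway module (the module and function names below are illustrative, not the real package):

```python
import types

source = """
def _private_helper():
    pass

def EomtImageProcessor():
    pass

__all__ = ["EomtImageProcessor"]
"""

# Build a module object in memory to simulate an __init__.py.
mod = types.ModuleType("eomt_demo")
exec(source, mod.__dict__)

# `from eomt_demo import *` binds exactly the names listed in __all__,
# so private helpers stay out of the importer's namespace.
star_names = list(mod.__all__)
print(star_names)  # ['EomtImageProcessor']
```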
```python
def __init__(self, **kwargs: Unpack[EomtImageProcessorFastKwargs]):
    super().__init__(**kwargs)

def _split_image(self, images: ms.Tensor, size: dict, image_indices: int) -> tuple[list, list]:
```
The type hint for image_indices is int, but it is used as a sequence on line 133 (image_indices[batch_idx]). The type hint should be corrected to list[int] or a more general sequence type to reflect its actual usage.
Suggested change:

```diff
-def _split_image(self, images: ms.Tensor, size: dict, image_indices: int) -> tuple[list, list]:
+def _split_image(self, images: ms.Tensor, size: dict, image_indices: list[int]) -> tuple[list, list]:
```
```python
self,
outputs,
target_sizes: list[tuple[int, int]],
threshold: float = 0.8,
```
The default threshold of 0.8 for instance segmentation is unusually high and likely a copy-paste error from the panoptic segmentation function. A more standard default value for instance segmentation is 0.5. Using a high threshold might lead to valid instances being filtered out.
Suggested change:

```diff
-threshold: float = 0.8,
+threshold: float = 0.5,
```
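How much the default matters is easy to see with score filtering; the scores below are illustrative, not real model output:

```python
# Hypothetical predicted instance confidences.
scores = [0.95, 0.72, 0.55, 0.31]

def keep_instances(scores, threshold):
    # Instances at or below the threshold are discarded in post-processing.
    return [s for s in scores if s > threshold]

print(keep_instances(scores, 0.8))  # [0.95] — the high bar drops valid instances
print(keep_instances(scores, 0.5))  # [0.95, 0.72, 0.55]
```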
What does this PR do?
Add the EoMT and TimesFM models.
Limitation
FP16 is not supported in timesfm due to the occurrence of NaN values in its outputs.
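The NaN failure mode can be reproduced outside the model; a minimal NumPy sketch of float16 overflow (an assumed mechanism, consistent with the limitation above):

```python
import numpy as np

# float16 cannot represent values above ~65504: they overflow to inf,
# and arithmetic on inf (e.g. inf - inf) then produces NaN — a common
# mechanism behind NaN outputs when a model runs in FP16.
x = np.float16(70000.0)
print(np.isinf(x))       # True: exceeds the float16 range
print(np.isnan(x - x))   # True: inf - inf is undefined
```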
Usage
Performance
Before submitting
Did you make sure to update the documentation with your changes (e.g. What's New)? Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx