feat(transformers): add qwen3-omni model #1411
base: master
Conversation
…r/qwen2vl image processor fast/qwen3omni processor/Qwen3OmniMoeAudioEncoder
Summary of Changes: Hello @wcrzlh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
Code Review
This PR introduces the Qwen3-Omni-MoE model, a large multi-modal model. The changes are extensive, touching not only the model implementation and examples but also core components of the mindone.transformers and mindone.diffusers libraries. While the feature addition is significant, the PR contains several critical issues that must be addressed before merging. These include TODOs left in the code, commented-out functionality causing regressions, incorrect API usage that will lead to runtime errors, and significant performance bottlenecks due to workarounds for framework limitations. The example README.md also contains incorrect installation instructions. I have provided detailed comments on these critical and high-severity issues.
```python
# TODO check difference of "ms.tensor.scatter" and "ms.tensor.scatter_"
scores_processed = scores.scatter_(1, input_ids, score)
```
The code was changed to use the in-place scatter_ method, but a TODO comment was added, indicating uncertainty about the difference. Using an in-place operation can introduce side effects by modifying the input scores tensor, which might be unexpected by the caller. Please confirm if this in-place modification is intended and safe. If not, the original out-of-place scatter operation should be used. The TODO comment must be removed before merging.
Suggested change:

```diff
-# TODO check difference of "ms.tensor.scatter" and "ms.tensor.scatter_"
-scores_processed = scores.scatter_(1, input_ids, score)
+scores_processed = scores.scatter(1, input_ids, score)
```
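The semantic difference at stake can be sketched with a small numpy analogy (this is a hedged illustration, not the MindSpore API: `np.put_along_axis` stands in for scatter, and the hypothetical `scatter`/`scatter_` helpers below mimic out-of-place vs in-place behavior):

```python
import numpy as np

def scatter(arr, indices, values):
    # Out-of-place: work on a copy, so the caller's array is untouched.
    out = arr.copy()
    np.put_along_axis(out, indices, values, axis=1)
    return out

def scatter_(arr, indices, values):
    # In-place: mutates the caller's array, like the trailing-underscore op.
    np.put_along_axis(arr, indices, values, axis=1)
    return arr

scores = np.zeros((1, 4))
ids = np.array([[1, 2]])
penalized = np.array([[0.5, 0.7]])

out = scatter(scores, ids, penalized)
print(scores[0, 1])   # 0.0 -- original left intact
scatter_(scores, ids, penalized)
print(scores[0, 1])   # 0.5 -- caller's array was modified
```

This is why the in-place variant can surprise a caller that still holds a reference to `scores` after the logits processor runs.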
> Text generation Outputs:
How about the audio output? Maybe we could attach the audio output as well.
The audio output quality is good: it retells the text output and summarizes the audio.
Let me figure out how to attach the audio output.
```python
        causal_mask[i, :, :, j] = mask_function(batch_size[i], head_dim, cache_postion, kv_range[j].item())
else:
    causal_mask = mint.zeros((q_len, kv_len), dtype=ms.bool_)
    for i in range(kv_len):
```
How is the efficiency of `_vmap_patch`?
If only the "k" axis is left non-vectorized, the performance is similar to `mindspore.vmap`.
But sometimes `padding_mask_func` is used as part of `mask_func`, and if the batch-size axis is also left non-vectorized, the performance is slower than `mindspore.vmap`. Right now I am looking for a better substitute for `mindspore.vmap`.
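The loop-vs-vectorized trade-off being discussed can be sketched in numpy (a hedged analogy only: the causal rule `j <= i + kv_len - q_len` below is a stand-in for the real `mask_function`, which may not be expressible as a single broadcast):

```python
import numpy as np

q_len, kv_len = 4, 6

# Loop formulation, analogous to the per-element fallback above:
# one Python-level call per (i, j) pair, O(q_len * kv_len) dispatches.
mask_loop = np.zeros((q_len, kv_len), dtype=bool)
for i in range(q_len):
    for j in range(kv_len):
        mask_loop[i, j] = j <= i + (kv_len - q_len)

# Broadcast formulation: one vectorized comparison, no Python loop,
# which is the kind of kernel vmap-style vectorization aims to recover.
q_idx = np.arange(q_len)[:, None]    # (q_len, 1)
kv_idx = np.arange(kv_len)[None, :]  # (1, kv_len)
mask_vec = kv_idx <= q_idx + (kv_len - q_len)

assert (mask_loop == mask_vec).all()
```

Whether the batch and head axes can also be folded into the broadcast (instead of looped over in Python) is exactly what determines how far the fallback lags behind `mindspore.vmap`.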
What does this PR do?
Fixes # (issue)
Adds # (feature)
Before submitting
What's New. Here are the documentation guidelines
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx