feat(transformers): add bitnet model #1416
base: master
Conversation
Summary of Changes
Hello @iugoood, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces the BitNet model into the mindone/transformers framework. It provides the full implementation of the BitNet architecture, including its core components (RMSNorm, MLP, Attention, and Rotary Embedding), adapted for MindSpore. The integration also extends to the auto-configuration and auto-modeling systems, making the BitNet model readily available for use within the library. Additionally, a test suite has been added to validate the MindSpore implementation against its PyTorch counterpart.
Code Review
This pull request adds the BitNet model. The implementation looks solid, but there are a few areas for improvement. The model currently depends on an un-vendored BitNetConfig from the transformers library, which should be included for self-containment. The tests are incomplete, lacking coverage for BitNetForCausalLM and support for MindSpore's Graph mode, which is crucial for performance. Additionally, there are some minor documentation issues and a wildcard import that should be addressed for better code quality and maintainability.
from ...processing_utils import Unpack
from ...utils import TransformersKwargs, can_return_tuple
from ...utils.generic import check_model_inputs
from transformers.models.bitnet.configuration_bitnet import BitNetConfig
The model configuration BitNetConfig is imported from the transformers library, which creates an external dependency. To ensure this model is self-contained within the mindone library, please vendor the configuration_bitnet.py file into this pull request, similar to how other models are structured in this repository.
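For illustration, a minimal sketch of how the import block could look once the configuration is vendored. The target path `mindone/transformers/models/bitnet/configuration_bitnet.py` is an assumption based on how other models in this repository are laid out:

```python
# Sketch only: assumes configuration_bitnet.py has been vendored into
# mindone/transformers/models/bitnet/ alongside modeling_bitnet.py.
from ...processing_utils import Unpack
from ...utils import TransformersKwargs, can_return_tuple
from ...utils.generic import check_model_inputs

# Local (vendored) config instead of the external transformers dependency.
from .configuration_bitnet import BitNetConfig
```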
from tests.transformers_tests.models.modeling_common import floats_numpy, ids_numpy

DTYPE_AND_THRESHOLDS = {"fp32": 5e-4, "fp16": 5e-3, "bf16": 5e-2}
MODES = [1]  # not support graph mode yet
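For context, a minimal sketch of how these constants are typically consumed in a MindSpore test. In MindSpore, mode `0` is `GRAPH_MODE` and `1` is `PYNATIVE_MODE`, so enabling graph mode later would amount to adding `0` to `MODES`. The test function and its body below are illustrative, not the repository's actual test code:

```python
import mindspore as ms
import pytest

DTYPE_AND_THRESHOLDS = {"fp32": 5e-4, "fp16": 5e-3, "bf16": 5e-2}
MODES = [1]  # 1 = PYNATIVE_MODE; add 0 (GRAPH_MODE) once graph mode is supported

@pytest.mark.parametrize("mode", MODES)
@pytest.mark.parametrize("dtype", DTYPE_AND_THRESHOLDS.keys())
def test_bitnet_matches_pytorch(mode, dtype):
    # Switch execution mode before building the network.
    ms.set_context(mode=mode)
    threshold = DTYPE_AND_THRESHOLDS[dtype]
    # ... build the MindSpore and PyTorch models, run them on the same inputs,
    # and compare outputs against `threshold` ...
```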
    {
        "last_hidden_state": 0,
    },
],
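The dict above maps an output name to its index in the model's output tuple. A self-contained sketch of the kind of comparison such a mapping drives; the helper name and relative-difference metric are illustrative, not the repository's actual implementation:

```python
import numpy as np

def compare_outputs(ms_outputs, pt_outputs, output_map, threshold):
    """Compare selected outputs of the MindSpore and PyTorch models.

    `output_map` maps an output name to its index in each model's output
    tuple, e.g. {"last_hidden_state": 0}.
    """
    for name, idx in output_map.items():
        ms_val = np.asarray(ms_outputs[idx])
        pt_val = np.asarray(pt_outputs[idx])
        # Mean relative difference, guarded against division by zero.
        diff = np.abs(ms_val - pt_val) / (np.abs(pt_val) + 1e-8)
        assert diff.mean() < threshold, f"{name}: mean relative diff {diff.mean():.2e} >= {threshold}"

# Example usage with dummy arrays:
compare_outputs([np.ones((2, 3))], [np.ones((2, 3))], {"last_hidden_state": 0}, 5e-4)
```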
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .modeling_bitnet import *
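One possible way to address the wildcard import flagged in the review summary is to export the public classes explicitly. The class names below are an assumption based on the standard BitNet model layout, not a confirmed list from this PR:

```python
# Explicit exports instead of a wildcard import (class list assumed, adjust as needed).
from .modeling_bitnet import BitNetForCausalLM, BitNetModel, BitNetPreTrainedModel

__all__ = ["BitNetForCausalLM", "BitNetModel", "BitNetPreTrainedModel"]
```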
Force-pushed from 2e61bfd to 612cc3f
Force-pushed from 612cc3f to 660032e
Add
1. Add the BitNet model.
2. Add unit tests (UT).
P.S. Quantized weights cannot be validated.
Usage
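The original usage snippet is not shown above; below is a minimal sketch, assuming the model follows the standard mindone.transformers auto-class API. The checkpoint id and the `mindspore_dtype` keyword are assumptions and may need adjusting to the actual checkpoint and API:

```python
import mindspore as ms
from transformers import AutoTokenizer

from mindone.transformers import AutoModelForCausalLM

# Example checkpoint id only; substitute the BitNet checkpoint you actually use.
model_id = "microsoft/bitnet-b1.58-2B-4T"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, mindspore_dtype=ms.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="np")
input_ids = ms.Tensor(inputs["input_ids"])
output_ids = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```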
Performance
Experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.6.0.