-
Notifications
You must be signed in to change notification settings - Fork 135
AWQ Modifier #1177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWQ Modifier #1177
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed. |
9273ef3
to
28f8bca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add evals comparing to GPTQ?
8120fe5
to
d76ba6d
Compare
Using the latest commit at this time, I am getting the following results via lm-eval.
|
9168743
to
21fc931
Compare
Comparing AWQ vs. GPTQ vs. RTN for
|
21fc931
to
03f7546
Compare
Signed-off-by: Brian Dellabetta <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove/fix the example, otherwise LGTM
4599d39
to
4b3325c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we fix quality
4b3325c
to
dd163b0
Compare
Signed-off-by: Brian Dellabetta <[email protected]>
dd163b0
to
d1d3766
Compare
This PR updates the main README.md to introduce a "New Features" section, improving visibility for recent major additions to LLM Compressor. This section highlights: - Axolotl Sparse Finetuning Integration (https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor) - AutoAWQ Integration for low-bit weight quantization (#1177) - Day 0 Llama 4 support and its use by Meta This helps users quickly understand the latest capabilities of the library. --------- Signed-off-by: Rahul Tuli <[email protected]>
This PR updates the main README.md to introduce a "New Features" section, improving visibility for recent major additions to LLM Compressor. This section highlights: - Axolotl Sparse Finetuning Integration (https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor) - AutoAWQ Integration for low-bit weight quantization (#1177) - Day 0 Llama 4 support and its use by Meta This helps users quickly understand the latest capabilities of the library. --------- Signed-off-by: Rahul Tuli <[email protected]>
SUMMARY: - Add QuantizationMixin to AWQModifier so we don't have redundant inputs (num_bits, symmetric, group_size) - Move AWQModifier to sequential pipelines, to avoid huge memory requirements of caching all activations at once. Regression test results are acceptable, results are all roughly the same, and within stderr, see test plan below. Resolves #1409 Resolves #1369 Related to #1383 Related to #1406 Related to #1368 Related to #1410 More improvements split into #1435 TEST PLAN: - [x] Rerun tests to validate No regression in tests, comparing against those reported in [original AWQ PR](#1177 (comment)). All gsm8k results are within stderr: | Type | gsm8k | wikitext | ------ | ------ | ----- | Old AWQ+QuantModifier Sym | .1054, .1069 | 9.1931 | New AWQ+QuantMixin Sym | .1077, .1084 | 9.1841 | Old AWQ+QuantModifier Asym | .1274, .1281 | 9.0281 | New AWQ+QuantMixin Asym | .1312, .1350 | 9.0288 --------- Signed-off-by: Brian Dellabetta <[email protected]> Co-authored-by: Kyle Sayers <[email protected]>
awq支持qwen2.5 7b模型的量化吗,我使用main分支的代码进行量化,报了这样的错 |
@Jim2016713 if you'd like to report a bug or issue please create a ticket with the corresponding details the form requests -- env versions, code snippet, full stack trace. |
@Jim2016713 the error msg means you should padding to the max length in tokenize function, i meet the same problem and solve like this: ```tokenizer(..., max_length=some_number, padding="max_length", truncation=True) |
SUMMARY:
Addition of
AWQModifier
, based on AutoAWQ implementation.Should be reviewed/merged in conjunction with neuralmagic/compressed-tensors#269
Replaces #181 and #824
TEST PLAN:
Some unit tests included, but as this was mostly a port from AutoAWQ, we validated the code by ensuring we could reproduce the evaluation metrics in Table 4 of the paper. We achieve the following wikitext PPL scores:
Llama-2 7B Group 128:
Llama-2 13B Group 128:
NOTE: We are excluding the clipping logic in this implementation, if we want to add it we should add it as another modifier, they are mutually exclusive and the data model for AWQ doesn't align well with clipping. That might be the reason for the slight deviation of results reported in the paper and in our implementation