AWQ Modifier #1177

Merged · 41 commits merged into main on Apr 21, 2025

Conversation

brian-dellabetta (Collaborator) commented Feb 19, 2025

SUMMARY:
Addition of AWQModifier, based on AutoAWQ implementation.

Should be reviewed/merged in conjunction with neuralmagic/compressed-tensors#269

Replaces #181 and #824

TEST PLAN:
Some unit tests included, but as this was mostly a port from AutoAWQ, we validated the code by ensuring we could reproduce the evaluation metrics in Table 4 of the paper. We achieve the following wikitext PPL scores:

Llama-2 7B Group 128:

  1. Paper: 5.60
  2. AutoAWQ: 5.615
  3. This implementation: 5.612
  4. We match what the paper reports for RTN alone -- 5.73
  5. We get reasonable results for channel-wise quantization -- 6.788. AutoAWQ errors out for this (setting "q_group_size": -1 in the quant_config), and results are not reported in the paper.

Llama-2 13B Group 128:

  1. We match the results of AutoAWQ and the results shown in the paper: 4.97
  2. We match what the paper reports for RTN alone -- 4.984

NOTE: We are excluding the clipping logic in this implementation. If we want to add it, we should add it as another modifier; the two are mutually exclusive, and the data model for AWQ doesn't align well with clipping. That might be the reason for the slight deviation between the results reported in the paper and our implementation.
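A minimal usage sketch, assuming the modifier is applied in a one-shot recipe alongside a quantization modifier. Argument names below are assumptions based on this description (not the merged signature), and import paths may differ between versions:

```python
# Illustrative sketch only -- AWQModifier argument names are assumptions;
# consult the merged examples/ directory for the authoritative usage.
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = [
    # AWQ derives per-channel smoothing scales from calibration activations...
    AWQModifier(num_bits=4, symmetric=True, group_size=128),
    # ...and the quantization modifier then applies the W4A16 group-128 scheme.
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(
    model="meta-llama/Llama-2-7b-hf",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```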


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch 2 times, most recently from 9273ef3 to 28f8bca Compare February 20, 2025 17:27
@brian-dellabetta brian-dellabetta changed the title from Bdellabe/awq modifier v3 to Bdellabe/Rtuli awq modifier v3 on Mar 10, 2025
@brian-dellabetta brian-dellabetta marked this pull request as ready for review March 10, 2025 21:45
dsikka (Collaborator) left a comment:

Should we add evals comparing to GPTQ?

brian-dellabetta (Collaborator, Author) replied:

Using the latest commit at this time, I am getting the following results via lm-eval (gsm8k reported as flexible-extract, strict-match):

deepseek-ai/DeepSeek-R1-Distill-Llama-8B:
 dense:
   gsm8k: .6619, .6490
   wikitext ppl: 15.4498
 awq+quant sym:
   gsm8k: .6376, .6217
   wikitext ppl: 18.8623
 quant sym:
   gsm8k: .6732, .6543
   wikitext ppl: 16.7398

meta-llama/Llama-2-7b-hf:
 dense:
   gsm8k: .1342, .1342
   wikitext ppl: 8.7587
 awq+quant sym:
   gsm8k: .1024, .1001
   wikitext ppl: 9.194
 quant sym:
   gsm8k: .1183, .1152
   wikitext ppl: 9.311
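For reference, a minimal lm-eval sketch of how numbers like these can be gathered; the exact tasks and model arguments used above are not spelled out in this thread, so treat the values below as assumptions:

```python
# Minimal lm-eval sketch; model_args and batch_size are placeholders, not the
# exact invocation behind the numbers above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-2-7b-hf,dtype=bfloat16",
    tasks=["gsm8k", "wikitext"],
    batch_size=8,
)
# gsm8k reports flexible-extract and strict-match exact_match;
# wikitext reports word perplexity.
print(results["results"])
```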

@dsikka dsikka changed the title from Bdellabe/Rtuli awq modifier v3 to AWQ Modifier Support on Mar 25, 2025
@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch 2 times, most recently from 9168743 to 21fc931 Compare April 1, 2025 20:23
brian-dellabetta (Collaborator, Author) commented Apr 2, 2025

Comparing AWQ vs. GPTQ vs. RTN for meta-llama/Llama-2-7b-hf, using the example script:

| Type          | gsm8k (flexible, strict) | wikitext ppl |
| ------------- | ------------------------ | ------------ |
| FP16          | .1395, .1387             | 8.7521       |
| AWQ ASYM      | .1281, .1274             | 9.0281       |
| GPTQ ASYM     | .1312, .1296             | 9.1954       |
| AWQ+GPTQ ASYM | .1251, .1221             | 9.1449       |
| RTN ASYM      | .1198, .1190             | 9.2098       |
| AWQ SYM       | .1069, .1054             | 9.1931       |
| GPTQ SYM      | .1046, .1039             | 9.3525       |
| AWQ+GPTQ SYM  | .0955, .0925             | 9.4326       |
| RTN SYM       | .1183, .1152             | 9.3114       |

@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from 21fc931 to 03f7546 Compare April 2, 2025 20:20
@brian-dellabetta brian-dellabetta added the ready When a PR is ready for review label Apr 3, 2025
kylesayrs (Collaborator) left a comment:

Please remove/fix the example, otherwise LGTM.

dsikka previously approved these changes Apr 18, 2025

dsikka (Collaborator) left a comment:

Can we fix quality?

@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from 4b3325c to dd163b0 Compare April 18, 2025 19:31
@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from dd163b0 to d1d3766 Compare April 18, 2025 19:40
@dsikka dsikka merged commit 549b42a into main Apr 21, 2025
8 checks passed
@dsikka dsikka deleted the bdellabe/awq-modifier-v3 branch April 21, 2025 14:50
rahul-tuli added a commit that referenced this pull request May 2, 2025
This PR updates the main README.md to introduce a "New Features"
section, improving visibility for recent major additions to LLM
Compressor.

This section highlights:

- Axolotl Sparse Finetuning Integration
(https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor)
- AutoAWQ Integration for low-bit weight quantization (#1177)
- Day 0 Llama 4 support and its use by Meta

This helps users quickly understand the latest capabilities of the library.

---------

Signed-off-by: Rahul Tuli <[email protected]>
kylesayrs pushed a commit that referenced this pull request May 4, 2025 (same "New Features" README update as above).
brian-dellabetta added a commit that referenced this pull request May 15, 2025
SUMMARY:
- Add QuantizationMixin to AWQModifier so we don't have redundant inputs
(num_bits, symmetric, group_size)
- Move AWQModifier to sequential pipelines, to avoid huge memory
requirements of caching all activations at once.

Regression test results are acceptable; results are all roughly the same and within stderr. See test plan below.

Resolves #1409 
Resolves #1369 
Related to #1383
Related to #1406 
Related to #1368 
Related to #1410 

More improvements split into #1435

TEST PLAN:
- [x] Rerun tests to validate
No regression in tests, comparing against those reported in [original
AWQ
PR](#1177 (comment)).
All gsm8k results are within stderr:

| Type                       | gsm8k        | wikitext |
| -------------------------- | ------------ | -------- |
| Old AWQ+QuantModifier Sym  | .1054, .1069 | 9.1931   |
| New AWQ+QuantMixin Sym     | .1077, .1084 | 9.1841   |
| Old AWQ+QuantModifier Asym | .1274, .1281 | 9.0281   |
| New AWQ+QuantMixin Asym    | .1312, .1350 | 9.0288   |

---------

Signed-off-by: Brian Dellabetta <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
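For context, a sketch of what the consolidated interface described in this commit might look like once AWQModifier carries the quantization arguments itself; the scheme name and argument spelling are assumptions:

```python
# Assumed post-refactor usage: one modifier instead of AWQModifier +
# QuantizationModifier; "W4A16_ASYM" and the argument names are assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

recipe = AWQModifier(targets=["Linear"], scheme="W4A16_ASYM", ignore=["lm_head"])

oneshot(
    model="meta-llama/Llama-2-7b-hf",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```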
Jim2016713 commented May 16, 2025

> (quotes the PR description above)

Does AWQ support quantizing the Qwen2.5 7B model? I quantized using the code on the main branch and got this error:
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 2048 but got size 1874 for tensor number 181 in the list.

brian-dellabetta (Collaborator, Author) replied:

> Does AWQ support quantizing the Qwen2.5 7B model? I quantized using the code on the main branch and got this error: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 2048 but got size 1874 for tensor number 181 in the list.

@Jim2016713 if you'd like to report a bug or issue please create a ticket with the corresponding details the form requests -- env versions, code snippet, full stack trace.

ljwh commented May 21, 2025

> Does AWQ support quantizing the Qwen2.5 7B model? I quantized using the code on the main branch and got this error: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 2048 but got size 1874 for tensor number 181 in the list.

@Jim2016713 the error message means you should pad to the max length in the tokenize function. I hit the same problem and solved it like this:

`tokenizer(..., max_length=some_number, padding="max_length", truncation=True)`
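For example, a calibration tokenize function along those lines might look like this (the column name, length, and dataset variable are placeholders):

```python
# Placeholder sketch: pad every calibration sample to one fixed length so the
# cached activations can be concatenated across batches.
def tokenize(sample):
    return tokenizer(
        sample["text"],
        max_length=2048,
        padding="max_length",
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(tokenize, remove_columns=ds.column_names)
```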

brian-dellabetta (Collaborator, Author) replied:
Thanks @ljwh, this should be resolved on main as part of #1426, though we haven't created a new version just yet.

ljwh commented May 22, 2025

> Thanks @ljwh, this should be resolved on main as part of #1426, though we haven't created a new version just yet.

Is there a release plan? It seems there have been a lot of fixes for AWQ memory use.

brian-dellabetta (Collaborator, Author) replied:
> Is there a release plan? It seems there have been a lot of fixes for AWQ memory use.

@ljwh we will cut a release soon; a couple more fixes are in transit: #1435 & #1444.

Labels: ready (When a PR is ready for review)

7 participants