AWQ Modifier #1177

Merged · 41 commits merged into main on Apr 21, 2025

Conversation

brian-dellabetta (Collaborator) commented Feb 19, 2025

SUMMARY:
Addition of AWQModifier, based on AutoAWQ implementation.

Should be reviewed/merged in conjunction with neuralmagic/compressed-tensors#269

Replaces #181 and #824

TEST PLAN:
Some unit tests included, but as this was mostly a port from AutoAWQ, we validated the code by ensuring we could reproduce the evaluation metrics in Table 4 of the paper. We achieve the following wikitext PPL scores:

Llama-2 7B Group 128:

  1. Paper: 5.60
  2. AutoAWQ: 5.615
  3. This implementation: 5.612
  4. We match what the paper reports for RTN alone -- 5.73
  5. We get reasonable results for channel-wise quantization -- 6.788. AutoAWQ errors out for this (setting "q_group_size": -1 in the quant_config), and results are not reported in the paper.

Llama-2 13B Group 128:

  1. We match the results of AutoAWQ and the results shown in the paper: 4.97
  2. We match what the paper reports for RTN alone -- 4.984

NOTE: We are excluding the clipping logic in this implementation. If we want to add it, we should add it as another modifier; the two are mutually exclusive, and the data model for AWQ doesn't align well with clipping. That might be the reason for the slight deviation between the results reported in the paper and our implementation.
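A minimal usage sketch, assuming the modifier is applied in a one-shot recipe alongside a quantization modifier. Argument names below are assumptions based on this description (not the merged signature), and import paths may differ between versions:

```python
# Illustrative sketch only -- AWQModifier argument names are assumptions;
# consult the merged examples/ directory for the authoritative usage.
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = [
    # AWQ derives per-channel smoothing scales from calibration activations...
    AWQModifier(num_bits=4, symmetric=True, group_size=128),
    # ...and the quantization modifier then applies the W4A16 group-128 scheme.
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(
    model="meta-llama/Llama-2-7b-hf",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```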


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch 2 times, most recently from 9273ef3 to 28f8bca Compare February 20, 2025 17:27
@brian-dellabetta brian-dellabetta changed the title from Bdellabe/awq modifier v3 to Bdellabe/Rtuli awq modifier v3 on Mar 10, 2025
@brian-dellabetta brian-dellabetta marked this pull request as ready for review March 10, 2025 21:45
dsikka (Collaborator) left a comment:

Should we add evals comparing to GPTQ?

brian-dellabetta (Collaborator, Author) replied:

Using the latest commit at this time, I am getting the following results via lm-eval (gsm8k reported as flexible-extract, strict-match):

deepseek-ai/DeepSeek-R1-Distill-Llama-8B:
 dense:
   gsm8k: .6619, .6490
   wikitext ppl: 15.4498
 awq+quant sym:
   gsm8k: .6376, .6217
   wikitext ppl: 18.8623
 quant sym:
   gsm8k: .6732, .6543
   wikitext ppl: 16.7398

meta-llama/Llama-2-7b-hf:
 dense:
   gsm8k: .1342, .1342
   wikitext ppl: 8.7587
 awq+quant sym:
   gsm8k: .1024, .1001
   wikitext ppl: 9.194
 quant sym:
   gsm8k: .1183, .1152
   wikitext ppl: 9.311
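For reference, a minimal lm-eval sketch of how numbers like these can be gathered; the exact tasks and model arguments used above are not spelled out in this thread, so treat the values below as assumptions:

```python
# Minimal lm-eval sketch; model_args and batch_size are placeholders, not the
# exact invocation behind the numbers above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-2-7b-hf,dtype=bfloat16",
    tasks=["gsm8k", "wikitext"],
    batch_size=8,
)
# gsm8k reports flexible-extract and strict-match exact_match;
# wikitext reports word perplexity.
print(results["results"])
```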

@dsikka dsikka changed the title from Bdellabe/Rtuli awq modifier v3 to AWQ Modifier Support on Mar 25, 2025
@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch 2 times, most recently from 9168743 to 21fc931 Compare April 1, 2025 20:23
brian-dellabetta (Collaborator, Author) commented Apr 2, 2025

Comparing AWQ vs. GPTQ vs. RTN for meta-llama/Llama-2-7b-hf, using the example script:

| Type          | gsm8k (flexible, strict) | wikitext ppl |
| ------------- | ------------------------ | ------------ |
| FP16          | .1395, .1387             | 8.7521       |
| AWQ ASYM      | .1281, .1274             | 9.0281       |
| GPTQ ASYM     | .1312, .1296             | 9.1954       |
| AWQ+GPTQ ASYM | .1251, .1221             | 9.1449       |
| RTN ASYM      | .1198, .1190             | 9.2098       |
| AWQ SYM       | .1069, .1054             | 9.1931       |
| GPTQ SYM      | .1046, .1039             | 9.3525       |
| AWQ+GPTQ SYM  | .0955, .0925             | 9.4326       |
| RTN SYM       | .1183, .1152             | 9.3114       |

@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from 21fc931 to 03f7546 Compare April 2, 2025 20:20
@brian-dellabetta brian-dellabetta added the ready When a PR is ready for review label Apr 3, 2025
kylesayrs (Collaborator) left a comment:

Please remove/fix the example, otherwise LGTM.

dsikka previously approved these changes Apr 18, 2025

dsikka (Collaborator) left a comment:

Can we fix quality?

@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from 4b3325c to dd163b0 Compare April 18, 2025 19:31
@brian-dellabetta brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from dd163b0 to d1d3766 Compare April 18, 2025 19:40
@dsikka dsikka merged commit 549b42a into main Apr 21, 2025
8 checks passed
@dsikka dsikka deleted the bdellabe/awq-modifier-v3 branch April 21, 2025 14:50
rahul-tuli added a commit that referenced this pull request May 2, 2025
This PR updates the main README.md to introduce a "New Features"
section, improving visibility for recent major additions to LLM
Compressor.

This section highlights:

- Axolotl Sparse Finetuning Integration
(https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor)
- AutoAWQ Integration for low-bit weight quantization (#1177)
- Day 0 Llama 4 support and its use by Meta

This helps users quickly understand the latest capabilities of the library.

---------

Signed-off-by: Rahul Tuli <[email protected]>
kylesayrs pushed a commit that referenced this pull request May 4, 2025 (same "New Features" README update as above).
brian-dellabetta added a commit that referenced this pull request May 15, 2025
SUMMARY:
- Add QuantizationMixin to AWQModifier so we don't have redundant inputs
(num_bits, symmetric, group_size)
- Move AWQModifier to sequential pipelines, to avoid huge memory
requirements of caching all activations at once.

Regression test results are acceptable; results are all roughly the same and within stderr. See test plan below.

Resolves #1409 
Resolves #1369 
Related to #1383
Related to #1406 
Related to #1368 
Related to #1410 

More improvements split into #1435

TEST PLAN:
- [x] Rerun tests to validate
No regression in tests, comparing against those reported in [original
AWQ
PR](#1177 (comment)).
All gsm8k results are within stderr:

| Type                       | gsm8k        | wikitext |
| -------------------------- | ------------ | -------- |
| Old AWQ+QuantModifier Sym  | .1054, .1069 | 9.1931   |
| New AWQ+QuantMixin Sym     | .1077, .1084 | 9.1841   |
| Old AWQ+QuantModifier Asym | .1274, .1281 | 9.0281   |
| New AWQ+QuantMixin Asym    | .1312, .1350 | 9.0288   |

---------

Signed-off-by: Brian Dellabetta <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
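For context, a sketch of what the consolidated interface described in this commit might look like once AWQModifier carries the quantization arguments itself; the scheme name and argument spelling are assumptions:

```python
# Assumed post-refactor usage: one modifier instead of AWQModifier +
# QuantizationModifier; "W4A16_ASYM" and the argument names are assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

recipe = AWQModifier(targets=["Linear"], scheme="W4A16_ASYM", ignore=["lm_head"])

oneshot(
    model="meta-llama/Llama-2-7b-hf",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```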
Jim2016713 commented May 16, 2025

> (quotes the PR description above)

Does AWQ support quantizing the Qwen2.5 7B model? I quantized using the code on the main branch and got this error:
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 2048 but got size 1874 for tensor number 181 in the list.

brian-dellabetta (Collaborator, Author) replied:

> Does AWQ support quantizing the Qwen2.5 7B model? I quantized using the code on the main branch and got this error: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 2048 but got size 1874 for tensor number 181 in the list.

@Jim2016713 if you'd like to report a bug or issue please create a ticket with the corresponding details the form requests -- env versions, code snippet, full stack trace.

ljwh commented May 21, 2025

> Does AWQ support quantizing the Qwen2.5 7B model? I quantized using the code on the main branch and got this error: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 2048 but got size 1874 for tensor number 181 in the list.

@Jim2016713 the error message means you should pad to the max length in the tokenize function. I hit the same problem and solved it like this:

`tokenizer(..., max_length=some_number, padding="max_length", truncation=True)`
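For example, a calibration tokenize function along those lines might look like this (the column name, length, and dataset variable are placeholders):

```python
# Placeholder sketch: pad every calibration sample to one fixed length so the
# cached activations can be concatenated across batches.
def tokenize(sample):
    return tokenizer(
        sample["text"],
        max_length=2048,
        padding="max_length",
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(tokenize, remove_columns=ds.column_names)
```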

brian-dellabetta (Collaborator, Author) replied:
Thanks @ljwh, this should be resolved on main as part of #1426, though we haven't created a new version just yet.

ljwh commented May 22, 2025

> Thanks @ljwh, this should be resolved on main as part of #1426, though we haven't created a new version just yet.

Is there a release plan? It seems there have been a lot of fixes for AWQ memory use.

brian-dellabetta (Collaborator, Author) replied:
> Is there a release plan? It seems there have been a lot of fixes for AWQ memory use.

@ljwh we will cut a release soon; a couple more fixes are in transit: #1435 & #1444.

Labels: ready (When a PR is ready for review)

7 participants