AWQ QuantizationMixin + SequentialPipeline #1426

Merged — 26 commits merged into main from bdellabe/awq-quantization-mixin on May 15, 2025

Conversation

@brian-dellabetta (Collaborator) commented May 12, 2025

SUMMARY:

  • Add QuantizationMixin to AWQModifier so we no longer have redundant inputs (num_bits, symmetric, group_size); a configuration sketch follows below.
  • Move AWQModifier to the sequential pipelines, avoiding the huge memory requirement of caching all activations at once.

Regression test results are acceptable: results are all roughly the same as before and within stderr; see the test plan below.
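For context, here is a rough sketch of what an AWQ recipe might look like with the mixin in place: the quantization scheme is declared once on the modifier (e.g. a preset such as "W4A16_ASYM") rather than via separate num_bits / symmetric / group_size fields. The model name, dataset, and calibration settings below are illustrative placeholders, and exact argument names may differ from the final API.

```python
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# Illustrative model; any causal LM supported by llm-compressor works here.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", torch_dtype="auto"
)

# With QuantizationMixin, the scheme is specified once on the modifier instead of
# passing num_bits / symmetric / group_size as separate AWQ-specific arguments.
recipe = AWQModifier(targets=["Linear"], scheme="W4A16_ASYM", ignore=["lm_head"])

# oneshot runs calibration through the sequential pipeline, so activations are
# cached layer by layer rather than for the whole model at once.
oneshot(
    model=model,
    dataset="open_platypus",  # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```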

Resolves #1409
Resolves #1369
Related to #1383
Related to #1406
Related to #1368
Related to #1410

More improvements are split out into #1435

TEST PLAN:

  • Reran tests to validate: no regressions compared with the results reported in the original AWQ PR. All gsm8k results are within stderr:
| Type | gsm8k | wikitext |
| --- | --- | --- |
| Old AWQ+QuantModifier Sym | .1054, .1069 | 9.1931 |
| New AWQ+QuantMixin Sym | .1077, .1084 | 9.1841 |
| Old AWQ+QuantModifier Asym | .1274, .1281 | 9.0281 |
| New AWQ+QuantMixin Asym | .1312, .1350 | 9.0288 |
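For reference, a comparison like the table above can be reproduced with lm-evaluation-harness. The sketch below is an assumption of a typical setup (recent lm-eval, ≥ 0.4) with a placeholder path to the quantized checkpoint:

```python
import lm_eval  # lm-evaluation-harness

# "./awq-quantized-model" is a placeholder for the checkpoint saved after oneshot.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./awq-quantized-model,dtype=auto",
    tasks=["gsm8k", "wikitext"],
    batch_size=8,
)
print(results["results"])
```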


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs force-pushed the bdellabe/awq-quantization-mixin branch from 7273b1c to c4cd97c on May 15, 2025 18:58
@kylesayrs (Collaborator) left a comment


This modifier still has extreme memory requirements because we are still megabatching everything for _set_module_kwargs and _apply_smoothing.

This follow-up PR gets part way toward removing that, but still batches outputs in its current state (grep for torch.cat): #1435

Let's land this as-is and add accumulation in a follow-up to reduce memory requirements and weird kwarg assumptions.
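For illustration only (this is not the project's code, and the class name is hypothetical), the kind of accumulation mentioned above replaces concatenating every cached activation with a running per-channel statistic that is updated one calibration batch at a time:

```python
import torch


class ActAccumulator:
    """Hypothetical sketch: accumulate per-channel activation statistics
    batch by batch instead of concatenating all cached activations."""

    def __init__(self):
        self.sum_abs = None  # running sum of |x| per input channel
        self.count = 0

    def update(self, x: torch.Tensor) -> None:
        # x: (batch, seq_len, hidden) activations for one calibration batch
        x = x.reshape(-1, x.shape[-1]).abs()
        if self.sum_abs is None:
            self.sum_abs = x.sum(dim=0)
        else:
            self.sum_abs += x.sum(dim=0)
        self.count += x.shape[0]

    def mean_abs(self) -> torch.Tensor:
        # Same statistic as torch.cat(batches).abs().mean(dim=0), but without
        # ever holding all cached batches in memory at once.
        return self.sum_abs / self.count
```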

brian-dellabetta and others added 17 commits May 15, 2025 15:45
@brian-dellabetta force-pushed the bdellabe/awq-quantization-mixin branch from c4cd97c to 2659c22 on May 15, 2025 20:45
@brian-dellabetta enabled auto-merge (squash) on May 15, 2025 21:45
@rahul-tuli (Collaborator) left a comment


Lgtm! Good updates!

@brian-dellabetta merged commit 6fa33a7 into main on May 15, 2025
11 checks passed
@brian-dellabetta deleted the bdellabe/awq-quantization-mixin branch on May 15, 2025 21:45
@brian-dellabetta mentioned this pull request on May 21, 2025
Labels: ready (when a PR is ready for review)

Successfully merging this pull request may close these issues:

  • [AWQ] Insane memory requirement: over 900GB for 32B model
  • OOM (host) when running AWQ

3 participants