AWQ QuantizationMixin + SequentialPipeline #1426

Merged — 26 commits merged into main from bdellabe/awq-quantization-mixin on May 15, 2025

Conversation

@brian-dellabetta (Collaborator) commented May 12, 2025

SUMMARY:

  • Add QuantizationMixin to AWQModifier so we no longer have redundant inputs (num_bits, symmetric, group_size); a configuration sketch follows below.
  • Move AWQModifier to the sequential pipelines, avoiding the huge memory requirement of caching all activations at once.

Regression test results are acceptable: results are all roughly the same as before and within stderr; see the test plan below.
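For context, here is a rough sketch of what an AWQ recipe might look like with the mixin in place: the quantization scheme is declared once on the modifier (e.g. a preset such as "W4A16_ASYM") rather than via separate num_bits / symmetric / group_size fields. The model name, dataset, and calibration settings below are illustrative placeholders, and exact argument names may differ from the final API.

```python
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# Illustrative model; any causal LM supported by llm-compressor works here.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", torch_dtype="auto"
)

# With QuantizationMixin, the scheme is specified once on the modifier instead of
# passing num_bits / symmetric / group_size as separate AWQ-specific arguments.
recipe = AWQModifier(targets=["Linear"], scheme="W4A16_ASYM", ignore=["lm_head"])

# oneshot runs calibration through the sequential pipeline, so activations are
# cached layer by layer rather than for the whole model at once.
oneshot(
    model=model,
    dataset="open_platypus",  # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```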

Resolves #1409
Resolves #1369
Related to #1383
Related to #1406
Related to #1368
Related to #1410

More improvements are split out into #1435

TEST PLAN:

  • Reran tests to validate: no regressions compared with the results reported in the original AWQ PR. All gsm8k results are within stderr:
| Type | gsm8k | wikitext |
| --- | --- | --- |
| Old AWQ+QuantModifier Sym | .1054, .1069 | 9.1931 |
| New AWQ+QuantMixin Sym | .1077, .1084 | 9.1841 |
| Old AWQ+QuantModifier Asym | .1274, .1281 | 9.0281 |
| New AWQ+QuantMixin Asym | .1312, .1350 | 9.0288 |
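For reference, a comparison like the table above can be reproduced with lm-evaluation-harness. The sketch below is an assumption of a typical setup (recent lm-eval, ≥ 0.4) with a placeholder path to the quantized checkpoint:

```python
import lm_eval  # lm-evaluation-harness

# "./awq-quantized-model" is a placeholder for the checkpoint saved after oneshot.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./awq-quantized-model,dtype=auto",
    tasks=["gsm8k", "wikitext"],
    batch_size=8,
)
print(results["results"])
```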


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs force-pushed the bdellabe/awq-quantization-mixin branch from 7273b1c to c4cd97c on May 15, 2025 18:58
@kylesayrs (Collaborator) left a comment


This modifier still has extreme memory requirements because we are still megabatching everything for _set_module_kwargs and _apply_smoothing.

This follow-up PR gets part way toward removing that, but still batches outputs in its current state (grep for torch.cat): #1435

Let's land this as-is and add accumulation in a follow-up to reduce memory requirements and weird kwarg assumptions.
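For illustration only (this is not the project's code, and the class name is hypothetical), the kind of accumulation mentioned above replaces concatenating every cached activation with a running per-channel statistic that is updated one calibration batch at a time:

```python
import torch


class ActAccumulator:
    """Hypothetical sketch: accumulate per-channel activation statistics
    batch by batch instead of concatenating all cached activations."""

    def __init__(self):
        self.sum_abs = None  # running sum of |x| per input channel
        self.count = 0

    def update(self, x: torch.Tensor) -> None:
        # x: (batch, seq_len, hidden) activations for one calibration batch
        x = x.reshape(-1, x.shape[-1]).abs()
        if self.sum_abs is None:
            self.sum_abs = x.sum(dim=0)
        else:
            self.sum_abs += x.sum(dim=0)
        self.count += x.shape[0]

    def mean_abs(self) -> torch.Tensor:
        # Same statistic as torch.cat(batches).abs().mean(dim=0), but without
        # ever holding all cached batches in memory at once.
        return self.sum_abs / self.count
```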

brian-dellabetta and others added 17 commits May 15, 2025 15:45
@brian-dellabetta force-pushed the bdellabe/awq-quantization-mixin branch from c4cd97c to 2659c22 on May 15, 2025 20:45
@brian-dellabetta enabled auto-merge (squash) on May 15, 2025 21:45
@rahul-tuli (Collaborator) left a comment


Lgtm! Good updates!

@brian-dellabetta merged commit 6fa33a7 into main on May 15, 2025
11 checks passed
@brian-dellabetta deleted the bdellabe/awq-quantization-mixin branch on May 15, 2025 21:45
@brian-dellabetta mentioned this pull request on May 21, 2025
Labels: ready (when a PR is ready for review)

Successfully merging this pull request may close these issues:

  • [AWQ] Insane memory requirement: over 900GB for 32B model
  • OOM (host) when running AWQ

3 participants