[WIP][AWQ] Support accumulation for reduced memory usage #1435
base: main
Conversation
```diff
- int_w_output = self._forward_input_with_kwargs(
-     module=module2inspect, inputs=x, input_kwargs=self._module_kwargs
- )
+ int_w_output = self._run_samples(parent_layer)
```
`inputs` here should be `x`, not the dataset
`x` is the dataset, right?
My bad, misread what `self._samples` was. Can we keep that as a different name to distinguish it from actual data?
Yeah I'm always happy to take suggestions on names for any of the code I push 🙂
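(For context, a minimal sketch of what a `_run_samples` helper like the one in this hunk might look like; the class, batch format, and concatenation are assumptions for illustration, not the PR's actual implementation:)

```python
from typing import List

import torch


class SketchModifier:
    """Toy stand-in for the modifier; only _run_samples matters here."""

    def __init__(self, samples: List[torch.Tensor]):
        # Assumption for this sketch: cached calibration batches are tensors
        self._samples = samples

    def _run_samples(self, module: torch.nn.Module) -> torch.Tensor:
        # Forward every cached batch through `module` and concatenate,
        # so callers get one output tensor covering all samples
        with torch.no_grad():
            return torch.cat([module(x) for x in self._samples], dim=0)


# Usage: run two cached batches through a layer
mod = SketchModifier([torch.randn(4, 8), torch.randn(4, 8)])
out = mod._run_samples(torch.nn.Linear(8, 8))  # shape (8, 8)
```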
```python
    default_factory=IntermediatesCache
)
_sample_means: Dict[Module, float] = PrivateAttr(default_factory=dict)
_num_samples: Dict[Module, int] = PrivateAttr(default_factory=dict)
```
I know this is in GPTQ, but a field named `num_samples` indicates an `int` in virtually any context I've seen it. What about `_sample_counts`, so it's more similar to `_sample_means`?
That's good too!
These deletions make me happy! LGTM!
```python
AWQ_PRECISION = torch.float32


def accumulate_mean(
```
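(The function body is cut off in this hunk; below is a minimal sketch of the kind of running-mean update the `_sample_means` / `_num_samples` fields above suggest, with an assumed signature:)

```python
from typing import Tuple

import torch


def accumulate_mean(
    inputs: torch.Tensor,
    prev_mean: torch.Tensor,
    num_prev_samples: int,
) -> Tuple[torch.Tensor, int]:
    # Fold one batch into a running mean so earlier batches can be freed:
    # new_mean = (prev_mean * n_prev + batch_sum) / (n_prev + n_batch)
    num_new = inputs.size(0)
    total = num_prev_samples + num_new
    new_mean = (prev_mean * num_prev_samples + inputs.sum(dim=0)) / total
    return new_mean, total
```

This keeps per-module state down to one mean tensor and one count, which is where the reduced memory usage in the PR title comes from.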
```python
def iter(
    self, input_names: Optional[List[str]] = None
) -> Generator[Any, None, None]:
    for batch_index in self.batch_intermediates:
        yield self.fetch(batch_index, input_names)

def __iter__(self) -> Generator[Any, None, None]:
    yield from self.iter()
```
Why not just have `iter`?
Saves an extra function call:

```python
for batch in cache:
    ...
```

versus

```python
for batch in cache.iter():
    ...
```
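(A runnable toy version of the pattern, with assumed internals, showing both call sites side by side:)

```python
from typing import Any, Dict, Generator, List, Optional


class ToyCache:
    """Toy stand-in for IntermediatesCache; internals are assumptions."""

    def __init__(self) -> None:
        # batch_index -> {input_name: value}
        self.batch_intermediates: Dict[int, Dict[str, Any]] = {}

    def fetch(
        self, batch_index: int, input_names: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        batch = self.batch_intermediates[batch_index]
        if input_names is None:
            return batch
        return {name: batch[name] for name in input_names}

    def iter(
        self, input_names: Optional[List[str]] = None
    ) -> Generator[Any, None, None]:
        for batch_index in self.batch_intermediates:
            yield self.fetch(batch_index, input_names)

    def __iter__(self) -> Generator[Any, None, None]:
        yield from self.iter()


cache = ToyCache()
cache.batch_intermediates[0] = {"hidden_states": [1.0], "mask": [1]}

for batch in cache:                          # __iter__: all inputs
    print(batch)
for batch in cache.iter(["hidden_states"]):  # iter(): filtered by name
    print(batch)
```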
SUMMARY:
- Add QuantizationMixin to AWQModifier so we don't have redundant inputs (num_bits, symmetric, group_size)
- Move AWQModifier to sequential pipelines, to avoid the huge memory requirements of caching all activations at once

Regression test results are acceptable; results are all roughly the same and within stderr, see test plan below.

Resolves #1409
Resolves #1369
Related to #1383
Related to #1406
Related to #1368
Related to #1410
More improvements split into #1435

TEST PLAN:
- [x] Rerun tests to validate

No regression in tests, comparing against those reported in the [original AWQ PR](#1177 (comment)). All gsm8k results are within stderr:

| Type | gsm8k | wikitext |
| ------ | ------ | ----- |
| Old AWQ+QuantModifier Sym | .1054, .1069 | 9.1931 |
| New AWQ+QuantMixin Sym | .1077, .1084 | 9.1841 |
| Old AWQ+QuantModifier Asym | .1274, .1281 | 9.0281 |
| New AWQ+QuantMixin Asym | .1312, .1350 | 9.0288 |

Signed-off-by: Brian Dellabetta <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
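(For anyone landing here from the summary: a hedged sketch of the post-mixin API in use. The model id, dataset name, and exact argument names are assumptions and may differ across llm-compressor releases:)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # assumed example model
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# With QuantizationMixin folded in, bit width, symmetry, and group size
# come from one quantization scheme instead of separate AWQ arguments
recipe = [AWQModifier(ignore=["lm_head"], scheme="W4A16_ASYM", targets=["Linear"])]

oneshot(
    model=model,
    dataset="open_platypus",  # assumed calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
)
```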