SUMMARY:
- Add QuantizationMixin to AWQModifier so that the quantization inputs
(num_bits, symmetric, group_size) are not specified redundantly.
- Move AWQModifier to sequential pipelines to avoid the huge memory
requirements of caching all activations at once.
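
The two changes above can be sketched in plain Python. This is only an illustration of the ideas, not the code in this PR: `QuantArgsMixin`, `ToyAWQModifier`, `duo_scaling`, and both `peak_cached_*` helpers are hypothetical names, and the "layers" are ordinary functions on lists of floats.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins for illustration; none of these names are the
# library's actual classes or functions.


@dataclass
class QuantArgsMixin:
    """Declares the quantization settings once for any modifier that mixes it in."""
    num_bits: int = 4
    symmetric: bool = True
    group_size: int = 128


@dataclass
class ToyAWQModifier(QuantArgsMixin):
    """Inherits num_bits/symmetric/group_size instead of redeclaring them."""
    duo_scaling: bool = True  # hypothetical AWQ-only option


Layer = Callable[[List[float]], List[float]]


def peak_cached_all_at_once(layers: List[Layer], batch: List[float]) -> int:
    """Cache every layer's input before calibrating; return total floats held."""
    cache = []
    x = batch
    for layer in layers:
        cache.append(x)  # all inputs stay alive until the end
        x = layer(x)
    return sum(len(a) for a in cache)


def peak_cached_sequential(layers: List[Layer], batch: List[float]) -> int:
    """Hold only the current layer's input; return the largest amount held."""
    peak = 0
    x = batch
    for layer in layers:
        peak = max(peak, len(x))  # a real pipeline would calibrate this layer here
        x = layer(x)  # the previous input can now be freed
    return peak


def double(values: List[float]) -> List[float]:
    return [2 * v for v in values]


toy_layers = [double] * 4
toy_batch = [1.0] * 8
modifier = ToyAWQModifier(num_bits=4, symmetric=False)
print(modifier.group_size)
print(peak_cached_all_at_once(toy_layers, toy_batch))
print(peak_cached_sequential(toy_layers, toy_batch))
```

In this toy setup the cache-everything variant holds one activation list per layer, while the sequential variant never holds more than one, which is the memory saving the second bullet describes.
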
Regression test results are acceptable: all results are roughly the
same and within stderr. See the test plan below.
Resolves #1409
Resolves #1369
Related to #1383
Related to #1406
Related to #1368
Related to #1410
More improvements split into #1435
TEST PLAN:
- [x] Rerun tests to validate
No regression in tests compared with the results reported in the
[original AWQ PR](#1177 (comment)).
All gsm8k results are within stderr:
| Type | gsm8k | wikitext |
| ------ | ------ | ------ |
| Old AWQ+QuantModifier Sym | .1054, .1069 | 9.1931 |
| New AWQ+QuantMixin Sym | .1077, .1084 | 9.1841 |
| Old AWQ+QuantModifier Asym | .1274, .1281 | 9.0281 |
| New AWQ+QuantMixin Asym | .1312, .1350 | 9.0288 |
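
To make the size of the changes easier to see, the old-to-new differences can be computed directly from the table. This is a small sketch using only the numbers reported above; the stderr values themselves are not listed here, so the script reports deltas and does not attempt the stderr comparison. The two gsm8k values per row are kept in their reported order without assuming what each one measures, and `rows` and `deltas` are names introduced for this example.

```python
# Values copied from the table: (gsm8k first value, gsm8k second value, wikitext).
rows = {
    "Sym": {"old": (0.1054, 0.1069, 9.1931), "new": (0.1077, 0.1084, 9.1841)},
    "Asym": {"old": (0.1274, 0.1281, 9.0281), "new": (0.1312, 0.1350, 9.0288)},
}


def deltas(old, new):
    """Return new minus old for each reported metric, rounded to 4 decimals."""
    return tuple(round(n - o, 4) for o, n in zip(old, new))


for name, pair in rows.items():
    print(name, deltas(pair["old"], pair["new"]))
```
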
---------
Signed-off-by: Brian Dellabetta <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>