AWQ QuantizationMixin + SequentialPipeline #1426


Merged
merged 26 commits on May 15, 2025
Changes from all commits (26 commits)
7c7d216  AWQ w/ QuantizationMixin p1 (brian-dellabetta, May 8, 2025)
759414c  more updates (brian-dellabetta, May 12, 2025)
1f50823  with sequential epoch events (brian-dellabetta, May 12, 2025)
87c2d3f  running but model outputs gibberish (brian-dellabetta, May 12, 2025)
477ddcf  style fixes (brian-dellabetta, May 12, 2025)
92fa6c9  remove TODOs (brian-dellabetta, May 12, 2025)
e5cb407  reorder private methods (brian-dellabetta, May 12, 2025)
bb58557  codereview updates (brian-dellabetta, May 13, 2025)
8f8f950  touchups (brian-dellabetta, May 13, 2025)
6bed6c4  update lifecycle docstring (brian-dellabetta, May 13, 2025)
d6f7a04  extra safeguard (brian-dellabetta, May 13, 2025)
05d5f10  working! with tqdm log cleanup (brian-dellabetta, May 13, 2025)
f01f01f  stylefixes (brian-dellabetta, May 13, 2025)
711ff8e  Cleanup for ready PR (brian-dellabetta, May 14, 2025)
3731b5b  more cleanup (brian-dellabetta, May 14, 2025)
9d11b8e  fix failing tests (brian-dellabetta, May 14, 2025)
ebbe2ee  stylefixes (brian-dellabetta, May 14, 2025)
9d31328  context wrap fixes (brian-dellabetta, May 14, 2025)
7580449  drop lm_eval from example script (brian-dellabetta, May 14, 2025)
6b3d39e  prune unused sequential epoch start event (brian-dellabetta, May 14, 2025)
2c95e25  drop redundant docstrings (brian-dellabetta, May 14, 2025)
be4abf6  Update src/llmcompressor/modifiers/awq/base.py (brian-dellabetta, May 15, 2025)
47ccb1d  updates from live codereview (brian-dellabetta, May 15, 2025)
7c21691  validate activations with unit tests (brian-dellabetta, May 15, 2025)
d17fce8  style fixes (brian-dellabetta, May 15, 2025)
2659c22  stylefixes (brian-dellabetta, May 15, 2025)
45 changes: 1 addition & 44 deletions examples/awq/llama_example.py
@@ -1,17 +1,8 @@
-import lm_eval
-from compressed_tensors.quantization import (
-    QuantizationArgs,
-    QuantizationScheme,
-    QuantizationStrategy,
-    QuantizationType,
-)
 from datasets import load_dataset
-from lm_eval.utils import make_table
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 from llmcompressor import oneshot
 from llmcompressor.modifiers.awq import AWQModifier
-from llmcompressor.modifiers.quantization import QuantizationModifier
 
 # Select model and load it.
 MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
@@ -61,23 +52,7 @@ def tokenize(sample):

 # Configure the quantization algorithm to run.
 recipe = [
-    AWQModifier(bits=4, symmetric=False),
-    QuantizationModifier(
-        ignore=["lm_head"],
-        config_groups={
-            "group_0": QuantizationScheme(
-                targets=["Linear"],
-                weights=QuantizationArgs(
-                    num_bits=4,
-                    type=QuantizationType.INT,
-                    dynamic=False,
-                    symmetric=False,
-                    strategy=QuantizationStrategy.GROUP,
-                    group_size=128,
-                ),
-            )
-        },
-    ),
+    AWQModifier(ignore=["lm_head"], scheme="W4A16_ASYM", targets=["Linear"]),
 ]
 
 # Apply algorithms.
@@ -101,21 +76,3 @@ def tokenize(sample):
 SAVE_DIR = MODEL_ID.split("/")[-1] + "-awq-asym"
 model.save_pretrained(SAVE_DIR, save_compressed=True)
 tokenizer.save_pretrained(SAVE_DIR)
-
-#
-# 2) Evaluate model on wikitext perplexity
-#
-
-results = lm_eval.simple_evaluate(
-    model="hf",
-    model_args={
-        "pretrained": SAVE_DIR,
-        "add_bos_token": True,
-        "dtype": "bfloat16",
-        "gpu_memory_utilization": 0.5,
-    },
-    tasks=["wikitext"],
-    num_fewshot=5,
-    batch_size="auto",
-)
-print(make_table(results))
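For readers skimming the diff: the AWQModifier used in the new recipe performs activation-aware weight scaling. Its core property is that scaling a layer's weight columns up by per-channel factors s while dividing the matching activation channels by s leaves the full-precision output unchanged, which is what lets AWQ search over s to reduce quantization error. A dependency-free sketch of that invariance (illustrative only, not the library's implementation):

```python
def matvec(W, x):
    # Row-major matrix-vector product: y[i] = sum_j W[i][j] * x[j].
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def awq_rescale(W, x, s):
    # Fold per-input-channel scales into the weights (W[:, j] *= s[j])
    # and divide them out of the activations (x[j] /= s[j]).
    W_scaled = [[w * sj for w, sj in zip(row, s)] for row in W]
    x_scaled = [xi / sj for xi, sj in zip(x, s)]
    return W_scaled, x_scaled

W = [[0.5, -1.0], [2.0, 0.25]]
x = [3.0, -2.0]
s = [4.0, 0.5]  # hypothetical per-channel scales found by the AWQ search

W2, x2 = awq_rescale(W, x, s)
y_ref = matvec(W, x)    # original layer output
y_awq = matvec(W2, x2)  # output after rescaling: identical in full precision
assert all(abs(a - b) < 1e-9 for a, b in zip(y_ref, y_awq))
```

The benefit appears only once the scaled weights are quantized: salient channels get larger scales, so they occupy more of the integer range and lose less precision.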
Loading
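The `scheme="W4A16_ASYM"` shorthand in the new recipe replaces the explicit `QuantizationScheme` block the diff removes: 4-bit asymmetric integer weights with per-group scales (the removed config used `group_size=128`). A toy sketch of grouped asymmetric quantization with those semantics, assuming nothing about llmcompressor internals (function names here are hypothetical):

```python
def quantize_group_asym(w, num_bits=4):
    # Asymmetric quantization of one group: the integer range [0, 2^b - 1]
    # is stretched to cover [min(w), max(w)] via a scale and zero point.
    qmin, qmax = 0, 2**num_bits - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for flat groups
    zero_point = round(-lo / scale)
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in w]
    dq = [(qi - zero_point) * scale for qi in q]  # dequantized values
    return q, dq

def quantize_row_grouped(row, group_size=4, num_bits=4):
    # Each contiguous block of `group_size` weights gets its own scale and
    # zero point, mirroring strategy=GROUP in the removed config.
    out = []
    for i in range(0, len(row), group_size):
        _, dq = quantize_group_asym(row[i : i + group_size], num_bits)
        out.extend(dq)
    return out

row = [0.1, -0.2, 0.3, 0.05, 1.0, -1.0, 0.0, 0.5]
dq = quantize_row_grouped(row, group_size=4)
# Reconstruction error per weight is bounded by that group's scale.
```

Grouping keeps the scale local, so one outlier weight only degrades precision for its own group rather than the whole row; asymmetric quantization additionally spends no integer levels on values outside the group's actual range.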