AWQ Qwen and Phi mappings #1440

brian-dellabetta · 2025-05-16T18:19:03Z

SUMMARY:
I wanted to create a PR showing users how they can add more mappings to AWQ to support more models. It turns out Qwen uses exactly the same mappings as Llama, so I added one for Phi as well. I also updated the naming and used the infer pattern employed in SmoothQuant, rather than requiring the user to set the mappings.
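A minimal sketch of the infer pattern described above, assuming a registry keyed by architecture name. The names `Mapping`, `MAPPING_REGISTRY`, and `infer_mappings`, the architecture keys, and the fallback to the Llama mappings are all illustrative assumptions, not the actual llm-compressor API. Only the `v_proj` -> `o_proj` and `qkv_proj` -> `o_proj` pairs come from the discussion in this PR and #1451.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Mapping:
    """Hypothetical pairing of one smooth layer with the layers it balances."""
    smooth_layer: str
    balance_layers: Tuple[str, ...]


# Illustrative entries only; a real mapping list would hold more pairs.
LLAMA_MAPPINGS: List[Mapping] = [
    Mapping("re:.*v_proj$", ("re:.*o_proj$",)),
]

# Phi fuses q/k/v into one projection, so the smooth layer differs.
PHI_MAPPINGS: List[Mapping] = [
    Mapping("re:.*qkv_proj$", ("re:.*o_proj$",)),
]

# Assumed architecture names; Qwen reuses the Llama mappings unchanged.
MAPPING_REGISTRY: Dict[str, List[Mapping]] = {
    "LlamaForCausalLM": LLAMA_MAPPINGS,
    "Qwen2ForCausalLM": LLAMA_MAPPINGS,
    "Phi3ForCausalLM": PHI_MAPPINGS,
}


def infer_mappings(architecture: str) -> List[Mapping]:
    """Look up mappings by architecture name instead of asking the user.

    Falling back to the Llama mappings for unknown architectures is an
    assumption made for this sketch.
    """
    return MAPPING_REGISTRY.get(architecture, LLAMA_MAPPINGS)
```

With a registry like this, supporting another model family would amount to adding one dictionary entry, which is the workflow this PR is meant to demonstrate.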

TEST PLAN:
`examples/awq/llama_example.py` works on this branch for `MODEL_ID = "microsoft/Phi-4-mini-reasoning"`.

TODOs:

Merge in after AWQ Apply Scales Bugfix when smooth layer output length doesn't match balance layer input length #1451 lands

github-actions · 2025-05-16T18:19:13Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

kylesayrs

neat

@anmarques

… balance layer input length (#1451)

### Summary

We are hitting an edge case in AWQ we had not previously hit with the initial Llama/Qwen testing models. When a smooth layer's # of output_features does not match a balance layer's # of input_features, the code as it is currently will error out when trying to update the smooth layer's weights with `weights.div(scales)`, due to a shape mismatch error. We are hitting this in #1440 for Phi3 models, which include a mapping between the fused `qkv_proj` smooth layer and `o_proj` balance layer in AutoAWQ (see [here](https://github.com/casper-hansen/AutoAWQ/blob/main/awq/models/phi3.py#L51-L57)). The resolution in AutoAWQ is to only use the last rows of the smooth layer so that the shapes line up, as shown [here](https://github.com/casper-hansen/AutoAWQ/blob/main/awq/quantize/scale.py#L123). This PR includes that update, and with #1440 will allow Phi3 models to be quantizable with AWQModifier.

Like with v_proj -> o_proj, if shapes don't match up, they will be excluded from resolved mappings. This allows [phi-3-mini](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/tree/main?show_file_info=model-00001-of-00002.safetensors) to include the mapping because `qkv_proj out_features == 3*o_proj in_features == 9216`, but excludes it from [phi-3-medium](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct/tree/main?show_file_info=model-00001-of-00006.safetensors) which has `qkv_proj out_features == 7680` and `o_proj in_features==5120`. If the mapping is included for phi-3-medium, the model blows up with wikitext eval perplexities >2000. This implementation was agreed upon with @anmarques.

PS: I also moved `mul` & `div` to `mul_` & `div_`, to avoid unnecessary memory allocation.

-------------

### Test Plan

With these changes and with #1440, `examples/awq/llama_example.py` works with `"microsoft/Phi-3-mini-128k-instruct"` and produces similar results as when qkv_proj to o_proj mapping is included

Without mapping:

| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------|------:|------|-----:|---------------|---|------:|---|------|
|wikitext| 2|none | 5|bits_per_byte |↓ | 0.6474|± | N/A|
| | |none | 5|byte_perplexity|↓ | 1.5664|± | N/A|
| | |none | 5|word_perplexity|↓ |11.0201|± | N/A|

With mapping:

| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------|------:|------|-----:|---------------|---|------:|---|------|
|wikitext| 2|none | 5|bits_per_byte |↓ | 0.6482|± | N/A|
| | |none | 5|byte_perplexity|↓ | 1.5672|± | N/A|
| | |none | 5|word_perplexity|↓ |11.0527|± | N/A|

I also confirmed re-running with `meta-llama/Llama-3.2-3B-Instruct` and `meta-llama/Llama-2-7b-hf` does not deviate in PPL scores from what is currently on `main`

---------

Signed-off-by: Brian Dellabetta <[email protected]>
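The "only use the last rows of the smooth layer" resolution can be illustrated with a small sketch. It uses plain Python lists in place of tensors so that it runs without any dependencies. The function name `apply_scales_to_last_rows` is hypothetical, and this is an assumption-laden illustration of the idea rather than the code in `src/llmcompressor/modifiers/awq/base.py`.

```python
from typing import List


def apply_scales_to_last_rows(weight: List[List[float]], scales: List[float]) -> None:
    """Divide the last len(scales) rows of `weight` by `scales`, in place.

    `weight` stands in for the smooth layer's weight, shaped
    [out_features, in_features]. `scales` has one entry per input feature
    of the balance layer. When the smooth layer is a fused projection,
    out_features can exceed len(scales), so only the trailing rows are
    rescaled and the leading rows are left untouched.
    """
    num_scales = len(scales)
    if num_scales > len(weight):
        raise ValueError("more scales than output rows in the smooth layer")

    offset = len(weight) - num_scales  # index of the first row to rescale
    for i, scale in enumerate(scales):
        row = weight[offset + i]
        for j in range(len(row)):
            row[j] /= scale  # in-place update, analogous to div_ on a tensor


# Toy example: 4 output rows, but only 2 scales from the balance layer.
toy_weight = [[1.0, 1.0] for _ in range(4)]
apply_scales_to_last_rows(toy_weight, [2.0, 4.0])
```

In the toy example the first two rows stay at 1.0 while the last two become 0.5 and 0.25, which mirrors how a fused smooth layer would have only its trailing slice rescaled.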

src/llmcompressor/modifiers/awq/base.py

Signed-off-by: Brian Dellabetta <[email protected]>

brian-dellabetta requested review from kylesayrs, dsikka, rahul-tuli and shanjiaz May 16, 2025 18:19

brian-dellabetta added the ready When a PR is ready for review label May 16, 2025

brian-dellabetta mentioned this pull request May 16, 2025

Add Additional Model Mappings for AWQ and SmoothQuant #1442

Open

kylesayrs previously approved these changes May 17, 2025

brian-dellabetta mentioned this pull request May 19, 2025

AWQ Apply Scales Bugfix when smooth layer output length doesn't match balance layer input length #1451

Merged

brian-dellabetta dismissed kylesayrs’s stale review via 54fa07c May 21, 2025 18:50

brian-dellabetta force-pushed the bdellabe/awq-qwen-mappings branch from 56f6069 to 54fa07c Compare May 21, 2025 18:50

brian-dellabetta requested a review from kylesayrs May 21, 2025 19:16

brian-dellabetta commented May 21, 2025

src/llmcompressor/modifiers/awq/base.py (resolved)

rahul-tuli previously approved these changes May 21, 2025

src/llmcompressor/modifiers/awq/base.py (outdated, resolved)

brian-dellabetta enabled auto-merge (squash) May 21, 2025 20:02

brian-dellabetta dismissed rahul-tuli’s stale review via ad350cb May 21, 2025 20:04

rahul-tuli approved these changes May 21, 2025

kylesayrs approved these changes May 21, 2025

brian-dellabetta added 11 commits May 21, 2025 15:26

squashed/rebased

f94efb6

Signed-off-by: Brian Dellabetta <[email protected]>

add Phi Mappings to show different mappings

d235a26

Signed-off-by: Brian Dellabetta <[email protected]>

phi mappings

0a83328

Signed-off-by: Brian Dellabetta <[email protected]>

drop pbar for another PR

f8e752f

Signed-off-by: Brian Dellabetta <[email protected]>

revert v_proj / o_proj in skip conditional

d4eefab

Signed-off-by: Brian Dellabetta <[email protected]>

revert skipped name

dc32ade

Signed-off-by: Brian Dellabetta <[email protected]>

merge conflict fix

3d1c555

Signed-off-by: Brian Dellabetta <[email protected]>

move to SmoothQuant's mappings nomenclature and infer pattern

4f3aba0

Signed-off-by: Brian Dellabetta <[email protected]>

stylefixes

c18c66b

Signed-off-by: Brian Dellabetta <[email protected]>

bugfix

b70d054

Signed-off-by: Brian Dellabetta <[email protected]>

switch to absolute imports

32622b1

Signed-off-by: Brian Dellabetta <[email protected]>

brian-dellabetta force-pushed the bdellabe/awq-qwen-mappings branch from ad350cb to 32622b1 Compare May 21, 2025 20:26

brian-dellabetta merged commit 1fb1377 into main May 21, 2025
11 checks passed

brian-dellabetta deleted the bdellabe/awq-qwen-mappings branch May 21, 2025 21:08
