Fix for Llama4 static quantization #430
                
Open · +3 −0
Command:

Without this fix:

PR carried over from vllm-fork: HabanaAI/vllm-hpu-extension#341

Explanation:
Per the HF documentation, AutoTokenizer should work for Llama4 text-only models; in multimodal cases, AutoProcessor must be used instead.
I noticed that omitting use_fast=False (or explicitly setting use_fast=True) in AutoTokenizer.from_pretrained() gets past the error.
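A minimal sketch of the loading pattern described above, assuming a transformers-based calibration script; the checkpoint id is illustrative, and use_fast=True is the transformers default:

```python
from transformers import AutoProcessor, AutoTokenizer

# Illustrative checkpoint id; substitute the Llama4 model actually being quantized.
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

# Text-only path: leave the fast tokenizer enabled (the default).
# Passing use_fast=False is what triggered the error during static quantization.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# Multimodal path: per the HF docs, load an AutoProcessor instead of an
# AutoTokenizer; it bundles the tokenizer with the image processor.
processor = AutoProcessor.from_pretrained(model_id)
```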