[Bug]: Missing keys when loading Qwen2.5-VL-FP8 weights in TensorRT-LLM #8569

@Lsy-1997

Description

System Info

When running the official example (https://huggingface.co/nvidia/Qwen2.5-VL-7B-Instruct-FP8) for Qwen2.5-VL-FP8 with TensorRT-LLM, I encountered missing keys when loading the model weights. The model failed to initialize correctly due to a mismatch between the checkpoint and the model definition.

Environment:
GPU: H20
Driver Version: 535.216.01
CUDA Version: 13.0
OS: Ubuntu 24.04

Python Environment:
Python 3.12.3
torch 2.8.0a0+34c6371d24.nv25.8
tensorrt_llm 1.2.0rc0
transformers 4.56.0

Error logs:

RuntimeError: Error(s) in loading state_dict for Qwen2VisionModelBase:
        Missing key(s) in state_dict: "visual.blocks.0.attn.qkv_proj.weight_scale", "visual.blocks.0.attn.qkv_proj.input_scale", "visual.blocks.0.attn.qkv_proj.inv_input_scale", "visual.blocks.0.attn.qkv_proj.kv_scales", "visual.blocks.0.attn.qkv_proj.inv_kv_scales", 
........
"visual.blocks.31.attn.o_proj.inv_input_scale", "visual.blocks.31.attn.o_proj.kv_scales", "visual.blocks.31.attn.o_proj.inv_kv_scales".

Reproduction

from tensorrt_llm import LLM, SamplingParams

def main():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    llm = LLM(model="nvidia/Qwen2.5-VL-7B-Instruct-FP8", tensor_parallel_size=1)
    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")

if __name__ == "__main__":
    main()

Expected behavior

The model should load successfully and generate text, just like the already supported original model (Qwen/Qwen2.5-VL-7B-Instruct).

Actual behavior

When I ran the official FP8 example for Qwen2.5-VL, the model failed to load.
The script raised a missing/unexpected keys error during weight loading (see the error logs above).

Additional notes

It seems the FP8 checkpoint on HuggingFace may not include the quantization scale tensors (e.g., weight_scale, input_scale, kv_scales) expected by the TensorRT-LLM Qwen2VL model definition.
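
To help narrow this down, below is a minimal sketch (not part of the original report) that lists the tensor names in the published checkpoint and checks whether the visual.* scale tensors are actually there. It assumes the huggingface_hub and safetensors packages are installed; the repo id is the one used in the reproduction above.

from glob import glob
import os

from huggingface_hub import snapshot_download
from safetensors import safe_open

# Download only the safetensors shards of the FP8 checkpoint.
repo_dir = snapshot_download(
    "nvidia/Qwen2.5-VL-7B-Instruct-FP8",
    allow_patterns=["*.safetensors"],
)

# Collect every visual.* tensor whose name mentions a scale.
scale_keys = []
for shard in glob(os.path.join(repo_dir, "*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if name.startswith("visual.") and "scale" in name:
                scale_keys.append(name)

print(f"Found {len(scale_keys)} visual scale tensors")
for name in sorted(scale_keys)[:10]:
    print(name)

If this prints zero visual scale tensors, the checkpoint itself lacks them, and the mismatch would be on the checkpoint side rather than in the TensorRT-LLM weight loader.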

Could you please confirm:
• Whether the FP8 vision weights for Qwen2.5-VL are fully compatible with tensorrt_llm>=1.2.0rc0?
• Or if there’s an updated checkpoint or branch that supports this model?

Labels

Model customization <NV> (Adding support for new model architectures or variants), Multimodal (Label for issues & PRs regarding Multimodal related objects), bug (Something isn't working)
