
[Feature]: support for nvidia/Qwen2.5-VL-7B-Instruct-FP4 #8404


Description

@pythonjavaerlang

🚀 The feature, motivation and pitch

Hardware: NVIDIA L4 (24 GB)
Model: https://huggingface.co/nvidia/Qwen2.5-VL-7B-Instruct-FP4

nvidia-smi:
| NVIDIA-SMI 550.163.01 Driver Version: 535.261.03 CUDA Version: 13.0 |

python3 ./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py --model_dir ./Qwen2.5-VL-7B-Instruct-FP4 --output_dir ./tllm_checkpoint_fp4 --dtype float16 --smoothquant 0.5

[TensorRT-LLM] TensorRT LLM version: 1.2.0rc1
1.2.0rc1
Traceback (most recent call last):
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 345, in
main()
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 337, in main
convert_and_save_hf(args)
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 263, in convert_and_save_hf
QWenForCausalLM.quantize(
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/model.py", line 530, in quantize
config = QWenConfig.from_hugging_face(hf_model_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/config.py", line 115, in from_hugging_face
assert qwen_type in valid_types, f"Unsupported Qwen type: {qwen_type}, only {valid_types} are acceptable."
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Unsupported Qwen type: qwen2_5_vl, only ('qwen', 'qwen2', 'qwen2_moe', 'qwen2_llava_onevision', 'qwen2_vl', 'qwen2_audio', 'qwen3', 'qwen3_moe') are acceptable.
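For context on the first failure: convert_checkpoint.py derives qwen_type from the model_type field of the checkpoint's config.json, which for this model is qwen2_5_vl and is simply not in the accepted tuple. A minimal sketch of the check (illustrative only; the real logic lives in tensorrt_llm/models/qwen/config.py, and the local path is from my setup):

```python
import json

# model_type in the HF config is what convert_checkpoint.py treats as qwen_type
with open("./Qwen2.5-VL-7B-Instruct-FP4/config.json") as f:
    qwen_type = json.load(f)["model_type"]  # "qwen2_5_vl" for this checkpoint

valid_types = ('qwen', 'qwen2', 'qwen2_moe', 'qwen2_llava_onevision',
               'qwen2_vl', 'qwen2_audio', 'qwen3', 'qwen3_moe')
# Reproduces the AssertionError above: 'qwen2_5_vl' is missing from the tuple
assert qwen_type in valid_types, f"Unsupported Qwen type: {qwen_type}"
```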

uname -a
Linux f909e01d9262 6.12.38+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.38-1 (2025-07-16) x86_64 x86_64 x86_64 GNU/Linux

Alternatives

When I edit the code and add the model type manually, I get this error:

[TensorRT-LLM] TensorRT LLM version: 1.2.0rc1
1.2.0rc1
torch_dtype is deprecated! Use dtype instead!
Traceback (most recent call last):
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 345, in
main()
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 337, in main
convert_and_save_hf(args)
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 263, in convert_and_save_hf
QWenForCausalLM.quantize(
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/model.py", line 535, in quantize
convert.quantize(hf_model_dir,
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/convert.py", line 996, in quantize
hf_model = model_cls.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/auto_factory.py", line 607, in from_pretrained
raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.qwen2_5_vl.configuration_qwen2_5_vl.Qwen2_5_VLConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of ApertusConfig, ArceeConfig, AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BitNetConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DeepseekV2Config, DeepseekV3Config, DiffLlamaConfig, DogeConfig, Dots1Config, ElectraConfig, Emu3Config, ErnieConfig, Ernie4_5Config, Ernie4_5_MoeConfig, Exaone4Config, FalconConfig, FalconH1Config, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, Gemma3nConfig, Gemma3nTextConfig, GitConfig, GlmConfig, Glm4Config, Glm4MoeConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GptOssConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeHybridConfig, GraniteMoeSharedConfig, HeliumConfig, HunYuanDenseV1Config, HunYuanMoEV1Config, JambaConfig, JetMoeConfig, Lfm2Config, LlamaConfig, Llama4Config, Llama4TextConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MiniMaxConfig, MistralConfig, MixtralConfig, MllamaConfig, ModernBertDecoderConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, Phi4MultimodalConfig, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, Qwen3Config, Qwen3MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeedOssConfig, SmolLM3Config, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, xLSTMConfig, XmodConfig, ZambaConfig, Zamba2Config.
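My reading of this second failure: convert.py loads the HF weights through AutoModelForCausalLM, but transformers maps Qwen2_5_VLConfig to Qwen2_5_VLForConditionalGeneration, which that auto class does not cover, so patching the model name alone cannot work. A hedged sketch (local path is from my setup; whether the FP4 weights actually load outside ModelOpt is unverified):

```python
from transformers import Qwen2_5_VLForConditionalGeneration

# What convert.py effectively attempts, and what raises the ValueError above:
# AutoModelForCausalLM.from_pretrained("./Qwen2.5-VL-7B-Instruct-FP4")

# The checkpoint's config resolves to the vision-language class instead:
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "./Qwen2.5-VL-7B-Instruct-FP4", dtype="auto")
```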

Additional context

The following support matrix says Qwen2.5-VL is supported (in the PyTorch-backend table):
https://nvidia.github.io/TensorRT-LLM/reference/support-matrix.html#models-pytorch-backend
"""
Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | Qwen/Qwen2.5-VL-7B-Instruct
"""

Docker image:
nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

Multimodal (Label for issues & PRs regarding Multimodal related objects), question (Further information is requested), waiting for feedback
