🚀 The feature, motivation and pitch
GPU: NVIDIA L4 (24 GB)
Model: https://huggingface.co/nvidia/Qwen2.5-VL-7B-Instruct-FP4
nvidia-smi
| NVIDIA-SMI 550.163.01 Driver Version: 535.261.03 CUDA Version: 13.0 |
python3 ./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py --model_dir ./Qwen2.5-VL-7B-Instruct-FP4 --output_dir ./tllm_checkpoint_fp4 --dtype float16 --smoothquant 0.5
[TensorRT-LLM] TensorRT LLM version: 1.2.0rc1
1.2.0rc1
Traceback (most recent call last):
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 345, in
main()
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 337, in main
convert_and_save_hf(args)
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 263, in convert_and_save_hf
QWenForCausalLM.quantize(
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/model.py", line 530, in quantize
config = QWenConfig.from_hugging_face(hf_model_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/config.py", line 115, in from_hugging_face
assert qwen_type in valid_types, f"Unsupported Qwen type: {qwen_type}, only {valid_types} are acceptable."
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Unsupported Qwen type: qwen2_5_vl, only ('qwen', 'qwen2', 'qwen2_moe', 'qwen2_llava_onevision', 'qwen2_vl', 'qwen2_audio', 'qwen3', 'qwen3_moe') are acceptable.
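For reference, the failure is a hard allow-list check on the HF config's model_type. A minimal sketch of the logic, reconstructed from the assertion message above (the actual tensorrt_llm/models/qwen/config.py may differ in detail):

# Sketch reconstructed from the assertion above, not the exact source.
valid_types = ('qwen', 'qwen2', 'qwen2_moe', 'qwen2_llava_onevision',
               'qwen2_vl', 'qwen2_audio', 'qwen3', 'qwen3_moe')
qwen_type = 'qwen2_5_vl'  # model_type read from the checkpoint's config.json
assert qwen_type in valid_types, (
    f"Unsupported Qwen type: {qwen_type}, only {valid_types} are acceptable.")

So the converter rejects the checkpoint on model_type alone, before reading any weights.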
uname -a
Linux f909e01d9262 6.12.38+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.38-1 (2025-07-16) x86_64 x86_64 x86_64 GNU/Linux
Alternatives
When I edit the code to add the model type manually, I receive this error: [TensorRT-LLM] TensorRT LLM version: 1.2.0rc1
1.2.0rc1
torch_dtype is deprecated! Use dtype instead!
Traceback (most recent call last):
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 345, in
main()
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 337, in main
convert_and_save_hf(args)
File "/code/tensorrt_llm/./TensorRT-LLM/examples/models/core/qwen/convert_checkpoint.py", line 263, in convert_and_save_hf
QWenForCausalLM.quantize(
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/model.py", line 535, in quantize
convert.quantize(hf_model_dir,
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/convert.py", line 996, in quantize
hf_model = model_cls.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/auto_factory.py", line 607, in from_pretrained
raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.qwen2_5_vl.configuration_qwen2_5_vl.Qwen2_5_VLConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of ApertusConfig, ArceeConfig, AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BitNetConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DeepseekV2Config, DeepseekV3Config, DiffLlamaConfig, DogeConfig, Dots1Config, ElectraConfig, Emu3Config, ErnieConfig, Ernie4_5Config, Ernie4_5_MoeConfig, Exaone4Config, FalconConfig, FalconH1Config, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, Gemma3nConfig, Gemma3nTextConfig, GitConfig, GlmConfig, Glm4Config, Glm4MoeConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GptOssConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeHybridConfig, GraniteMoeSharedConfig, HeliumConfig, HunYuanDenseV1Config, HunYuanMoEV1Config, JambaConfig, JetMoeConfig, Lfm2Config, LlamaConfig, Llama4Config, Llama4TextConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MiniMaxConfig, MistralConfig, MixtralConfig, MllamaConfig, ModernBertDecoderConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, Phi4MultimodalConfig, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, Qwen3Config, Qwen3MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeedOssConfig, SmolLM3Config, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, xLSTMConfig, XmodConfig, ZambaConfig, Zamba2Config.
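This second failure comes from Transformers itself: Qwen2_5_VLConfig is not registered in the AutoModelForCausalLM mapping, so the model_cls.from_pretrained(...) call in convert.py can never succeed for this checkpoint no matter what type string is patched in. A quick way to reproduce the mapping failure without loading any weights (a sketch; the path is an assumption, the error is the one quoted above):

from transformers import AutoConfig, AutoModelForCausalLM

model_dir = "./Qwen2.5-VL-7B-Instruct-FP4"  # local clone of the HF repo
cfg = AutoConfig.from_pretrained(model_dir)
print(type(cfg).__name__, cfg.model_type)  # Qwen2_5_VLConfig qwen2_5_vl

try:
    AutoModelForCausalLM.from_config(cfg)  # same mapping as from_pretrained
except ValueError as e:
    print(e)  # reproduces the "Unrecognized configuration class" error above

The class that does accept this config is the multimodal Qwen2_5_VLForConditionalGeneration, which the Qwen example's quantize path does not use.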
Additional context
The support matrix says Qwen2.5-VL is supported (PyTorch backend):
https://nvidia.github.io/TensorRT-LLM/reference/support-matrix.html#models-pytorch-backend
"""
Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | Qwen/Qwen2.5-VL-7B-Instruct
"""
Docker image:
nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.