modelopt2trt model inference time is much slower than trt implicit quantization inference? #4365

666DZY666 · 2025-02-24T08:24:58Z

A quantized ONNX model exported with ModelOpt and built into a TensorRT engine (explicit quantization) runs inference much slower than the same model quantized directly with TensorRT implicit quantization.
Is this a known problem?
Is there a solution?
