Hi! I noticed that in internvl_lmdeploy.py, you defined a helper function process_pil_image to preprocess image inputs for InternVL models, but it doesn’t seem to be used anywhere in the current inference pipeline.
Could you clarify whether this preprocessing step is required for correct inference (or for matching the model’s expected input format), and if so, where it should be applied?
Also, could you share the decoding settings used in your experiments, specifically max_new_tokens, top_p, and temperature?
Thanks in advance!