Skip to content

Fix MinicpmV model converter and clip to avoid using hardcode. #14750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gryffindor-rr
Copy link

Make sure to read the contributing guidelines before submitting a PR
Tested by llama-mtmd-cli with multiple minicpmv models.

@github-actions github-actions bot added examples python python script changes labels Jul 18, 2025
@ngxson ngxson self-requested a review July 18, 2025 09:22
Copy link
Collaborator

@ngxson ngxson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create PR from a non-master branch next time, otherwise I cannot push my corrections directly to this PR

Comment on lines +47 to +48
#define KEY_MINICPMV_QUERY_NUM "clip.minicpmv_query_num"
#define KEY_MINICPMV_PROJECTION_DIM "clip.minicpmv_projection_dim"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define KEY_MINICPMV_QUERY_NUM "clip.minicpmv_query_num"
#define KEY_MINICPMV_PROJECTION_DIM "clip.minicpmv_projection_dim"
#define KEY_MINICPMV_QUERY_NUM "clip.minicpmv_query_num"
#define KEY_MINICPMV_PROJ_DIM "clip.minicpmv_projection_dim"

Comment on lines +858 to +864
if (ctx->model.hparams.minicpmv_version == 2) {
num_query = 96;
} else if (ctx->model.hparams.minicpmv_version == 3) {
num_query = 64;
} else if (ctx->model.hparams.minicpmv_version == 4) {
num_query = 64;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this into clip_model_loader and populate ctx->model.hparams.minicpmv_query_num when loading the model

@@ -2110,6 +2118,8 @@ struct clip_model_loader {
get_u32(KEY_PATCH_SIZE, hparams.patch_size);
get_u32(KEY_IMAGE_CROP_RESOLUTION, hparams.image_crop_resolution, false);
get_i32(KEY_MINICPMV_VERSION, hparams.minicpmv_version, false); // legacy
get_u32(KEY_MINICPMV_QUERY_NUM, hparams.minicpmv_query_num, false);
get_u32(KEY_MINICPMV_PROJECTION_DIM, hparams.minicpmv_projection_dim, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
get_u32(KEY_MINICPMV_PROJECTION_DIM, hparams.minicpmv_projection_dim, false);
get_u32(KEY_MINICPMV_PROJ_DIM, hparams.minicpmv_projection_dim, false);

return 3584;
// Use actual config value if available, otherwise fall back to hardcoded values
if (hparams.minicpmv_projection_dim > 0) {
return hparams.minicpmv_projection_dim;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this doesn't need to be a metadata, you can read it from tensor shape, just like other models

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
examples python python script changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants