-
Notifications
You must be signed in to change notification settings - Fork 141
feat:Add support of MiniCPM-o-2_6 model #495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis PR introduces a complete MiniCPMO multimodal model implementation with vision-language capabilities. It adds configuration structures, image preprocessing pipelines, tokenization with image/audio support, example applications, and integrates SigLIP vision encoding with Qwen2VL-based language modeling for causal LM inference. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Example App
participant Tok as MiniCPMOTokenizer
participant Prep as MiniCPMOImageProcessor
participant Model as MiniCPMOForCausalLM
participant VPM as SiglipVisionModel
participant Resampler
participant LLM as Qwen2VLForCausalLM
App->>Tok: convertMessage(MiniCPMOMessage)
Tok->>Prep: process(image_path)
Prep-->>Tok: pixel_values, tgt_sizes, image_bounds
Tok-->>App: input_ids, pixel_values, tgt_sizes, image_bounds
App->>Model: forward(input_ids, pixel_values, tgt_sizes, image_bounds)
rect rgb(200, 220, 255)
Note over Model: Vision Processing
Model->>VPM: forward(pixel_values, tgt_sizes)
VPM-->>Model: vision_embeddings
Model->>Resampler: forward(vision_embeddings)
Resampler-->>Model: resampled_embeddings
end
rect rgb(220, 200, 255)
Note over Model: Text-Vision Fusion
Model->>Model: merge_vision_text_embeddings(text_emb, vision_emb, image_bounds)
Model-->>Model: fused_embeddings
end
rect rgb(200, 255, 220)
Note over Model: Language Modeling
Model->>LLM: forward(fused_embeddings, position_ids)
LLM->>LLM: decoder stack with RoPE
LLM-->>Model: output_logits
end
Model-->>App: sequence_output
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
代码中没有cnpy和z的使用,需要移除这些额外依赖。并且避免使用$ENV{HOME}作为依赖目录
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
移除额外依赖,整理demo代码
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
mllm/backends/cpu/kernels/Kernels.hpp (1)
30-34: Remove duplicate include.The header
mllm/backends/cpu/kernels/arm/conv2d.hppis included twice (lines 30 and 34), which is redundant.Apply this diff to remove the duplicate:
#include "mllm/backends/cpu/kernels/arm/gelu.hpp" // IWYU pragma: export -#include "mllm/backends/cpu/kernels/arm/conv2d.hpp" // IWYU pragma: export #include "mllm/backends/cpu/kernels/arm/conv3d.hpp" // IWYU pragma: exportmllm/nn/Nn.hpp (1)
20-28: Remove duplicate include.The header
mllm/nn/layers/Conv2D.hppis included on both line 20 and line 28.Apply this diff to remove the duplicate:
#include "mllm/nn/layers/KVCache.hpp" // IWYU pragma: export #include "mllm/nn/layers/Conv1D.hpp" // IWYU pragma: export -#include "mllm/nn/layers/Conv2D.hpp" // IWYU pragma: export #include "mllm/nn/layers/STFT.hpp" // IWYU pragma: exportBased on static analysis hints.
♻️ Duplicate comments (1)
examples/minicpm_o/CMakeLists.txt (1)
2-3: Address past review feedback: Remove $ENV{HOME} dependency directories.As previously noted by oreomaker, using
$ENV{HOME}for dependency directories creates a non-portable build configuration that assumes a specific user environment setup. This should use proper CMake find_package mechanisms or project-relative paths.Consider using CMake's standard dependency resolution:
-include_directories($ENV{HOME}/local/include) -link_directories($ENV{HOME}/local/lib) +# Use find_package or target-specific include directories instead
🧹 Nitpick comments (13)
mllm/models/minicpm_o2_6/configuration_minicpmo.hpp (2)
10-10: Consider public inheritance for ConfigFile.The protected inheritance from
ConfigFilerestricts access to its public interface. If downstream code needs to access ConfigFile methods through MiniCPMOConfig, consider using public inheritance instead.
13-54: Consider using member initializer list for all fields.For better performance and clarity, primitive-type fields (especially floating-point values like
rms_norm_eps,audio_chunk_length,rope_theta, andtie_word_embeddings) should be initialized in the member initializer list rather than in the constructor body.Example refactor:
- explicit MiniCPMOConfig(const std::string& file_path) : ConfigFile(file_path) { + explicit MiniCPMOConfig(const std::string& file_path) : + ConfigFile(file_path), + rms_norm_eps(data()["rms_norm_eps"]), + audio_chunk_length(data()["audio_chunk_length"]), + rope_theta(data()["rope_theta"]), + tie_word_embeddings(data()["tie_word_embeddings"]) { // Vision Config vision_hidden_size = data()["vision_hidden_size"]; // ... other fields ... - rms_norm_eps = data()["rms_norm_eps"]; // ... etcBased on coding guidelines.
mllm/models/minicpm_o2_6/modeling_siglip.hpp (2)
391-395: Simplify redundant conditional branches.The if-else branches on lines 391-394 both pass the same arguments to
embeddings_whentgt_sizesis nil. The explicitTensor::nil()in line 392 is redundant sincetgt_sizesis already nil.Apply this diff to simplify:
- std::vector<Tensor> hidden_states_result; - if (tgt_sizes.isNil()) { - hidden_states_result = embeddings_(pixel_values, Tensor::nil(), patch_attention_mask); - } else { - hidden_states_result = embeddings_(pixel_values, tgt_sizes, patch_attention_mask); - } + auto hidden_states_result = embeddings_(pixel_values, tgt_sizes, patch_attention_mask);Based on static analysis hints.
406-414: Potential infinite loop with nested conditions.The nested loop structure with
all_validflag (lines 407-414) has both loops conditioned onall_valid, which means the outer loop will break as soon as any invalid mask is found. However, the logic would be clearer with an explicit break or by extracting this to a helper function.Consider refactoring for clarity:
// Helper function bool all_patches_valid(const Tensor& patch_attention_mask, int batch_size, int max_patches) { for (int i = 0; i < batch_size; i++) { for (int j = 0; j < max_patches; j++) { if (patch_attention_mask.at<uint8_t>({i, j}) == 0) { return false; } } } return true; } // In forward(): if (!all_patches_valid(patch_attention_mask, batch_size, max_patches)) { // Create attention mask... }mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp (1)
202-249: Optimize string concatenation in loops.Multiple functions build strings through repeated concatenation in loops (
get_slice_image_placeholder,get_grid_placeholder). For better performance, consider reserving capacity upfront.Example for
get_slice_image_placeholder:std::string get_slice_image_placeholder(std::pair<int, int> image_size, const std::vector<int> &grid, int image_idx = 0, bool use_image_id = true) { std::string image_placeholder; image_placeholder.reserve(8 + image_feature_size * 5 + 9); // Approximate size image_placeholder = "<image>"; for(int i=0; i < image_feature_size; i++){ image_placeholder += "<unk>"; } image_placeholder += "</image>"; // ... rest of function }examples/minicpm_o/mainllm.cpp (1)
78-85: Remove or make the debug token limit configurable.The hardcoded 50-token limit is intended for debugging but restricts the model's ability to generate complete responses. Consider removing it or adding a command-line argument to control the maximum output length.
Apply this diff to remove the debug limit:
int token_count = 0; for(auto& step : minicpmo.chat(inputs)){ auto token_str = minicpmo_tokenizer.detokenize(step.cur_token_id); std::wcout<< token_str << std::flush; token_count++; - if(token_count >= 50) break; // Limit output for debugging }Or add a configurable limit via command-line argument.
examples/minicpm_o/main.cpp (2)
65-149: Remove commented-out code.This large block of commented-out code (85 lines) adds clutter and should be removed. If this interactive CLI functionality is planned for future implementation, track it in an issue instead of leaving dead code in the repository.
Consider removing this entire commented section or opening an issue to track the interactive CLI feature implementation.
5-5: Remove unused commented include.If
modeling_minicpmo.hppis not needed for this example, remove the commented include line to keep the code clean.mllm/models/minicpm_o2_6/modeling_resampler.hpp (1)
99-203: Consider using framework utilities for tensor operations.The forward method contains many nested loops for manual tensor operations (weight slicing, bias addition, reshaping). If the framework provides built-in utilities for these operations (e.g.,
tensor.slice(),tensor.reshape(),tensor.view()), using them could improve readability and potentially performance.mllm/models/minicpm_o2_6/modeling_minicpmo.hpp (2)
22-74: Consider consolidating duplicate projection layer implementations.
AudioProjectionLayerandTTSProjectorhave identical implementations (linear → ReLU → linear). Since these components are currently commented out in the main model, consider creating a single reusableProjectionLayerclass when these features are activated.
295-297: Remove debug print statements.These debug prints add noise to production logs.
Apply this diff:
Tensor merge_vision_text_embeddings(Tensor& text_embeddings, Tensor& vision_embeddings, Tensor& image_bounds) { - - mllm::print(text_embeddings.shape()); - mllm::print(vision_embeddings.shape()); - mllm::print(image_bounds); auto batch_size = text_embeddings.shape()[0]; //text_embeddings: [1, seq_len, embed_dim] ... } } - - mllm::print("finished merging!"); - // Debug: Load and replace with Python-saved embeddingsAlso applies to: 320-320
mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp (2)
418-425: Improve variable naming consistency.The variable
HW_patch_sizeuses mixed case which is inconsistent with typical C++ naming conventions. Consider renaming tomax_hw_patch_sizefor clarity.int channels = img_tensors[0].shape()[0]; int patch_size = img_tensors[0].shape()[1]; - int HW_patch_size = img_tensors[0].shape()[2]; - for(int i=0;i<img_tensors.size();i++){ - if(img_tensors[i].shape()[2] > HW_patch_size){ - HW_patch_size = img_tensors[i].shape()[2]; + int max_hw_patch_size = img_tensors[0].shape()[2]; + for(int i = 0; i < img_tensors.size(); i++){ + if(img_tensors[i].shape()[2] > max_hw_patch_size){ + max_hw_patch_size = img_tensors[i].shape()[2]; } } - Tensor pixel_values = Tensor::empty({(int)img_tensors.size(),channels, patch_size, HW_patch_size}, kFloat32, kCPU) + Tensor pixel_values = Tensor::empty({(int)img_tensors.size(), channels, patch_size, max_hw_patch_size}, kFloat32, kCPU)
197-199: Clarify unusedpatch_sizeparameter.The constructor accepts a
patch_sizeparameter (default 14), but the image preprocessor initialization on line 198 is commented out. Either remove the unused parameter or document why it's reserved for future use.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
examples/minicpm_o/CMakeLists.txt(1 hunks)examples/minicpm_o/config_minicpm_o.json(1 hunks)examples/minicpm_o/main.cpp(1 hunks)examples/minicpm_o/mainllm.cpp(1 hunks)mllm/backends/cpu/kernels/Kernels.hpp(1 hunks)mllm/models/minicpm_o2_6/configuration_minicpmo.hpp(1 hunks)mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp(1 hunks)mllm/models/minicpm_o2_6/modeling_minicpmo.hpp(1 hunks)mllm/models/minicpm_o2_6/modeling_qwen2vl_for_minicpmo.hpp(1 hunks)mllm/models/minicpm_o2_6/modeling_resampler.hpp(1 hunks)mllm/models/minicpm_o2_6/modeling_siglip.hpp(1 hunks)mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp(1 hunks)mllm/nn/Nn.hpp(1 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
mllm/nn/Nn.hpp
[error] 28-28: duplicate include
(readability-duplicate-include,-warnings-as-errors)
examples/minicpm_o/mainllm.cpp
[error] 1-1: 'iostream' file not found
(clang-diagnostic-error)
[error] 16-16: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 16-16: invalid case style for variable 'MLLM_MAIN'
(readability-identifier-naming,-warnings-as-errors)
examples/minicpm_o/main.cpp
[error] 1-1: 'iostream' file not found
(clang-diagnostic-error)
[error] 15-15: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 15-15: invalid case style for variable 'MLLM_MAIN'
(readability-identifier-naming,-warnings-as-errors)
mllm/models/minicpm_o2_6/modeling_resampler.hpp
[error] 6-6: 'mllm/mllm.hpp' file not found
(clang-diagnostic-error)
[error] 23-23: 2 adjacent parameters of 'get2DSinCosPosEmbed' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 24-24: variable 'grid_h_size' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 25-25: variable 'grid_w_size' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 27-27: variable 'pos_embed' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 29-29: variable 'half_dim' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 55-55: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, in_proj_weight_, in_proj_bias_, out_proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 66-66: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, in_proj_weight_, in_proj_bias_, out_proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 66-66: 3 adjacent parameters of 'ResamplerAttention' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 78-78: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 116-116: variable name 'q' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 117-117: variable name 'k' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 118-118: variable name 'v' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 209-209: constructor does not initialize these fields: num_queries_, embed_dim_, num_heads_, kv_dim_, max_size_, query_, kv_proj_, ln_q_, ln_kv_, ln_post_, proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 227-227: constructor does not initialize these fields: num_queries_, embed_dim_, num_heads_, kv_dim_, max_size_, query_, kv_proj_, ln_q_, ln_kv_, ln_post_, proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 227-227: 6 adjacent parameters of 'Resampler' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 246-246: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 247-247: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 253-253: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 261-261: multiple declarations in a single statement reduces readability
(readability-isolate-declaration,-warnings-as-errors)
[error] 292-292: variable 'tgt_h' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 293-293: variable 'tgt_w' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 294-294: variable 'patch_count' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 296-296: variable 'pos_embed_i' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 312-312: variable 'pos_embed_padded' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 315-315: variable 'actual_len' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 334-334: variable name 'q' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp
[error] 5-5: 'cmath' file not found
(clang-diagnostic-error)
[error] 23-23: constructor does not initialize these fields: scale_resolution_, patch_size_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 27-27: 3 adjacent parameters of 'ImageSliceProcessor' of similar type ('int') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 27-27: 9 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 27-27: 448 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 27-27: 14 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 190-190: constructor does not initialize these fields: mean_, std_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 190-190: 3 adjacent parameters of 'MiniCPMOImageProcessor' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 190-190: 14 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 190-190: 980 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 191-191: 0.5 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 191-191: 0.5 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 191-191: 2 adjacent parameters of 'MiniCPMOImageProcessor' of similar type ('float') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 191-191: 0.5 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 192-192: 0.5 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 192-192: 0.5 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 192-192: 0.5 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 197-197: 9 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 197-197: 448 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 201-201: 3 adjacent parameters of 'get_slice_image_placeholder' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 202-202: variable 'image_placeholder' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 208-208: variable 'final_placeholder' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 209-209: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 223-223: variable 'slice_image_placeholder' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 229-229: variable 'cols' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 230-230: variable 'rows' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 234-234: variable 'line' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 241-241: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 376-376: 64 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
mllm/models/minicpm_o2_6/modeling_siglip.hpp
[error] 5-5: 'mllm/mllm.hpp' file not found
(clang-diagnostic-error)
[error] 16-16: constructor does not initialize these fields: embed_dim_, image_size_, patch_size_, num_patches_per_side_, num_patches_, patch_embedding_, position_embedding_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 29-29: constructor does not initialize these fields: embed_dim_, image_size_, patch_size_, num_patches_per_side_, num_patches_, patch_embedding_, position_embedding_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 29-29: 2 adjacent parameters of 'SiglipVisionEmbeddings' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 43-43: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 65-65: variable 'step' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 173-173: constructor does not initialize these fields: hidden_size_, intermediate_size_, fc1_, fc2_, activation_fn_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 184-184: constructor does not initialize these fields: hidden_size_, intermediate_size_, fc1_, fc2_, activation_fn_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 184-184: 2 adjacent parameters of 'SiglipMLP' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 193-193: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 196-196: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 205-205: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, scale_, k_proj_, v_proj_, q_proj_, out_proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 219-219: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, scale_, k_proj_, v_proj_, q_proj_, out_proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 219-219: 2 adjacent parameters of 'SiglipAttention' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 231-231: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 273-273: constructor does not initialize these fields: embed_dim_, layer_norm1_, layer_norm2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 284-284: constructor does not initialize these fields: embed_dim_, layer_norm1_, layer_norm2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 284-284: 2 adjacent parameters of 'SiglipEncoderLayer' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 293-293: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 314-314: constructor does not initialize these fields: num_layers_, layers_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 321-321: constructor does not initialize these fields: num_layers_, layers_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 321-321: 2 adjacent parameters of 'SiglipVisionEncoder' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 329-329: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 343-343: constructor does not initialize these fields: embed_dim_, patch_size_, post_layernorm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 354-354: constructor does not initialize these fields: embed_dim_, patch_size_, post_layernorm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 354-354: 2 adjacent parameters of 'SiglipVisionModel' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 363-363: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 391-391: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 401-401: variable 'attention_mask' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
mllm/models/minicpm_o2_6/configuration_minicpmo.hpp
[error] 5-5: 'mllm/core/aops/LinearOp.hpp' file not found
(clang-diagnostic-error)
[error] 30-30: 'rms_norm_eps' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 41-41: 'audio_chunk_length' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 50-50: 'rope_theta' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 51-51: 'tie_word_embeddings' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 72-72: 1e-06 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 93-93: 1000000.0 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
mllm/models/minicpm_o2_6/modeling_minicpmo.hpp
[error] 5-5: 'mllm/mllm.hpp' file not found
(clang-diagnostic-error)
[error] 23-23: constructor does not initialize these fields: linear1_, relu_, linear2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 28-28: constructor does not initialize these fields: linear1_, relu_, linear2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 28-28: 4 adjacent parameters of 'AudioProjectionLayer' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 35-35: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 36-36: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 50-50: constructor does not initialize these fields: linear1_, relu_, linear2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 55-55: constructor does not initialize these fields: linear1_, relu_, linear2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 55-55: 4 adjacent parameters of 'TTSProjector' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 62-62: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 63-63: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 79-79: constructor does not initialize these fields: config_, llm_, vpm_, resampler_, kv_cache_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 105-105: member variable 'config_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 106-106: member variable 'llm_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 107-107: member variable 'vpm_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 108-108: member variable 'resampler_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 113-113: member variable 'loadPythonEmbedding' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 118-118: redundant access specifier has the same accessibility as the previous access specifier
(readability-redundant-access-specifiers,-warnings-as-errors)
[error] 120-120: variable 'llm_config' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 137-137: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 140-140: variable 'input_ids' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 141-141: repeated branch in conditional chain
(bugprone-branch-clone,-warnings-as-errors)
[error] 150-150: variable 'pixel_values' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 151-151: variable 'tgt_sizes' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 152-152: variable 'image_bounds' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 153-153: variable 'audio_features' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 155-155: variable 'prev_position_ids' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 156-156: variable 'is_decode_stage' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 180-180: 10 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 181-181: 64 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 182-182: 3584 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 223-223: variable 'position_ids' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 237-237: loop variable name 'd' is too short, expected at least 2 characters
(readability-identifier-length,-warnings-as-errors)
[error] 260-260: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 293-293: 3 adjacent parameters of 'merge_vision_text_embeddings' of similar type ('int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 297-297: variable 'image_bounds' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 324-324: variable 'arr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 325-325: variable 'data_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 327-327: variable name 'tt' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 338-338: 3 adjacent parameters of 'merge_audio_text_embeddings' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
mllm/models/minicpm_o2_6/modeling_qwen2vl_for_minicpmo.hpp
[error] 4-4: 'array' file not found
(clang-diagnostic-error)
[error] 17-17: 2 adjacent parameters of 'makeVisualRoPEInvFreq' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 18-18: variable 'half_dim' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 19-19: variable 'inv_freq' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 20-20: variable 'inv_freq_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 21-21: variable 'dims_inv' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 23-23: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 23-23: narrowing conversion from 'int' to 'float'
(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)
[error] 43-43: variable 'out' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 44-44: variable 'out_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 97-97: variable 'dim' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 98-98: variable 'freqs' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 99-99: variable 'inv_freq_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 100-100: variable 'freqs_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 103-103: variable 'row_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 115-115: variable 'sin_pos_emb' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 116-116: variable 'cos_pos_emb' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 131-131: 3 adjacent parameters of 'makeVisualRotaryPosEmb' of similar type ('int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 132-132: variable 'grid_dims' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 133-133: variable name 't' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 133-133: do not use pointer arithmetic
(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)
[error] 134-134: variable name 'h' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 134-134: do not use pointer arithmetic
(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)
[error] 135-135: variable name 'w' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 135-135: do not use pointer arithmetic
(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)
[error] 137-137: variable 'num_positions' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 138-138: variable 'dim' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 139-139: variable 'batch_size' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 140-140: variable 'seq_len' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 143-143: variable 'out' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 174-174: 2 adjacent parameters of 'makeMultimodalRoPEInvFreq' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 181-181: 4 adjacent parameters of 'makeMultimodalPositionEmbedding' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 189-189: variable 'tmp_sin' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 190-190: variable 'tmp_cos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 192-192: loop variable name 'b' is too short, expected at least 2 characters
(readability-identifier-length,-warnings-as-errors)
[error] 204-204: variable 'sin' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 205-205: variable 'cos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 212-212: variable 'num_rows' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 213-213: variable 'num_cols' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 246-246: variable 'num_rows' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 247-247: variable 'num_cols' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 271-271: constructor does not initialize these fields: in_chans_, embed_dim_, patch_size_, temporal_patch_size_, proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 282-282: constructor does not initialize these fields: in_chans_, embed_dim_, patch_size_, temporal_patch_size_, proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 282-282: 2 adjacent parameters of 'PatchEmbed' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 294-294: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 305-305: constructor does not initialize these fields: hidden_size_, spatial_merge_size_, context_dim_, ln_q_, mlp_0_, mlp_2_, mlp_gelu_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 318-318: constructor does not initialize these fields: hidden_size_, spatial_merge_size_, context_dim_, ln_q_, mlp_0_, mlp_2_, mlp_gelu_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 318-318: 2 adjacent parameters of 'PatchMerger' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 329-329: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 330-330: variable name 'o' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 338-338: constructor does not initialize these fields: dim_, hidden_dim_, act_, fc_1_, fc_2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 349-349: constructor does not initialize these fields: dim_, hidden_dim_, act_, fc_1_, fc_2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 349-349: 2 adjacent parameters of 'VisionMlp' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 358-358: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 363-363: constructor does not initialize these fields: dim_, num_heads_, head_dim_, qkv_, proj_, softmax_, vision_rope_q_, vision_rope_k_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 368-368: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 379-379: constructor does not initialize these fields: dim_, num_heads_, head_dim_, qkv_, proj_, softmax_, vision_rope_q_, vision_rope_k_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 379-379: 2 adjacent parameters of 'VisionAttention' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 403-403: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 442-442: constructor does not initialize these fields: mlp_hidden_dim_, norm1_, norm2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 454-454: constructor does not initialize these fields: mlp_hidden_dim_, norm1_, norm2_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 454-454: 2 adjacent parameters of 'Qwen2VLVisionBlock' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 462-462: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 473-473: constructor does not initialize these fields: blocks_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 481-481: constructor does not initialize these fields: blocks_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 481-481: 2 adjacent parameters of 'Qwen2VisionTransformerPretrainedModel' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 487-487: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 502-502: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 510-510: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 510-510: 2 adjacent parameters of 'Qwen2VLMLP' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 517-517: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 518-518: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 520-520: variable name 'y' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 527-527: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, head_dim_, num_attention_heads_, num_key_value_heads_, num_key_value_groups_, layer_idx_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 546-546: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, num_attention_heads_, num_key_value_heads_, layer_idx_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 546-546: 2 adjacent parameters of 'Qwen2VLAttention' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 551-551: 'num_key_value_groups_' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 571-571: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 572-572: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 582-582: variable 'B' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 582-582: variable name 'B' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 582-582: invalid case style for variable 'B'
(readability-identifier-naming,-warnings-as-errors)
[error] 583-583: variable 'S' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 583-583: variable name 'S' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 583-583: invalid case style for variable 'S'
(readability-identifier-naming,-warnings-as-errors)
[error] 604-604: variable 'attn' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 605-605: repeated branch in conditional chain
(bugprone-branch-clone,-warnings-as-errors)
[error] 627-627: member variable 'layer_idx_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 630-630: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 639-639: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 639-639: 2 adjacent parameters of 'Qwen2VLDecoder' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 646-646: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 651-651: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 661-661: constructor does not initialize these fields: decode_blocks_, norm_, lm_head_, tie_word_embeddings_, embedding_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 670-670: constructor does not initialize these fields: decode_blocks_, norm_, lm_head_, tie_word_embeddings_, embedding_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 670-670: 2 adjacent parameters of 'Qwen2VLText' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 689-689: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 693-693: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 703-703: variable name 'S' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 703-703: invalid case style for variable 'S'
(readability-identifier-naming,-warnings-as-errors)
[error] 712-712: member variable 'embedding_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 717-717: constructor does not initialize these fields: kv_cache_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 731-731: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 759-759: variable 'vision_pad_token_start' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 762-762: variable name 'S' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 762-762: invalid case style for variable 'S'
(readability-identifier-naming,-warnings-as-errors)
[error] 773-773: variable name 'D' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 773-773: invalid case style for variable 'D'
(readability-identifier-naming,-warnings-as-errors)
[error] 804-804: 4 adjacent parameters of 'getPositionIds' of similar type ('int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 819-819: 2 adjacent parameters of 'getPositionIdsPrefill' of similar type ('int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 825-825: variable name 'B' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 825-825: invalid case style for variable 'B'
(readability-identifier-naming,-warnings-as-errors)
[error] 827-827: variable name 'S' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 827-827: invalid case style for variable 'S'
(readability-identifier-naming,-warnings-as-errors)
[error] 829-829: variable 'position_ids' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 834-834: variable 'vision_pad_token_start' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 847-847: multiple declarations in a single statement reduces readability
(readability-isolate-declaration,-warnings-as-errors)
[error] 847-847: variable 'img_t' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 847-847: variable 'img_h' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 847-847: variable 'img_w' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 848-848: multiple declarations in a single statement reduces readability
(readability-isolate-declaration,-warnings-as-errors)
[error] 848-848: variable 'inputs_t' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 848-848: variable 'inputs_h' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 848-848: variable 'inputs_w' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 861-861: variable 'current_max_position_id' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 864-864: variable 'start_idx' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 865-865: loop variable name 'd' is too short, expected at least 2 characters
(readability-identifier-length,-warnings-as-errors)
[error] 873-873: invalid case style for variable '_cnt'
(readability-identifier-naming,-warnings-as-errors)
[error] 874-874: variable 'vision_start_id' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 898-898: variable 'vision_token_count' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 899-899: variable 'trailing_text_start_seq' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 900-900: variable 'trailing_text_count' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 903-903: variable 'start_id' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 904-904: loop variable name 'd' is too short, expected at least 2 characters
(readability-identifier-length,-warnings-as-errors)
[error] 917-917: member variable 'cfg' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 918-918: member variable 'llm' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 919-919: member variable 'visual' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp
[error] 5-5: 'vector' file not found
(clang-diagnostic-error)
[error] 21-21: 2 adjacent parameters of 'miniCPMOTokenizerMatchPattern' of similar type ('int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 36-36: variable 'original_pos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 54-54: do not use 'else' after 'return'
(readability-else-after-return,-warnings-as-errors)
[error] 72-72: variable 'original_pos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 74-74: variable 'start' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 95-95: do not use 'else' after 'return'
(readability-else-after-return,-warnings-as-errors)
[error] 103-103: variable 'start' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 109-109: do not use 'else' after 'return'
(readability-else-after-return,-warnings-as-errors)
[error] 116-116: variable 'start' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 122-122: do not use 'else' after 'return'
(readability-else-after-return,-warnings-as-errors)
[error] 129-129: variable 'start' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 140-140: variable 'pos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 142-142: variable 'matched' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 143-143: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 152-152: constructor does not initialize these fields: prompt, img_file_path, audio_file_path
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 160-160: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 187-187: constructor does not initialize these fields: prompt
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 197-197: constructor does not initialize these fields: image_preprocessor_, bpe_, bytes_2_unicode_dict_, bytes_2_unicode_dict_inverse_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 197-197: 2 adjacent parameters of 'MiniCPMOTokenizer' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 197-197: 14 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 201-201: variable 'bytes_2_unicode_dict_' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 201-201: invalid case style for variable 'bytes_2_unicode_dict_'
(readability-identifier-naming,-warnings-as-errors)
[error] 282-282: variable 'utf_8_str' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 291-291: variable 'ret' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 313-313: variable 'end' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 320-320: variable 'pos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 329-329: variable 'final_text' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 368-368: variable 'sequence' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 376-376: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 392-392: 2 adjacent parameters of 'Convert2Tensors' of similar type ('int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 394-394: all parameters should be named in a function
(readability-named-parameter,-warnings-as-errors)
[error] 397-397: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 401-401: variable 'input_ids_tensor' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 418-418: variable 'channels' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 419-419: variable 'patch_size' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 420-420: variable 'HW_patch_size' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 420-420: invalid case style for variable 'HW_patch_size'
(readability-identifier-naming,-warnings-as-errors)
[error] 426-426: variable 'pixel_values' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 456-456: variable 'tgt_sizes_tensor' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 470-470: variable 'image_bounds_tensor' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 485-485: redundant access specifier has the same accessibility as the previous access specifier
(readability-redundant-access-specifiers,-warnings-as-errors)
🔇 Additional comments (7)
examples/minicpm_o/config_minicpm_o.json (1)
1-37: LGTM!The configuration file is comprehensive and well-structured, covering all necessary hyperparameters for the MiniCPMO multimodal model across vision, LLM, resampler, audio, TTS, and common runtime settings.
mllm/models/minicpm_o2_6/modeling_resampler.hpp (2)
23-53: LGTM!The 2D sinusoidal positional embedding implementation correctly handles both square (single dimension) and rectangular (two dimensions) grid sizes, with proper sin/cos encoding for width and height dimensions.
278-286: LGTM!The key padding mask initialization is correct: it initializes all positions to 1 (masked/padding), then sets valid positions to 0 (not masked). This aligns with the usage in ResamplerAttention where mask value of 1 results in -infinity attention weights.
mllm/models/minicpm_o2_6/modeling_qwen2vl_for_minicpmo.hpp (4)
149-159: Good defensive programming with bounds checking.The explicit validation of tensor dimensions and position indices helps prevent out-of-bounds access. This is especially important for vision processing where dimension mismatches could occur.
243-265: Verify simplified position embedding fallback.When
mrope_sectionis empty, the code uses only the first dimension (dimension 0) of the 3Dtmp_sin/tmp_costensors. The comment acknowledges this is "simplified." Ensure this fallback behavior is correct for all expected use cases wheremrope_sectionis empty.
759-776: LGTM! Vision embedding insertion is well-implemented.The code properly:
- Searches for the vision pad token position
- Asserts that the token was found (preventing silent failures)
- Safely copies visual embeddings into the input embeddings at the correct position
831-844: Single-image limitation is documented.The comment on line 833 clearly states "Only one image is supported." This is an acceptable limitation for an initial implementation, and the assertion on line 843 will catch violations.
examples/minicpm_o/CMakeLists.txt
Outdated
| link_directories($ENV{HOME}/local/lib) | ||
|
|
||
| add_executable(main_minicpm_o main.cpp) | ||
| target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend cnpy z) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify necessity of cnpy and z library dependencies.
The past review noted that cnpy and z are not used in the code. While cnpy.h is included in mainllm.cpp (line 12), it's unclear if these libraries are actually required. If they're not used, they should be removed to avoid unnecessary dependencies.
Run the following script to check for actual usage of cnpy library functions:
If no results are found, remove the cnpy and z dependencies:
-target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend)
-target_link_libraries(main_minicpm_o2 PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_o2 PRIVATE MllmRT MllmCPUBackend)
-target_link_libraries(main_minicpm_dbg PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_dbg PRIVATE MllmRT MllmCPUBackend)And remove the unused include from mainllm.cpp.
Also applies to: 10-10, 14-14
🏁 Script executed:
#!/bin/bash
# Search for cnpy namespace or function calls in the example files
rg -n --type=cpp 'cnpy::|npy_save|npy_load|npz_save|npz_load' examples/minicpm_o/Length of output: 0
I need to verify this more thoroughly. Let me check for the cnpy.h include and examine the source files more carefully.
---
🏁 Script executed:
```shell
#!/bin/bash
# Check for cnpy.h include statements
echo "=== Searching for cnpy.h includes ==="
rg -n 'cnpy\.h|#include.*cnpy' examples/minicpm_o/ --type=cpp --type=h
echo ""
echo "=== Searching for 'cnpy' anywhere in source files ==="
rg -n 'cnpy' examples/minicpm_o/ --type=cpp --type=h
echo ""
echo "=== Searching for 'npy' usage ==="
rg -n '\bnpy[A-Za-z_]*' examples/minicpm_o/ --type=cpp --type=h
echo ""
echo "=== Listing all .cpp and .h files in examples/minicpm_o ==="
fd -e cpp -e h examples/minicpm_o/
Length of output: 674
Good, cnpy is only included but never used. Let me verify the context and check for z library usage:
---
🏁 Script executed:
```shell
#!/bin/bash
# Show context around cnpy.h include in mainllm.cpp
echo "=== Context around cnpy.h include ==="
rg -n -B 5 -A 10 'cnpy\.h' examples/minicpm_o/mainllm.cpp
echo ""
echo "=== Searching for z library usage (zlib functions) ==="
rg -n '\b(compress|uncompress|deflate|inflate|gzip|ungzip|gz_)' examples/minicpm_o/ --type=cpp --type=h
echo ""
echo "=== List all files in examples/minicpm_o ==="
fd . examples/minicpm_o/
Length of output: 962
Let me do one final check on main.cpp to be absolutely certain:
---
🏁 Script executed:
```shell
#!/bin/bash
# Check main.cpp for any cnpy or z usage
echo "=== Checking main.cpp for cnpy or z usage ==="
rg -n 'cnpy|compress|uncompress|deflate|inflate|gzip' examples/minicpm_o/main.cpp
echo "Status: $?"
Length of output: 55
Remove unused cnpy and z library dependencies.
Verification confirms that cnpy and z are completely unused in the codebase. While cnpy.h is included in mainllm.cpp (line 12), no cnpy functions are called anywhere, and no zlib compression functions are used. These should be removed:
-target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend)
-target_link_libraries(main_minicpm_o2 PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_o2 PRIVATE MllmRT MllmCPUBackend)
-target_link_libraries(main_minicpm_dbg PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_dbg PRIVATE MllmRT MllmCPUBackend)Also remove the unused include from mainllm.cpp line 12.
Also applies to: 10-10, 14-14
🤖 Prompt for AI Agents
In examples/minicpm_o/CMakeLists.txt around line 6, remove the unused cnpy and z
libraries from the target_link_libraries call so only required libraries (MllmRT
MllmCPUBackend) remain; also edit mainllm.cpp (remove the unused #include
"cnpy.h" at line 12) and apply the same removals for the other affected CMake
lines referenced (lines 10 and 14) to eliminate the unused cnpy and z
dependencies and their unused include.
examples/minicpm_o/main.cpp
Outdated
| mllm::models::minicpmo::MiniCPMOMessage message{ | ||
| .prompt = "Inctroduce your self" | ||
| }; | ||
| message.img_file_path = "/Users/kkkai/Desktop/pics.jpg"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo and remove hardcoded test values.
Two issues here:
- Typo: "Inctroduce" should be "Introduce"
- Hardcoded path: The image path
/Users/kkkai/Desktop/pics.jpgis a developer-specific test value that will cause runtime failures for other users
Apply this diff:
mllm::models::minicpmo::MiniCPMOMessage message{
- .prompt = "Inctroduce your self"
+ .prompt = "Introduce yourself"
};
- message.img_file_path = "/Users/kkkai/Desktop/pics.jpg";
+ // TODO: Add command-line argument for image path or remove this test code📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| mllm::models::minicpmo::MiniCPMOMessage message{ | |
| .prompt = "Inctroduce your self" | |
| }; | |
| message.img_file_path = "/Users/kkkai/Desktop/pics.jpg"; | |
| mllm::models::minicpmo::MiniCPMOMessage message{ | |
| .prompt = "Introduce yourself" | |
| }; | |
| // TODO: Add command-line argument for image path or remove this test code |
🤖 Prompt for AI Agents
In examples/minicpm_o/main.cpp around lines 47 to 50, fix the misspelled prompt
and remove the developer-specific hardcoded image path: change "Inctroduce your
self" to "Introduce yourself" and replace message.img_file_path =
"/Users/kkkai/Desktop/pics.jpg" with a configurable value (e.g., read from a
command-line argument, environment variable, or make it optional/empty) so the
example does not rely on a local developer filesystem path.
examples/minicpm_o/mainllm.cpp
Outdated
| std::string image_path = "/Users/kkkai/Desktop/pics.jpg"; | ||
| std::string prompt_text = "描述图片中物体"; | ||
| mllm::models::minicpmo::MiniCPMOMessage message; | ||
| message.prompt = prompt_text; | ||
| message.img_file_path = image_path; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove hardcoded test values before merge.
The hardcoded image path (/Users/kkkai/Desktop/pics.jpg) and prompt text are developer-specific test values that should not be committed to the repository. This will cause runtime failures for other users.
Replace with command-line arguments or restore the commented interactive input code below:
- std::string image_path = "/Users/kkkai/Desktop/pics.jpg";
- std::string prompt_text = "描述图片中物体";
- mllm::models::minicpmo::MiniCPMOMessage message;
- message.prompt = prompt_text;
- message.img_file_path = image_path;
-
- // fmt::print("📷 Image path (or 'exit/quit'): ");
- // std::getline(std::cin, image_path);
- // if (image_path == "exit" || image_path == "quit") { return 0; }
- // fmt::print("💬 Prompt text: ");
- // std::getline(std::cin, prompt_text);
+ std::string image_path, prompt_text;
+ fmt::print("📷 Image path (or 'exit/quit'): ");
+ std::getline(std::cin, image_path);
+ if (image_path == "exit" || image_path == "quit") { return 0; }
+ fmt::print("💬 Prompt text: ");
+ std::getline(std::cin, prompt_text);
+
+ mllm::models::minicpmo::MiniCPMOMessage message;
+ message.prompt = prompt_text;
+ message.img_file_path = image_path;🤖 Prompt for AI Agents
In examples/minicpm_o/mainllm.cpp around lines 61 to 66, remove the hardcoded
image path and prompt text ("/Users/kkkai/Desktop/pics.jpg" and "描述图片中物体") and
replace them with runtime inputs: read values from command-line arguments (e.g.,
argc/argv) or re-enable the previously commented interactive input code so the
program fails gracefully for other users; ensure you validate the inputs
(existence/readable file) and provide a clear usage message when arguments are
missing.
| Image crop_image(Image& image, int x, int y, int crop_width, int crop_height) { | ||
| // Get source image properties | ||
| int src_width = image.w(); | ||
| int src_height = image.h(); | ||
| int src_channels = image.c(); | ||
|
|
||
| // Ensure crop bounds are valid | ||
| x = std::max(0, std::min(x, src_width - 1)); | ||
| y = std::max(0, std::min(y, src_height - 1)); | ||
| crop_width = std::min(crop_width, src_width - x); | ||
| crop_height = std::min(crop_height, src_height - y); | ||
|
|
||
| unsigned char* src_data = image.ptr(); | ||
| unsigned char* crop_data = new unsigned char[crop_width * crop_height * src_channels]; | ||
|
|
||
| // Copy pixel data row by row | ||
| for (int row = 0; row < crop_height; ++row) { | ||
| int src_row_offset = ((y + row) * src_width + x) * src_channels; | ||
| int crop_row_offset = row * crop_width * src_channels; | ||
| std::memcpy(crop_data + crop_row_offset, | ||
| src_data + src_row_offset, | ||
| crop_width * src_channels); | ||
| } | ||
|
|
||
| // Create a temporary file to save cropped data and load as Image | ||
| std::string temp_path = "/tmp/crop_" + std::to_string(rand()) + ".png"; | ||
| stbi_write_png(temp_path.c_str(), crop_width, crop_height, src_channels, crop_data, crop_width * src_channels); | ||
|
|
||
| Image cropped_image = Image::open(temp_path); | ||
|
|
||
| delete[] crop_data; | ||
| std::remove(temp_path.c_str()); | ||
|
|
||
| return cropped_image; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Insecure and non-portable temporary file handling.
The crop_image method has several serious issues:
- Non-portable path:
/tmp/is Unix-specific and doesn't work on Windows. - Predictable filenames: Using
rand()without seeding creates predictable, collision-prone filenames. - Resource leak risk: If
Image::open()throws, the temporary file is never deleted. - Security risk: Predictable temp filenames in shared directories can be exploited.
Consider using a proper temporary file mechanism:
Image crop_image(Image& image, int x, int y, int crop_width, int crop_height) {
// ... (existing bounds checking and crop logic) ...
// Use std::filesystem for portable temp directory
auto temp_dir = std::filesystem::temp_directory_path();
auto temp_file = temp_dir / ("crop_" + std::to_string(std::random_device{}()) + ".png");
stbi_write_png(temp_file.string().c_str(), crop_width, crop_height,
src_channels, crop_data, crop_width * src_channels);
// RAII for cleanup
struct TempFileGuard {
std::filesystem::path path;
~TempFileGuard() { std::filesystem::remove(path); }
} guard{temp_file};
Image cropped_image = Image::open(temp_file.string());
delete[] crop_data;
return cropped_image;
}Additionally, consider whether writing to disk is necessary at all—could you construct the Image directly from the cropped buffer?
🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp around lines
146-180, the function writes a predictable /tmp/ file using rand() and can leak
the temp file on exceptions; replace the hardcoded path and rand() with
std::filesystem::temp_directory_path() and a securely generated unique name
(e.g. std::random_device + std::chrono or std::filesystem::unique_path), write
the PNG to that path, and ensure deletion even if Image::open throws by using an
RAII guard (a small struct with path and destructor that removes the file) or
std::filesystem::remove in a try/finally equivalent; additionally, if Image can
be constructed directly from the in-memory buffer, avoid writing to disk
entirely and construct the Image from crop_data to remove temp file handling.
| std::tuple<std::vector<Tensor>, std::vector<std::pair<int, int>>, std::vector<std::pair<int, int>>, std::vector<int>> process(const std::string& image_path, int max_slice_nums = 9) { | ||
| auto img = Image::open(image_path); | ||
| std::pair<int, int> original_size = {img.w(), img.h()}; | ||
| auto [source_image, patches, grid] = image_slice_processor_.slice_image(img); | ||
| std::vector<Image> slice_images; | ||
| slice_images.push_back(source_image); | ||
|
|
||
| if (!patches.empty()) { | ||
| for (const auto& patch_row : patches) { | ||
| for (const auto& patch : patch_row) { | ||
| slice_images.push_back(patch); | ||
| } | ||
| } | ||
| } | ||
| std::vector<Tensor> processed_tensors; | ||
| std::vector<std::pair<int, int>> tgt_sizes; | ||
|
|
||
| for (auto& slice_img : slice_images) { | ||
|
|
||
| auto tensor = slice_img.tensor(); // [H, W, C] | ||
| tensor = tensor.permute({2, 0, 1}); // [C, H, W] | ||
| tensor = tensor * (1.f / 255.f); | ||
| normalize_tensor(tensor); | ||
| auto reshaped_tensor = reshape_by_patch(tensor); | ||
| processed_tensors.push_back(reshaped_tensor); | ||
|
|
||
| // Calculate target size (patch dimensions) | ||
| int patch_h = tensor.shape()[1] / patch_size_; | ||
| int patch_w = tensor.shape()[2] / patch_size_; | ||
| tgt_sizes.push_back({patch_h, patch_w}); | ||
| } | ||
| return std::make_tuple(processed_tensors, std::vector<std::pair<int, int>>{original_size}, tgt_sizes, grid); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for image loading.
The process method calls Image::open(image_path) without error handling. If the image file doesn't exist or is corrupted, this could cause a crash or undefined behavior.
Add validation:
std::tuple<std::vector<Tensor>, std::vector<std::pair<int, int>>, std::vector<std::pair<int, int>>, std::vector<int>> process(const std::string& image_path, int max_slice_nums = 9) {
auto img = Image::open(image_path);
+ if (img.empty() || img.w() == 0 || img.h() == 0) {
+ throw std::runtime_error("Failed to load image from: " + image_path);
+ }
std::pair<int, int> original_size = {img.w(), img.h()};Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp around lines
283-315, Image::open(image_path) is used without validation; wrap the call in a
try/catch (or check for a null/invalid Image if the API provides such a check)
and handle failures by logging a clear error with the path and either throwing a
std::runtime_error or returning an empty tuple consistent with callers; ensure
you do not dereference img when invalid and that the function returns the
correct types (empty vectors) or propagates the exception so callers can handle
the missing/corrupt image.
| Tensor x_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc(); | ||
| for(int i = 0; i < seq_len; i++){ | ||
| for(int j = 0; j < embed_dim_; j++){ | ||
| x_b.at<float>({i, j}) = x.at<float>({b, i, j}); | ||
| } | ||
| } | ||
|
|
||
| // pos_embed for this batch | ||
| Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc(); | ||
| for(int i = 0; i < seq_len; i++){ | ||
| for(int j = 0; j < embed_dim_; j++){ | ||
| if(i < max_patch_len){ | ||
| pos_embed_b.at<float>({i, j}) = pos_embed_padded.at<float>({b, i, j}); | ||
| } else { | ||
| pos_embed_b.at<float>({i, j}) = 0.0f; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify tensor write operations use correct API.
The code uses x_b.at<float>({i, j}) = ... and pos_embed_b.at<float>({i, j}) = ... for writing tensor values (lines 343, 352, 354), while most other write operations throughout the file use *tensor.offsettedPtr<float>({...}) = .... Ensure that at<float>() returns a mutable reference suitable for writes, or use the consistent offsettedPtr pattern.
Verify the Tensor API usage:
If at<> is read-only, apply this diff:
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
- x_b.at<float>({i, j}) = x.at<float>({b, i, j});
+ *x_b.offsettedPtr<float>({i, j}) = x.at<float>({b, i, j});
}
}
// pos_embed for this batch
Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
if(i < max_patch_len){
- pos_embed_b.at<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
+ *pos_embed_b.offsettedPtr<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
} else {
- pos_embed_b.at<float>({i, j}) = 0.0f;
+ *pos_embed_b.offsettedPtr<float>({i, j}) = 0.0f;
}
}
}🏁 Script executed:
#!/bin/bash
# Check the Tensor class definition for at<> method signature
rg -n "at<.*>\(" --type=hpp -A 2 | head -20Length of output: 27
🏁 Script executed:
#!/bin/bash
# Search for Tensor class definition with at<> method
rg -n "class Tensor" -A 30 | head -100Length of output: 5499
🏁 Script executed:
#!/bin/bash
# Search for at<> method definition
rg -n "\.at<" --type=cpp -B 2 -A 2Length of output: 32437
🏁 Script executed:
#!/bin/bash
# Search for offsettedPtr method definition
rg -n "offsettedPtr" --type=cpp -B 2 -A 2 | head -50Length of output: 4579
🏁 Script executed:
#!/bin/bash
# Look at the specific file to understand tensor API usage
rg -n "offsettedPtr.*float" mllm/models/minicpm_o2_6/modeling_resampler.hpp -B 1 -A 1Length of output: 3442
Replace tensor write operations with offsettedPtr pattern for consistency.
The code at lines 343, 352, and 354 uses at<float>() for tensor writes, which is inconsistent with the established pattern throughout the file and codebase. All other write operations in this file use *tensor.offsettedPtr<float>({...}) = value, while at<>() is reserved for reads. Apply the suggested diff to align with the correct API usage:
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
- x_b.at<float>({i, j}) = x.at<float>({b, i, j});
+ *x_b.offsettedPtr<float>({i, j}) = x.at<float>({b, i, j});
}
}
// pos_embed for this batch
Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
if(i < max_patch_len){
- pos_embed_b.at<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
+ *pos_embed_b.offsettedPtr<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
} else {
- pos_embed_b.at<float>({i, j}) = 0.0f;
+ *pos_embed_b.offsettedPtr<float>({i, j}) = 0.0f;
}
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Tensor x_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc(); | |
| for(int i = 0; i < seq_len; i++){ | |
| for(int j = 0; j < embed_dim_; j++){ | |
| x_b.at<float>({i, j}) = x.at<float>({b, i, j}); | |
| } | |
| } | |
| // pos_embed for this batch | |
| Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc(); | |
| for(int i = 0; i < seq_len; i++){ | |
| for(int j = 0; j < embed_dim_; j++){ | |
| if(i < max_patch_len){ | |
| pos_embed_b.at<float>({i, j}) = pos_embed_padded.at<float>({b, i, j}); | |
| } else { | |
| pos_embed_b.at<float>({i, j}) = 0.0f; | |
| } | |
| } | |
| } | |
| Tensor x_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc(); | |
| for(int i = 0; i < seq_len; i++){ | |
| for(int j = 0; j < embed_dim_; j++){ | |
| *x_b.offsettedPtr<float>({i, j}) = x.at<float>({b, i, j}); | |
| } | |
| } | |
| // pos_embed for this batch | |
| Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc(); | |
| for(int i = 0; i < seq_len; i++){ | |
| for(int j = 0; j < embed_dim_; j++){ | |
| if(i < max_patch_len){ | |
| *pos_embed_b.offsettedPtr<float>({i, j}) = pos_embed_padded.at<float>({b, i, j}); | |
| } else { | |
| *pos_embed_b.offsettedPtr<float>({i, j}) = 0.0f; | |
| } | |
| } | |
| } |
| auto hidden_states = inputs_embeds; | ||
| for (auto& layer : layers_) { | ||
| hidden_states = layer(hidden_states, attention_mask)[0]; | ||
| //break; // For testing, run only one layer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Remove commented-out debug code.
The commented //break; statement on line 336 appears to be leftover debug code and should be removed.
for (auto& layer : layers_) {
hidden_states = layer(hidden_states, attention_mask)[0];
- //break; // For testing, run only one layer
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| //break; // For testing, run only one layer | |
| for (auto& layer : layers_) { | |
| hidden_states = layer(hidden_states, attention_mask)[0]; | |
| } |
🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/modeling_siglip.hpp around line 336, remove the
leftover commented-out debug code "//break;" so the file contains no stray
commented debugging statements; simply delete that comment line and run a quick
build/format to ensure no trailing whitespace or style issues remain.
| for(int i=0;i<tgt_sizes.shape()[0];i++){ | ||
| if(tgt_sizes.at<int32_t>({i,0})>0 && tgt_sizes.at<int32_t>({i,1})>0){ | ||
| int patches = (tgt_sizes.at<int32_t>({i,0}) ) * (tgt_sizes.at<int32_t>({i,1}) ); | ||
| if (patches > max_patches) max_patches = patches; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add bounds checking for tgt_sizes access.
The loop accesses tgt_sizes.at<int32_t>({i, 0}) and tgt_sizes.at<int32_t>({i, 1}) without verifying that tgt_sizes has the expected shape. If tgt_sizes.shape()[1] is less than 2, this will cause out-of-bounds access.
Add shape validation:
int max_patches = 0;
// Calculate max_patches based on tgt_sizes
+ if (!tgt_sizes.isNil() && tgt_sizes.shape().size() >= 2 && tgt_sizes.shape()[1] >= 2) {
for(int i=0;i<tgt_sizes.shape()[0];i++){
if(tgt_sizes.at<int32_t>({i,0})>0 && tgt_sizes.at<int32_t>({i,1})>0){
int patches = (tgt_sizes.at<int32_t>({i,0}) ) * (tgt_sizes.at<int32_t>({i,1}) );
if (patches > max_patches) max_patches = patches;
}
}
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/modeling_siglip.hpp around lines 370-375, the loop
accesses tgt_sizes.at<int32_t>({i,0}) and {i,1} without checking that tgt_sizes
has at least two columns; add a guard to validate tgt_sizes.shape().size() > 1
and tgt_sizes.shape()[1] >= 2 before the loop (or check per-iteration) and skip
or handle rows that do not have two elements (e.g., continue, set patches=0, or
emit a clear error), ensuring you still bound-check i against
tgt_sizes.shape()[0] to avoid out-of-bounds access.
| std::string prompt; | ||
| std::string img_file_path; | ||
| std::string audio_file_path; | ||
| std::string system_prompt = "You are Qwen, created by Alibaba Clound. You are a helpful assistant."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix system prompt branding and typo.
The default system prompt incorrectly identifies the model as "Qwen, created by Alibaba Clound" (note the typo "Clound"). This should reflect the MiniCPMO model identity instead.
Consider updating to:
- std::string system_prompt = "You are Qwen, created by Alibaba Clound. You are a helpful assistant.";
+ std::string system_prompt = "You are MiniCPM-o, a helpful multimodal assistant.";📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| std::string system_prompt = "You are Qwen, created by Alibaba Clound. You are a helpful assistant."; | |
| std::string system_prompt = "You are MiniCPM-o, a helpful multimodal assistant."; |
🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp around line 156, the
system_prompt incorrectly brands the model as "Qwen, created by Alibaba Clound"
and contains a typo; update the prompt to identify the model as MiniCPMO and
remove the incorrect Alibaba/Qwen branding and typo — for example, set the
prompt text to a clear MiniCPMO identity like: You are MiniCPMO, a helpful
assistant. Ensure spelling ("Cloud") is not present and the prompt accurately
reflects the MiniCPMO model.
| if(hw > img_tensors[b].shape()[2]){ | ||
| pixel_values_ptr[dst_idx]=0; | ||
| } | ||
| else{ | ||
| pixel_values_ptr[dst_idx]=img_tensors[b].at<float>({c, p, hw}); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix off-by-one error in boundary check.
Line 439 uses > but should use >=. When hw equals img_tensors[b].shape()[2], the current condition is false, causing line 443 to access an out-of-bounds index.
Apply this diff:
- if(hw > img_tensors[b].shape()[2]){
+ if(hw >= img_tensors[b].shape()[2]){
pixel_values_ptr[dst_idx]=0;
}🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp around lines 439 to 444,
the boundary check for hw is off-by-one: change the condition from `>` to `>=`
so that when hw equals img_tensors[b].shape()[2] it treats it as out-of-bounds
and writes 0 instead of accessing img_tensors[b].at<float> out of range; update
the if condition accordingly and keep the else branch unchanged.
Removed multiple executable definitions and their associated libraries and include directories from CMakeLists.txt.
Summary by CodeRabbit