Skip to content

Conversation

@KKkai0315
Copy link

@KKkai0315 KKkai0315 commented Oct 29, 2025

Summary by CodeRabbit

  • New Features
    • Added MiniCPMO multimodal model support with vision, audio, and text processing capabilities.
    • Introduced multiple command-line example applications for model inference and tokenization.
    • Added image preprocessing and slicing functionality for multimodal inputs.
    • Enabled Qwen2VL vision-language integration for advanced visual understanding.
    • Implemented tokenizer supporting special tokens and multimodal message handling.

@coderabbitai
Copy link

coderabbitai bot commented Oct 29, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces a complete MiniCPMO multimodal model implementation with vision-language capabilities. It adds configuration structures, image preprocessing pipelines, tokenization with image/audio support, example applications, and integrates SigLIP vision encoding with Qwen2VL-based language modeling for causal LM inference.

Changes

Cohort / File(s) Change Summary
Build & Infrastructure
examples/minicpm_o/CMakeLists.txt, mllm/backends/cpu/kernels/Kernels.hpp, mllm/nn/Nn.hpp
Replaces single old executable target with four new targets (main_minicpm_o, main_minicpm_o2, main_minicpm_dbg, tokenizer_test). Adds global CMake directives (cmake_minimum_required, include_directories, link_directories) and per-target link/include configurations. Adds Conv2D includes to ARM kernel headers and Nn.hpp.
Configuration & Model Metadata
examples/minicpm_o/config_minicpm_o.json, mllm/models/minicpm_o2_6/configuration_minicpmo.hpp
Introduces MiniCPMOConfig struct to manage model hyperparameters across vision, LLM, resampler, audio, TTS, and common sections. Config file provides numerical values for model dimensions, token IDs, and implementation details.
Example Applications
examples/minicpm_o/main.cpp, examples/minicpm_o/mainllm.cpp
Replaces placeholder main with full CLI-driven applications. Implements argument parsing, model/tokenizer/config loading, sample message creation, tensor conversion, inference execution (SigLIP, resampler), and performance tracing for MiniCPMO.
Tokenization & Preprocessing
mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp, mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp
Implements MiniCPMOTokenizer with multimodal support (text+image/audio), custom regex-based segmentation, and BPE encoding. Adds ImageSliceProcessor and MiniCPMOImageProcessor for image slicing, patch extraction, normalization, and placeholder generation.
Vision Models
mllm/models/minicpm_o2_6/modeling_siglip.hpp
Introduces complete SigLIP vision module stack: embeddings, attention, MLP, encoder layers, and full vision transformer with dynamic position embedding, attention masking, and target-size-driven configuration.
Multimodal Model Core
mllm/models/minicpm_o2_6/modeling_resampler.hpp, mllm/models/minicpm_o2_6/modeling_minicpmo.hpp
Adds Resampler with multi-head attention and per-batch positional embeddings for vision-text fusion. Implements MiniCPMOForCausalLM integrating LLM, vision (SigLIP), and resampler with multimodal position embeddings and vision-text embedding merging. Includes AudioProjectionLayer and TTSProjector for audio pathways.
Qwen2VL Integration
mllm/models/minicpm_o2_6/modeling_qwen2vl_for_minicpmo.hpp
Comprehensive Qwen2VL visual-language model: patch embedding, attention blocks, vision transformer backbone, multimodal RoPE utilities, language model components (MLP, attention, decoder), and full causal LM integrating visual embeddings with LLM for end-to-end generation.

Sequence Diagram(s)

sequenceDiagram
    participant App as Example App
    participant Tok as MiniCPMOTokenizer
    participant Prep as MiniCPMOImageProcessor
    participant Model as MiniCPMOForCausalLM
    participant VPM as SiglipVisionModel
    participant Resampler
    participant LLM as Qwen2VLForCausalLM

    App->>Tok: convertMessage(MiniCPMOMessage)
    Tok->>Prep: process(image_path)
    Prep-->>Tok: pixel_values, tgt_sizes, image_bounds
    Tok-->>App: input_ids, pixel_values, tgt_sizes, image_bounds

    App->>Model: forward(input_ids, pixel_values, tgt_sizes, image_bounds)
    
    rect rgb(200, 220, 255)
    Note over Model: Vision Processing
    Model->>VPM: forward(pixel_values, tgt_sizes)
    VPM-->>Model: vision_embeddings
    
    Model->>Resampler: forward(vision_embeddings)
    Resampler-->>Model: resampled_embeddings
    end
    
    rect rgb(220, 200, 255)
    Note over Model: Text-Vision Fusion
    Model->>Model: merge_vision_text_embeddings(text_emb, vision_emb, image_bounds)
    Model-->>Model: fused_embeddings
    end
    
    rect rgb(200, 255, 220)
    Note over Model: Language Modeling
    Model->>LLM: forward(fused_embeddings, position_ids)
    LLM->>LLM: decoder stack with RoPE
    LLM-->>Model: output_logits
    end
    
    Model-->>App: sequence_output
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • Primary concern: Dense vision processing logic in image_preprocessor_minicpmo.hpp with image slicing, cropping, and grid calculations; requires careful validation of image coordinate handling and patch alignment
  • Attention needed: Qwen2VL model structure with multimodal RoPE, position embeddings, and attention mask construction; complex tensor reshaping and per-batch processing
  • Scrutiny required: Vision-text embedding merging logic in MiniCPMOForCausalLM::merge_vision_text_embeddings; boundary conditions with variable-sized image/text inputs
  • Verify: Resampler attention implementation with dynamic positional embedding padding and key padding mask application
  • Test coverage: Integration between tokenizer multimodal conversion and model input tensor construction; special token handling and placeholder generation

Possibly related PRs

Suggested reviewers

  • yirongjie
  • oreomaker

Poem

🐰 Whiskers twitch with vision bright,
Images slice, patches take flight,
Tokens dance with resampler's grace,
Qwen2VL finds its rightful place.
Multimodal dreams, now running true—
MiniCPMO, we hop for you! 🌸

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.74% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "feat:Add support of MiniCPM-o-2_6 model" directly and accurately reflects the primary objective of the changeset. The modifications comprehensively implement MiniCPM-o-2_6 model support, including configuration structures, image preprocessing pipelines, core model components (vision, language, and resampler modules), tokenization logic, and example code. The title is specific, concise, and clearly communicates the main change without vague terminology or file listings. It accurately represents the developer's intent to introduce this new model capability to the framework.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

代码中没有cnpy和z的使用,需要移除这些额外依赖。并且避免使用$ENV{HOME}作为依赖目录

Copy link
Collaborator

@oreomaker oreomaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

移除额外依赖,整理demo代码

@chenghuaWang
Copy link
Collaborator

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Oct 30, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
mllm/backends/cpu/kernels/Kernels.hpp (1)

30-34: Remove duplicate include.

The header mllm/backends/cpu/kernels/arm/conv2d.hpp is included twice (lines 30 and 34), which is redundant.

Apply this diff to remove the duplicate:

 #include "mllm/backends/cpu/kernels/arm/gelu.hpp"                       // IWYU pragma: export
-#include "mllm/backends/cpu/kernels/arm/conv2d.hpp"                     // IWYU pragma: export
 #include "mllm/backends/cpu/kernels/arm/conv3d.hpp"                     // IWYU pragma: export
mllm/nn/Nn.hpp (1)

20-28: Remove duplicate include.

The header mllm/nn/layers/Conv2D.hpp is included on both line 20 and line 28.

Apply this diff to remove the duplicate:

 #include "mllm/nn/layers/KVCache.hpp"         // IWYU pragma: export
 #include "mllm/nn/layers/Conv1D.hpp"          // IWYU pragma: export
-#include "mllm/nn/layers/Conv2D.hpp"          // IWYU pragma: export
 #include "mllm/nn/layers/STFT.hpp"            // IWYU pragma: export

Based on static analysis hints.

♻️ Duplicate comments (1)
examples/minicpm_o/CMakeLists.txt (1)

2-3: Address past review feedback: Remove $ENV{HOME} dependency directories.

As previously noted by oreomaker, using $ENV{HOME} for dependency directories creates a non-portable build configuration that assumes a specific user environment setup. This should use proper CMake find_package mechanisms or project-relative paths.

Consider using CMake's standard dependency resolution:

-include_directories($ENV{HOME}/local/include)
-link_directories($ENV{HOME}/local/lib)
+# Use find_package or target-specific include directories instead
🧹 Nitpick comments (13)
mllm/models/minicpm_o2_6/configuration_minicpmo.hpp (2)

10-10: Consider public inheritance for ConfigFile.

The protected inheritance from ConfigFile restricts access to its public interface. If downstream code needs to access ConfigFile methods through MiniCPMOConfig, consider using public inheritance instead.


13-54: Consider using member initializer list for all fields.

For better performance and clarity, primitive-type fields (especially floating-point values like rms_norm_eps, audio_chunk_length, rope_theta, and tie_word_embeddings) should be initialized in the member initializer list rather than in the constructor body.

Example refactor:

-    explicit MiniCPMOConfig(const std::string& file_path) : ConfigFile(file_path) {
+    explicit MiniCPMOConfig(const std::string& file_path) : 
+        ConfigFile(file_path),
+        rms_norm_eps(data()["rms_norm_eps"]),
+        audio_chunk_length(data()["audio_chunk_length"]),
+        rope_theta(data()["rope_theta"]),
+        tie_word_embeddings(data()["tie_word_embeddings"]) {
         // Vision Config
         vision_hidden_size = data()["vision_hidden_size"];
         // ... other fields ...
-        rms_norm_eps = data()["rms_norm_eps"];
         // ... etc

Based on coding guidelines.

mllm/models/minicpm_o2_6/modeling_siglip.hpp (2)

391-395: Simplify redundant conditional branches.

The if-else branches on lines 391-394 both pass the same arguments to embeddings_ when tgt_sizes is nil. The explicit Tensor::nil() in line 392 is redundant since tgt_sizes is already nil.

Apply this diff to simplify:

-        std::vector<Tensor> hidden_states_result;
-        if (tgt_sizes.isNil()) {
-            hidden_states_result = embeddings_(pixel_values, Tensor::nil(), patch_attention_mask);
-        } else {
-            hidden_states_result = embeddings_(pixel_values, tgt_sizes, patch_attention_mask);
-        }
+        auto hidden_states_result = embeddings_(pixel_values, tgt_sizes, patch_attention_mask);

Based on static analysis hints.


406-414: Potential infinite loop with nested conditions.

The nested loop structure with all_valid flag (lines 407-414) has both loops conditioned on all_valid, which means the outer loop will break as soon as any invalid mask is found. However, the logic would be clearer with an explicit break or by extracting this to a helper function.

Consider refactoring for clarity:

// Helper function
bool all_patches_valid(const Tensor& patch_attention_mask, int batch_size, int max_patches) {
    for (int i = 0; i < batch_size; i++) {
        for (int j = 0; j < max_patches; j++) {
            if (patch_attention_mask.at<uint8_t>({i, j}) == 0) {
                return false;
            }
        }
    }
    return true;
}

// In forward():
if (!all_patches_valid(patch_attention_mask, batch_size, max_patches)) {
    // Create attention mask...
}
mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp (1)

202-249: Optimize string concatenation in loops.

Multiple functions build strings through repeated concatenation in loops (get_slice_image_placeholder, get_grid_placeholder). For better performance, consider reserving capacity upfront.

Example for get_slice_image_placeholder:

std::string get_slice_image_placeholder(std::pair<int, int> image_size, 
                                        const std::vector<int> &grid, 
                                        int image_idx = 0, bool use_image_id = true) {
    std::string image_placeholder;
    image_placeholder.reserve(8 + image_feature_size * 5 + 9);  // Approximate size
    image_placeholder = "<image>";
    for(int i=0; i < image_feature_size; i++){
        image_placeholder += "<unk>";
    }
    image_placeholder += "</image>";
    // ... rest of function
}
examples/minicpm_o/mainllm.cpp (1)

78-85: Remove or make the debug token limit configurable.

The hardcoded 50-token limit is intended for debugging but restricts the model's ability to generate complete responses. Consider removing it or adding a command-line argument to control the maximum output length.

Apply this diff to remove the debug limit:

       int token_count = 0;
       for(auto& step : minicpmo.chat(inputs)){
         auto token_str = minicpmo_tokenizer.detokenize(step.cur_token_id);
         std::wcout<< token_str << std::flush;
 
         token_count++;
-        if(token_count >= 50) break; // Limit output for debugging
       }

Or add a configurable limit via command-line argument.

examples/minicpm_o/main.cpp (2)

65-149: Remove commented-out code.

This large block of commented-out code (85 lines) adds clutter and should be removed. If this interactive CLI functionality is planned for future implementation, track it in an issue instead of leaving dead code in the repository.

Consider removing this entire commented section or opening an issue to track the interactive CLI feature implementation.


5-5: Remove unused commented include.

If modeling_minicpmo.hpp is not needed for this example, remove the commented include line to keep the code clean.

mllm/models/minicpm_o2_6/modeling_resampler.hpp (1)

99-203: Consider using framework utilities for tensor operations.

The forward method contains many nested loops for manual tensor operations (weight slicing, bias addition, reshaping). If the framework provides built-in utilities for these operations (e.g., tensor.slice(), tensor.reshape(), tensor.view()), using them could improve readability and potentially performance.

mllm/models/minicpm_o2_6/modeling_minicpmo.hpp (2)

22-74: Consider consolidating duplicate projection layer implementations.

AudioProjectionLayer and TTSProjector have identical implementations (linear → ReLU → linear). Since these components are currently commented out in the main model, consider creating a single reusable ProjectionLayer class when these features are activated.


295-297: Remove debug print statements.

These debug prints add noise to production logs.

Apply this diff:

     Tensor merge_vision_text_embeddings(Tensor& text_embeddings, Tensor& vision_embeddings, Tensor& image_bounds) {
-
-        mllm::print(text_embeddings.shape());
-        mllm::print(vision_embeddings.shape());
-        mllm::print(image_bounds);
         auto batch_size = text_embeddings.shape()[0]; //text_embeddings: [1, seq_len, embed_dim]
         ...
             }
         }
-
-        mllm::print("finished merging!");
-
          // Debug: Load and replace with Python-saved embeddings

Also applies to: 320-320

mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp (2)

418-425: Improve variable naming consistency.

The variable HW_patch_size uses mixed case which is inconsistent with typical C++ naming conventions. Consider renaming to max_hw_patch_size for clarity.

             int channels = img_tensors[0].shape()[0];
             int patch_size = img_tensors[0].shape()[1];
-            int HW_patch_size = img_tensors[0].shape()[2];
-            for(int i=0;i<img_tensors.size();i++){
-              if(img_tensors[i].shape()[2] > HW_patch_size){
-                HW_patch_size = img_tensors[i].shape()[2];
+            int max_hw_patch_size = img_tensors[0].shape()[2];
+            for(int i = 0; i < img_tensors.size(); i++){
+              if(img_tensors[i].shape()[2] > max_hw_patch_size){
+                max_hw_patch_size = img_tensors[i].shape()[2];
               }
             }
-            Tensor pixel_values = Tensor::empty({(int)img_tensors.size(),channels, patch_size, HW_patch_size}, kFloat32, kCPU)
+            Tensor pixel_values = Tensor::empty({(int)img_tensors.size(), channels, patch_size, max_hw_patch_size}, kFloat32, kCPU)

197-199: Clarify unused patch_size parameter.

The constructor accepts a patch_size parameter (default 14), but the image preprocessor initialization on line 198 is commented out. Either remove the unused parameter or document why it's reserved for future use.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2e6b28 and 811eee2.

📒 Files selected for processing (13)
  • examples/minicpm_o/CMakeLists.txt (1 hunks)
  • examples/minicpm_o/config_minicpm_o.json (1 hunks)
  • examples/minicpm_o/main.cpp (1 hunks)
  • examples/minicpm_o/mainllm.cpp (1 hunks)
  • mllm/backends/cpu/kernels/Kernels.hpp (1 hunks)
  • mllm/models/minicpm_o2_6/configuration_minicpmo.hpp (1 hunks)
  • mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp (1 hunks)
  • mllm/models/minicpm_o2_6/modeling_minicpmo.hpp (1 hunks)
  • mllm/models/minicpm_o2_6/modeling_qwen2vl_for_minicpmo.hpp (1 hunks)
  • mllm/models/minicpm_o2_6/modeling_resampler.hpp (1 hunks)
  • mllm/models/minicpm_o2_6/modeling_siglip.hpp (1 hunks)
  • mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp (1 hunks)
  • mllm/nn/Nn.hpp (1 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
mllm/nn/Nn.hpp

[error] 28-28: duplicate include

(readability-duplicate-include,-warnings-as-errors)

examples/minicpm_o/mainllm.cpp

[error] 1-1: 'iostream' file not found

(clang-diagnostic-error)


[error] 16-16: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 16-16: invalid case style for variable 'MLLM_MAIN'

(readability-identifier-naming,-warnings-as-errors)

examples/minicpm_o/main.cpp

[error] 1-1: 'iostream' file not found

(clang-diagnostic-error)


[error] 15-15: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 15-15: invalid case style for variable 'MLLM_MAIN'

(readability-identifier-naming,-warnings-as-errors)

mllm/models/minicpm_o2_6/modeling_resampler.hpp

[error] 6-6: 'mllm/mllm.hpp' file not found

(clang-diagnostic-error)


[error] 23-23: 2 adjacent parameters of 'get2DSinCosPosEmbed' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 24-24: variable 'grid_h_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 25-25: variable 'grid_w_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 27-27: variable 'pos_embed' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 29-29: variable 'half_dim' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 55-55: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, in_proj_weight_, in_proj_bias_, out_proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 66-66: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, in_proj_weight_, in_proj_bias_, out_proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 66-66: 3 adjacent parameters of 'ResamplerAttention' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 78-78: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 116-116: variable name 'q' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 117-117: variable name 'k' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 118-118: variable name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 209-209: constructor does not initialize these fields: num_queries_, embed_dim_, num_heads_, kv_dim_, max_size_, query_, kv_proj_, ln_q_, ln_kv_, ln_post_, proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 227-227: constructor does not initialize these fields: num_queries_, embed_dim_, num_heads_, kv_dim_, max_size_, query_, kv_proj_, ln_q_, ln_kv_, ln_post_, proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 227-227: 6 adjacent parameters of 'Resampler' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 246-246: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 247-247: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 253-253: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)


[error] 261-261: multiple declarations in a single statement reduces readability

(readability-isolate-declaration,-warnings-as-errors)


[error] 292-292: variable 'tgt_h' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 293-293: variable 'tgt_w' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 294-294: variable 'patch_count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 296-296: variable 'pos_embed_i' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 312-312: variable 'pos_embed_padded' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 315-315: variable 'actual_len' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 334-334: variable name 'q' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp

[error] 5-5: 'cmath' file not found

(clang-diagnostic-error)


[error] 23-23: constructor does not initialize these fields: scale_resolution_, patch_size_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 27-27: 3 adjacent parameters of 'ImageSliceProcessor' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 27-27: 9 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 27-27: 448 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 27-27: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 190-190: constructor does not initialize these fields: mean_, std_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 190-190: 3 adjacent parameters of 'MiniCPMOImageProcessor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 190-190: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 190-190: 980 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 191-191: 0.5 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 191-191: 0.5 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 191-191: 2 adjacent parameters of 'MiniCPMOImageProcessor' of similar type ('float') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 191-191: 0.5 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 192-192: 0.5 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 192-192: 0.5 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 192-192: 0.5 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 197-197: 9 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 197-197: 448 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 201-201: 3 adjacent parameters of 'get_slice_image_placeholder' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 202-202: variable 'image_placeholder' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 208-208: variable 'final_placeholder' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 209-209: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)


[error] 223-223: variable 'slice_image_placeholder' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 229-229: variable 'cols' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 230-230: variable 'rows' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 234-234: variable 'line' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 241-241: variable 'result' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 376-376: 64 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/models/minicpm_o2_6/modeling_siglip.hpp

[error] 5-5: 'mllm/mllm.hpp' file not found

(clang-diagnostic-error)


[error] 16-16: constructor does not initialize these fields: embed_dim_, image_size_, patch_size_, num_patches_per_side_, num_patches_, patch_embedding_, position_embedding_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 29-29: constructor does not initialize these fields: embed_dim_, image_size_, patch_size_, num_patches_per_side_, num_patches_, patch_embedding_, position_embedding_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 29-29: 2 adjacent parameters of 'SiglipVisionEmbeddings' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 43-43: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 65-65: variable 'step' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 173-173: constructor does not initialize these fields: hidden_size_, intermediate_size_, fc1_, fc2_, activation_fn_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 184-184: constructor does not initialize these fields: hidden_size_, intermediate_size_, fc1_, fc2_, activation_fn_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 184-184: 2 adjacent parameters of 'SiglipMLP' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 193-193: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 196-196: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 205-205: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, scale_, k_proj_, v_proj_, q_proj_, out_proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 219-219: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, scale_, k_proj_, v_proj_, q_proj_, out_proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 219-219: 2 adjacent parameters of 'SiglipAttention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 231-231: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 273-273: constructor does not initialize these fields: embed_dim_, layer_norm1_, layer_norm2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 284-284: constructor does not initialize these fields: embed_dim_, layer_norm1_, layer_norm2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 284-284: 2 adjacent parameters of 'SiglipEncoderLayer' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 293-293: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 314-314: constructor does not initialize these fields: num_layers_, layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 321-321: constructor does not initialize these fields: num_layers_, layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 321-321: 2 adjacent parameters of 'SiglipVisionEncoder' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 329-329: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 343-343: constructor does not initialize these fields: embed_dim_, patch_size_, post_layernorm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 354-354: constructor does not initialize these fields: embed_dim_, patch_size_, post_layernorm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 354-354: 2 adjacent parameters of 'SiglipVisionModel' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 363-363: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 391-391: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)


[error] 401-401: variable 'attention_mask' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/models/minicpm_o2_6/configuration_minicpmo.hpp

[error] 5-5: 'mllm/core/aops/LinearOp.hpp' file not found

(clang-diagnostic-error)


[error] 30-30: 'rms_norm_eps' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 41-41: 'audio_chunk_length' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 50-50: 'rope_theta' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 51-51: 'tie_word_embeddings' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 72-72: 1e-06 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 93-93: 1000000.0 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/models/minicpm_o2_6/modeling_minicpmo.hpp

[error] 5-5: 'mllm/mllm.hpp' file not found

(clang-diagnostic-error)


[error] 23-23: constructor does not initialize these fields: linear1_, relu_, linear2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 28-28: constructor does not initialize these fields: linear1_, relu_, linear2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 28-28: 4 adjacent parameters of 'AudioProjectionLayer' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 35-35: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 36-36: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 50-50: constructor does not initialize these fields: linear1_, relu_, linear2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 55-55: constructor does not initialize these fields: linear1_, relu_, linear2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 55-55: 4 adjacent parameters of 'TTSProjector' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 62-62: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 63-63: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 79-79: constructor does not initialize these fields: config_, llm_, vpm_, resampler_, kv_cache_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 105-105: member variable 'config_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 106-106: member variable 'llm_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 107-107: member variable 'vpm_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 108-108: member variable 'resampler_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 113-113: member variable 'loadPythonEmbedding' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 118-118: redundant access specifier has the same accessibility as the previous access specifier

(readability-redundant-access-specifiers,-warnings-as-errors)


[error] 120-120: variable 'llm_config' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 137-137: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 140-140: variable 'input_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 141-141: repeated branch in conditional chain

(bugprone-branch-clone,-warnings-as-errors)


[error] 150-150: variable 'pixel_values' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 151-151: variable 'tgt_sizes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 152-152: variable 'image_bounds' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 153-153: variable 'audio_features' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 155-155: variable 'prev_position_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 156-156: variable 'is_decode_stage' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 180-180: 10 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 181-181: 64 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 182-182: 3584 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 223-223: variable 'position_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 237-237: loop variable name 'd' is too short, expected at least 2 characters

(readability-identifier-length,-warnings-as-errors)


[error] 260-260: variable 'result' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 293-293: 3 adjacent parameters of 'merge_vision_text_embeddings' of similar type ('int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 297-297: variable 'image_bounds' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 324-324: variable 'arr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 325-325: variable 'data_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 327-327: variable name 'tt' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 338-338: 3 adjacent parameters of 'merge_audio_text_embeddings' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/models/minicpm_o2_6/modeling_qwen2vl_for_minicpmo.hpp

[error] 4-4: 'array' file not found

(clang-diagnostic-error)


[error] 17-17: 2 adjacent parameters of 'makeVisualRoPEInvFreq' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 18-18: variable 'half_dim' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 19-19: variable 'inv_freq' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 20-20: variable 'inv_freq_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 21-21: variable 'dims_inv' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 23-23: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 23-23: narrowing conversion from 'int' to 'float'

(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)


[error] 43-43: variable 'out' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 44-44: variable 'out_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 97-97: variable 'dim' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 98-98: variable 'freqs' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 99-99: variable 'inv_freq_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 100-100: variable 'freqs_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 103-103: variable 'row_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 115-115: variable 'sin_pos_emb' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 116-116: variable 'cos_pos_emb' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 131-131: 3 adjacent parameters of 'makeVisualRotaryPosEmb' of similar type ('int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 132-132: variable 'grid_dims' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 133-133: variable name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 133-133: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)


[error] 134-134: variable name 'h' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 134-134: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)


[error] 135-135: variable name 'w' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 135-135: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)


[error] 137-137: variable 'num_positions' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 138-138: variable 'dim' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 139-139: variable 'batch_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 140-140: variable 'seq_len' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 143-143: variable 'out' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 174-174: 2 adjacent parameters of 'makeMultimodalRoPEInvFreq' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 181-181: 4 adjacent parameters of 'makeMultimodalPositionEmbedding' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 189-189: variable 'tmp_sin' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 190-190: variable 'tmp_cos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 192-192: loop variable name 'b' is too short, expected at least 2 characters

(readability-identifier-length,-warnings-as-errors)


[error] 204-204: variable 'sin' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 205-205: variable 'cos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 212-212: variable 'num_rows' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 213-213: variable 'num_cols' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 246-246: variable 'num_rows' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 247-247: variable 'num_cols' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 271-271: constructor does not initialize these fields: in_chans_, embed_dim_, patch_size_, temporal_patch_size_, proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 282-282: constructor does not initialize these fields: in_chans_, embed_dim_, patch_size_, temporal_patch_size_, proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 282-282: 2 adjacent parameters of 'PatchEmbed' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 294-294: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 305-305: constructor does not initialize these fields: hidden_size_, spatial_merge_size_, context_dim_, ln_q_, mlp_0_, mlp_2_, mlp_gelu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 318-318: constructor does not initialize these fields: hidden_size_, spatial_merge_size_, context_dim_, ln_q_, mlp_0_, mlp_2_, mlp_gelu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 318-318: 2 adjacent parameters of 'PatchMerger' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 329-329: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 330-330: variable name 'o' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 338-338: constructor does not initialize these fields: dim_, hidden_dim_, act_, fc_1_, fc_2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 349-349: constructor does not initialize these fields: dim_, hidden_dim_, act_, fc_1_, fc_2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 349-349: 2 adjacent parameters of 'VisionMlp' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 358-358: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 363-363: constructor does not initialize these fields: dim_, num_heads_, head_dim_, qkv_, proj_, softmax_, vision_rope_q_, vision_rope_k_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 368-368: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 379-379: constructor does not initialize these fields: dim_, num_heads_, head_dim_, qkv_, proj_, softmax_, vision_rope_q_, vision_rope_k_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 379-379: 2 adjacent parameters of 'VisionAttention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 403-403: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 442-442: constructor does not initialize these fields: mlp_hidden_dim_, norm1_, norm2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 454-454: constructor does not initialize these fields: mlp_hidden_dim_, norm1_, norm2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 454-454: 2 adjacent parameters of 'Qwen2VLVisionBlock' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 462-462: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 473-473: constructor does not initialize these fields: blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 481-481: constructor does not initialize these fields: blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 481-481: 2 adjacent parameters of 'Qwen2VisionTransformerPretrainedModel' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 487-487: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 502-502: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 510-510: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 510-510: 2 adjacent parameters of 'Qwen2VLMLP' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 517-517: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 518-518: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 520-520: variable name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 527-527: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, head_dim_, num_attention_heads_, num_key_value_heads_, num_key_value_groups_, layer_idx_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 546-546: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, num_attention_heads_, num_key_value_heads_, layer_idx_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 546-546: 2 adjacent parameters of 'Qwen2VLAttention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 551-551: 'num_key_value_groups_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 571-571: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 572-572: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 582-582: variable 'B' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 582-582: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 582-582: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)


[error] 583-583: variable 'S' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 583-583: variable name 'S' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 583-583: invalid case style for variable 'S'

(readability-identifier-naming,-warnings-as-errors)


[error] 604-604: variable 'attn' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 605-605: repeated branch in conditional chain

(bugprone-branch-clone,-warnings-as-errors)


[error] 627-627: member variable 'layer_idx_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 630-630: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 639-639: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 639-639: 2 adjacent parameters of 'Qwen2VLDecoder' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 646-646: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 651-651: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 661-661: constructor does not initialize these fields: decode_blocks_, norm_, lm_head_, tie_word_embeddings_, embedding_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 670-670: constructor does not initialize these fields: decode_blocks_, norm_, lm_head_, tie_word_embeddings_, embedding_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 670-670: 2 adjacent parameters of 'Qwen2VLText' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 689-689: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 693-693: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 703-703: variable name 'S' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 703-703: invalid case style for variable 'S'

(readability-identifier-naming,-warnings-as-errors)


[error] 712-712: member variable 'embedding_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 717-717: constructor does not initialize these fields: kv_cache_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 731-731: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 759-759: variable 'vision_pad_token_start' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 762-762: variable name 'S' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 762-762: invalid case style for variable 'S'

(readability-identifier-naming,-warnings-as-errors)


[error] 773-773: variable name 'D' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 773-773: invalid case style for variable 'D'

(readability-identifier-naming,-warnings-as-errors)


[error] 804-804: 4 adjacent parameters of 'getPositionIds' of similar type ('int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 819-819: 2 adjacent parameters of 'getPositionIdsPrefill' of similar type ('int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 825-825: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 825-825: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)


[error] 827-827: variable name 'S' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 827-827: invalid case style for variable 'S'

(readability-identifier-naming,-warnings-as-errors)


[error] 829-829: variable 'position_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 834-834: variable 'vision_pad_token_start' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 847-847: multiple declarations in a single statement reduces readability

(readability-isolate-declaration,-warnings-as-errors)


[error] 847-847: variable 'img_t' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 847-847: variable 'img_h' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 847-847: variable 'img_w' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 848-848: multiple declarations in a single statement reduces readability

(readability-isolate-declaration,-warnings-as-errors)


[error] 848-848: variable 'inputs_t' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 848-848: variable 'inputs_h' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 848-848: variable 'inputs_w' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 861-861: variable 'current_max_position_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 864-864: variable 'start_idx' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 865-865: loop variable name 'd' is too short, expected at least 2 characters

(readability-identifier-length,-warnings-as-errors)


[error] 873-873: invalid case style for variable '_cnt'

(readability-identifier-naming,-warnings-as-errors)


[error] 874-874: variable 'vision_start_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 898-898: variable 'vision_token_count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 899-899: variable 'trailing_text_start_seq' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 900-900: variable 'trailing_text_count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 903-903: variable 'start_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 904-904: loop variable name 'd' is too short, expected at least 2 characters

(readability-identifier-length,-warnings-as-errors)


[error] 917-917: member variable 'cfg' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 918-918: member variable 'llm' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 919-919: member variable 'visual' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp

[error] 5-5: 'vector' file not found

(clang-diagnostic-error)


[error] 21-21: 2 adjacent parameters of 'miniCPMOTokenizerMatchPattern' of similar type ('int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 36-36: variable 'original_pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 54-54: do not use 'else' after 'return'

(readability-else-after-return,-warnings-as-errors)


[error] 72-72: variable 'original_pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 74-74: variable 'start' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 95-95: do not use 'else' after 'return'

(readability-else-after-return,-warnings-as-errors)


[error] 103-103: variable 'start' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 109-109: do not use 'else' after 'return'

(readability-else-after-return,-warnings-as-errors)


[error] 116-116: variable 'start' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 122-122: do not use 'else' after 'return'

(readability-else-after-return,-warnings-as-errors)


[error] 129-129: variable 'start' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 140-140: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 142-142: variable 'matched' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 143-143: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)


[error] 152-152: constructor does not initialize these fields: prompt, img_file_path, audio_file_path

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 160-160: variable 'result' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 187-187: constructor does not initialize these fields: prompt

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 197-197: constructor does not initialize these fields: image_preprocessor_, bpe_, bytes_2_unicode_dict_, bytes_2_unicode_dict_inverse_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 197-197: 2 adjacent parameters of 'MiniCPMOTokenizer' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 197-197: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 201-201: variable 'bytes_2_unicode_dict_' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 201-201: invalid case style for variable 'bytes_2_unicode_dict_'

(readability-identifier-naming,-warnings-as-errors)


[error] 282-282: variable 'utf_8_str' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 291-291: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 313-313: variable 'end' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 320-320: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 329-329: variable 'final_text' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 368-368: variable 'sequence' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 376-376: variable 'result' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 392-392: 2 adjacent parameters of 'Convert2Tensors' of similar type ('int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 394-394: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 397-397: variable 'result' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 401-401: variable 'input_ids_tensor' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 418-418: variable 'channels' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 419-419: variable 'patch_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 420-420: variable 'HW_patch_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 420-420: invalid case style for variable 'HW_patch_size'

(readability-identifier-naming,-warnings-as-errors)


[error] 426-426: variable 'pixel_values' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 456-456: variable 'tgt_sizes_tensor' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 470-470: variable 'image_bounds_tensor' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 485-485: redundant access specifier has the same accessibility as the previous access specifier

(readability-redundant-access-specifiers,-warnings-as-errors)

🔇 Additional comments (7)
examples/minicpm_o/config_minicpm_o.json (1)

1-37: LGTM!

The configuration file is comprehensive and well-structured, covering all necessary hyperparameters for the MiniCPMO multimodal model across vision, LLM, resampler, audio, TTS, and common runtime settings.

mllm/models/minicpm_o2_6/modeling_resampler.hpp (2)

23-53: LGTM!

The 2D sinusoidal positional embedding implementation correctly handles both square (single dimension) and rectangular (two dimensions) grid sizes, with proper sin/cos encoding for width and height dimensions.


278-286: LGTM!

The key padding mask initialization is correct: it initializes all positions to 1 (masked/padding), then sets valid positions to 0 (not masked). This aligns with the usage in ResamplerAttention where mask value of 1 results in -infinity attention weights.

mllm/models/minicpm_o2_6/modeling_qwen2vl_for_minicpmo.hpp (4)

149-159: Good defensive programming with bounds checking.

The explicit validation of tensor dimensions and position indices helps prevent out-of-bounds access. This is especially important for vision processing where dimension mismatches could occur.


243-265: Verify simplified position embedding fallback.

When mrope_section is empty, the code uses only the first dimension (dimension 0) of the 3D tmp_sin/tmp_cos tensors. The comment acknowledges this is "simplified." Ensure this fallback behavior is correct for all expected use cases where mrope_section is empty.


759-776: LGTM! Vision embedding insertion is well-implemented.

The code properly:

  1. Searches for the vision pad token position
  2. Asserts that the token was found (preventing silent failures)
  3. Safely copies visual embeddings into the input embeddings at the correct position

831-844: Single-image limitation is documented.

The comment on line 833 clearly states "Only one image is supported." This is an acceptable limitation for an initial implementation, and the assertion on line 843 will catch violations.

link_directories($ENV{HOME}/local/lib)

add_executable(main_minicpm_o main.cpp)
target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend cnpy z)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify necessity of cnpy and z library dependencies.

The past review noted that cnpy and z are not used in the code. While cnpy.h is included in mainllm.cpp (line 12), it's unclear if these libraries are actually required. If they're not used, they should be removed to avoid unnecessary dependencies.

Run the following script to check for actual usage of cnpy library functions:

If no results are found, remove the cnpy and z dependencies:

-target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend)

-target_link_libraries(main_minicpm_o2 PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_o2 PRIVATE MllmRT MllmCPUBackend)

-target_link_libraries(main_minicpm_dbg PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_dbg PRIVATE MllmRT MllmCPUBackend)

And remove the unused include from mainllm.cpp.

Also applies to: 10-10, 14-14


🏁 Script executed:

#!/bin/bash
# Search for cnpy namespace or function calls in the example files
rg -n --type=cpp 'cnpy::|npy_save|npy_load|npz_save|npz_load' examples/minicpm_o/

Length of output: 0


I need to verify this more thoroughly. Let me check for the cnpy.h include and examine the source files more carefully.


---

🏁 Script executed:

```shell
#!/bin/bash
# Check for cnpy.h include statements
echo "=== Searching for cnpy.h includes ==="
rg -n 'cnpy\.h|#include.*cnpy' examples/minicpm_o/ --type=cpp --type=h

echo ""
echo "=== Searching for 'cnpy' anywhere in source files ==="
rg -n 'cnpy' examples/minicpm_o/ --type=cpp --type=h

echo ""
echo "=== Searching for 'npy' usage ==="
rg -n '\bnpy[A-Za-z_]*' examples/minicpm_o/ --type=cpp --type=h

echo ""
echo "=== Listing all .cpp and .h files in examples/minicpm_o ==="
fd -e cpp -e h examples/minicpm_o/

Length of output: 674


Good, cnpy is only included but never used. Let me verify the context and check for z library usage:


---

🏁 Script executed:

```shell
#!/bin/bash
# Show context around cnpy.h include in mainllm.cpp
echo "=== Context around cnpy.h include ==="
rg -n -B 5 -A 10 'cnpy\.h' examples/minicpm_o/mainllm.cpp

echo ""
echo "=== Searching for z library usage (zlib functions) ==="
rg -n '\b(compress|uncompress|deflate|inflate|gzip|ungzip|gz_)' examples/minicpm_o/ --type=cpp --type=h

echo ""
echo "=== List all files in examples/minicpm_o ==="
fd . examples/minicpm_o/

Length of output: 962


Let me do one final check on main.cpp to be absolutely certain:


---

🏁 Script executed:

```shell
#!/bin/bash
# Check main.cpp for any cnpy or z usage
echo "=== Checking main.cpp for cnpy or z usage ==="
rg -n 'cnpy|compress|uncompress|deflate|inflate|gzip' examples/minicpm_o/main.cpp
echo "Status: $?"

Length of output: 55


Remove unused cnpy and z library dependencies.

Verification confirms that cnpy and z are completely unused in the codebase. While cnpy.h is included in mainllm.cpp (line 12), no cnpy functions are called anywhere, and no zlib compression functions are used. These should be removed:

-target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_o PRIVATE MllmRT MllmCPUBackend)

-target_link_libraries(main_minicpm_o2 PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_o2 PRIVATE MllmRT MllmCPUBackend)

-target_link_libraries(main_minicpm_dbg PRIVATE MllmRT MllmCPUBackend cnpy z)
+target_link_libraries(main_minicpm_dbg PRIVATE MllmRT MllmCPUBackend)

Also remove the unused include from mainllm.cpp line 12.

Also applies to: 10-10, 14-14

🤖 Prompt for AI Agents
In examples/minicpm_o/CMakeLists.txt around line 6, remove the unused cnpy and z
libraries from the target_link_libraries call so only required libraries (MllmRT
MllmCPUBackend) remain; also edit mainllm.cpp (remove the unused #include
"cnpy.h" at line 12) and apply the same removals for the other affected CMake
lines referenced (lines 10 and 14) to eliminate the unused cnpy and z
dependencies and their unused include.

Comment on lines 47 to 50
mllm::models::minicpmo::MiniCPMOMessage message{
.prompt = "Inctroduce your self"
};
message.img_file_path = "/Users/kkkai/Desktop/pics.jpg";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix typo and remove hardcoded test values.

Two issues here:

  1. Typo: "Inctroduce" should be "Introduce"
  2. Hardcoded path: The image path /Users/kkkai/Desktop/pics.jpg is a developer-specific test value that will cause runtime failures for other users

Apply this diff:

        mllm::models::minicpmo::MiniCPMOMessage message{
-         .prompt = "Inctroduce your self"
+         .prompt = "Introduce yourself"
        };
-       message.img_file_path = "/Users/kkkai/Desktop/pics.jpg";
+       // TODO: Add command-line argument for image path or remove this test code
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
mllm::models::minicpmo::MiniCPMOMessage message{
.prompt = "Inctroduce your self"
};
message.img_file_path = "/Users/kkkai/Desktop/pics.jpg";
mllm::models::minicpmo::MiniCPMOMessage message{
.prompt = "Introduce yourself"
};
// TODO: Add command-line argument for image path or remove this test code
🤖 Prompt for AI Agents
In examples/minicpm_o/main.cpp around lines 47 to 50, fix the misspelled prompt
and remove the developer-specific hardcoded image path: change "Inctroduce your
self" to "Introduce yourself" and replace message.img_file_path =
"/Users/kkkai/Desktop/pics.jpg" with a configurable value (e.g., read from a
command-line argument, environment variable, or make it optional/empty) so the
example does not rely on a local developer filesystem path.

Comment on lines 61 to 66
std::string image_path = "/Users/kkkai/Desktop/pics.jpg";
std::string prompt_text = "描述图片中物体";
mllm::models::minicpmo::MiniCPMOMessage message;
message.prompt = prompt_text;
message.img_file_path = image_path;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Remove hardcoded test values before merge.

The hardcoded image path (/Users/kkkai/Desktop/pics.jpg) and prompt text are developer-specific test values that should not be committed to the repository. This will cause runtime failures for other users.

Replace with command-line arguments or restore the commented interactive input code below:

-      std::string image_path = "/Users/kkkai/Desktop/pics.jpg";
-      std::string prompt_text = "描述图片中物体";
-      mllm::models::minicpmo::MiniCPMOMessage message;
-      message.prompt = prompt_text;
-      message.img_file_path = image_path;
-
-      // fmt::print("📷 Image path (or 'exit/quit'): ");
-      // std::getline(std::cin, image_path);
-      // if (image_path == "exit" || image_path == "quit") { return 0; }
-      // fmt::print("💬 Prompt text: ");
-      // std::getline(std::cin, prompt_text);
+      std::string image_path, prompt_text;
+      fmt::print("📷 Image path (or 'exit/quit'): ");
+      std::getline(std::cin, image_path);
+      if (image_path == "exit" || image_path == "quit") { return 0; }
+      fmt::print("💬 Prompt text: ");
+      std::getline(std::cin, prompt_text);
+      
+      mllm::models::minicpmo::MiniCPMOMessage message;
+      message.prompt = prompt_text;
+      message.img_file_path = image_path;
🤖 Prompt for AI Agents
In examples/minicpm_o/mainllm.cpp around lines 61 to 66, remove the hardcoded
image path and prompt text ("/Users/kkkai/Desktop/pics.jpg" and "描述图片中物体") and
replace them with runtime inputs: read values from command-line arguments (e.g.,
argc/argv) or re-enable the previously commented interactive input code so the
program fails gracefully for other users; ensure you validate the inputs
(existence/readable file) and provide a clear usage message when arguments are
missing.

Comment on lines +146 to +180
Image crop_image(Image& image, int x, int y, int crop_width, int crop_height) {
// Get source image properties
int src_width = image.w();
int src_height = image.h();
int src_channels = image.c();

// Ensure crop bounds are valid
x = std::max(0, std::min(x, src_width - 1));
y = std::max(0, std::min(y, src_height - 1));
crop_width = std::min(crop_width, src_width - x);
crop_height = std::min(crop_height, src_height - y);

unsigned char* src_data = image.ptr();
unsigned char* crop_data = new unsigned char[crop_width * crop_height * src_channels];

// Copy pixel data row by row
for (int row = 0; row < crop_height; ++row) {
int src_row_offset = ((y + row) * src_width + x) * src_channels;
int crop_row_offset = row * crop_width * src_channels;
std::memcpy(crop_data + crop_row_offset,
src_data + src_row_offset,
crop_width * src_channels);
}

// Create a temporary file to save cropped data and load as Image
std::string temp_path = "/tmp/crop_" + std::to_string(rand()) + ".png";
stbi_write_png(temp_path.c_str(), crop_width, crop_height, src_channels, crop_data, crop_width * src_channels);

Image cropped_image = Image::open(temp_path);

delete[] crop_data;
std::remove(temp_path.c_str());

return cropped_image;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Insecure and non-portable temporary file handling.

The crop_image method has several serious issues:

  1. Non-portable path: /tmp/ is Unix-specific and doesn't work on Windows.
  2. Predictable filenames: Using rand() without seeding creates predictable, collision-prone filenames.
  3. Resource leak risk: If Image::open() throws, the temporary file is never deleted.
  4. Security risk: Predictable temp filenames in shared directories can be exploited.

Consider using a proper temporary file mechanism:

Image crop_image(Image& image, int x, int y, int crop_width, int crop_height) {
    // ... (existing bounds checking and crop logic) ...
    
    // Use std::filesystem for portable temp directory
    auto temp_dir = std::filesystem::temp_directory_path();
    auto temp_file = temp_dir / ("crop_" + std::to_string(std::random_device{}()) + ".png");
    
    stbi_write_png(temp_file.string().c_str(), crop_width, crop_height, 
                   src_channels, crop_data, crop_width * src_channels);
    
    // RAII for cleanup
    struct TempFileGuard {
        std::filesystem::path path;
        ~TempFileGuard() { std::filesystem::remove(path); }
    } guard{temp_file};
    
    Image cropped_image = Image::open(temp_file.string());
    delete[] crop_data;
    
    return cropped_image;
}

Additionally, consider whether writing to disk is necessary at all—could you construct the Image directly from the cropped buffer?

🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp around lines
146-180, the function writes a predictable /tmp/ file using rand() and can leak
the temp file on exceptions; replace the hardcoded path and rand() with
std::filesystem::temp_directory_path() and a securely generated unique name
(e.g. std::random_device + std::chrono or std::filesystem::unique_path), write
the PNG to that path, and ensure deletion even if Image::open throws by using an
RAII guard (a small struct with path and destructor that removes the file) or
std::filesystem::remove in a try/finally equivalent; additionally, if Image can
be constructed directly from the in-memory buffer, avoid writing to disk
entirely and construct the Image from crop_data to remove temp file handling.

Comment on lines +283 to +315
std::tuple<std::vector<Tensor>, std::vector<std::pair<int, int>>, std::vector<std::pair<int, int>>, std::vector<int>> process(const std::string& image_path, int max_slice_nums = 9) {
auto img = Image::open(image_path);
std::pair<int, int> original_size = {img.w(), img.h()};
auto [source_image, patches, grid] = image_slice_processor_.slice_image(img);
std::vector<Image> slice_images;
slice_images.push_back(source_image);

if (!patches.empty()) {
for (const auto& patch_row : patches) {
for (const auto& patch : patch_row) {
slice_images.push_back(patch);
}
}
}
std::vector<Tensor> processed_tensors;
std::vector<std::pair<int, int>> tgt_sizes;

for (auto& slice_img : slice_images) {

auto tensor = slice_img.tensor(); // [H, W, C]
tensor = tensor.permute({2, 0, 1}); // [C, H, W]
tensor = tensor * (1.f / 255.f);
normalize_tensor(tensor);
auto reshaped_tensor = reshape_by_patch(tensor);
processed_tensors.push_back(reshaped_tensor);

// Calculate target size (patch dimensions)
int patch_h = tensor.shape()[1] / patch_size_;
int patch_w = tensor.shape()[2] / patch_size_;
tgt_sizes.push_back({patch_h, patch_w});
}
return std::make_tuple(processed_tensors, std::vector<std::pair<int, int>>{original_size}, tgt_sizes, grid);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add error handling for image loading.

The process method calls Image::open(image_path) without error handling. If the image file doesn't exist or is corrupted, this could cause a crash or undefined behavior.

Add validation:

 std::tuple<std::vector<Tensor>, std::vector<std::pair<int, int>>, std::vector<std::pair<int, int>>, std::vector<int>> process(const std::string& image_path, int max_slice_nums = 9) {
     auto img = Image::open(image_path);
+    if (img.empty() || img.w() == 0 || img.h() == 0) {
+        throw std::runtime_error("Failed to load image from: " + image_path);
+    }
     std::pair<int, int> original_size = {img.w(), img.h()};

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp around lines
283-315, Image::open(image_path) is used without validation; wrap the call in a
try/catch (or check for a null/invalid Image if the API provides such a check)
and handle failures by logging a clear error with the path and either throwing a
std::runtime_error or returning an empty tuple consistent with callers; ensure
you do not dereference img when invalid and that the function returns the
correct types (empty vectors) or propagates the exception so callers can handle
the missing/corrupt image.

Comment on lines +340 to +357
Tensor x_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
x_b.at<float>({i, j}) = x.at<float>({b, i, j});
}
}

// pos_embed for this batch
Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
if(i < max_patch_len){
pos_embed_b.at<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
} else {
pos_embed_b.at<float>({i, j}) = 0.0f;
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify tensor write operations use correct API.

The code uses x_b.at<float>({i, j}) = ... and pos_embed_b.at<float>({i, j}) = ... for writing tensor values (lines 343, 352, 354), while most other write operations throughout the file use *tensor.offsettedPtr<float>({...}) = .... Ensure that at<float>() returns a mutable reference suitable for writes, or use the consistent offsettedPtr pattern.

Verify the Tensor API usage:

If at<> is read-only, apply this diff:

             for(int i = 0; i < seq_len; i++){
                 for(int j = 0; j < embed_dim_; j++){
-                    x_b.at<float>({i, j}) = x.at<float>({b, i, j});
+                    *x_b.offsettedPtr<float>({i, j}) = x.at<float>({b, i, j});
                 }
             }

             // pos_embed for this batch
             Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
             for(int i = 0; i < seq_len; i++){
                 for(int j = 0; j < embed_dim_; j++){
                     if(i < max_patch_len){
-                        pos_embed_b.at<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
+                        *pos_embed_b.offsettedPtr<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
                     } else {
-                        pos_embed_b.at<float>({i, j}) = 0.0f;
+                        *pos_embed_b.offsettedPtr<float>({i, j}) = 0.0f;
                     }
                 }
             }

🏁 Script executed:

#!/bin/bash
# Check the Tensor class definition for at<> method signature
rg -n "at<.*>\(" --type=hpp -A 2 | head -20

Length of output: 27


🏁 Script executed:

#!/bin/bash
# Search for Tensor class definition with at<> method
rg -n "class Tensor" -A 30 | head -100

Length of output: 5499


🏁 Script executed:

#!/bin/bash
# Search for at<> method definition
rg -n "\.at<" --type=cpp -B 2 -A 2

Length of output: 32437


🏁 Script executed:

#!/bin/bash
# Search for offsettedPtr method definition
rg -n "offsettedPtr" --type=cpp -B 2 -A 2 | head -50

Length of output: 4579


🏁 Script executed:

#!/bin/bash
# Look at the specific file to understand tensor API usage
rg -n "offsettedPtr.*float" mllm/models/minicpm_o2_6/modeling_resampler.hpp -B 1 -A 1

Length of output: 3442


Replace tensor write operations with offsettedPtr pattern for consistency.

The code at lines 343, 352, and 354 uses at<float>() for tensor writes, which is inconsistent with the established pattern throughout the file and codebase. All other write operations in this file use *tensor.offsettedPtr<float>({...}) = value, while at<>() is reserved for reads. Apply the suggested diff to align with the correct API usage:

             for(int i = 0; i < seq_len; i++){
                 for(int j = 0; j < embed_dim_; j++){
-                    x_b.at<float>({i, j}) = x.at<float>({b, i, j});
+                    *x_b.offsettedPtr<float>({i, j}) = x.at<float>({b, i, j});
                 }
             }

             // pos_embed for this batch
             Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
             for(int i = 0; i < seq_len; i++){
                 for(int j = 0; j < embed_dim_; j++){
                     if(i < max_patch_len){
-                        pos_embed_b.at<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
+                        *pos_embed_b.offsettedPtr<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
                     } else {
-                        pos_embed_b.at<float>({i, j}) = 0.0f;
+                        *pos_embed_b.offsettedPtr<float>({i, j}) = 0.0f;
                     }
                 }
             }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Tensor x_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
x_b.at<float>({i, j}) = x.at<float>({b, i, j});
}
}
// pos_embed for this batch
Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
if(i < max_patch_len){
pos_embed_b.at<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
} else {
pos_embed_b.at<float>({i, j}) = 0.0f;
}
}
}
Tensor x_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
*x_b.offsettedPtr<float>({i, j}) = x.at<float>({b, i, j});
}
}
// pos_embed for this batch
Tensor pos_embed_b = Tensor::empty({seq_len, embed_dim_}, kFloat32).alloc();
for(int i = 0; i < seq_len; i++){
for(int j = 0; j < embed_dim_; j++){
if(i < max_patch_len){
*pos_embed_b.offsettedPtr<float>({i, j}) = pos_embed_padded.at<float>({b, i, j});
} else {
*pos_embed_b.offsettedPtr<float>({i, j}) = 0.0f;
}
}
}

auto hidden_states = inputs_embeds;
for (auto& layer : layers_) {
hidden_states = layer(hidden_states, attention_mask)[0];
//break; // For testing, run only one layer
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Remove commented-out debug code.

The commented //break; statement on line 336 appears to be leftover debug code and should be removed.

         for (auto& layer : layers_) {
             hidden_states = layer(hidden_states, attention_mask)[0];   
-            //break; // For testing, run only one layer
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
//break; // For testing, run only one layer
for (auto& layer : layers_) {
hidden_states = layer(hidden_states, attention_mask)[0];
}
🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/modeling_siglip.hpp around line 336, remove the
leftover commented-out debug code "//break;" so the file contains no stray
commented debugging statements; simply delete that comment line and run a quick
build/format to ensure no trailing whitespace or style issues remain.

Comment on lines +370 to +375
for(int i=0;i<tgt_sizes.shape()[0];i++){
if(tgt_sizes.at<int32_t>({i,0})>0 && tgt_sizes.at<int32_t>({i,1})>0){
int patches = (tgt_sizes.at<int32_t>({i,0}) ) * (tgt_sizes.at<int32_t>({i,1}) );
if (patches > max_patches) max_patches = patches;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add bounds checking for tgt_sizes access.

The loop accesses tgt_sizes.at<int32_t>({i, 0}) and tgt_sizes.at<int32_t>({i, 1}) without verifying that tgt_sizes has the expected shape. If tgt_sizes.shape()[1] is less than 2, this will cause out-of-bounds access.

Add shape validation:

         int max_patches = 0;
         // Calculate max_patches based on tgt_sizes
+        if (!tgt_sizes.isNil() && tgt_sizes.shape().size() >= 2 && tgt_sizes.shape()[1] >= 2) {
         for(int i=0;i<tgt_sizes.shape()[0];i++){
           if(tgt_sizes.at<int32_t>({i,0})>0 && tgt_sizes.at<int32_t>({i,1})>0){
             int patches = (tgt_sizes.at<int32_t>({i,0}) ) * (tgt_sizes.at<int32_t>({i,1}) );
             if (patches > max_patches) max_patches = patches;
           }
         }
+        }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/modeling_siglip.hpp around lines 370-375, the loop
accesses tgt_sizes.at<int32_t>({i,0}) and {i,1} without checking that tgt_sizes
has at least two columns; add a guard to validate tgt_sizes.shape().size() > 1
and tgt_sizes.shape()[1] >= 2 before the loop (or check per-iteration) and skip
or handle rows that do not have two elements (e.g., continue, set patches=0, or
emit a clear error), ensuring you still bound-check i against
tgt_sizes.shape()[0] to avoid out-of-bounds access.

std::string prompt;
std::string img_file_path;
std::string audio_file_path;
std::string system_prompt = "You are Qwen, created by Alibaba Clound. You are a helpful assistant.";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix system prompt branding and typo.

The default system prompt incorrectly identifies the model as "Qwen, created by Alibaba Clound" (note the typo "Clound"). This should reflect the MiniCPMO model identity instead.

Consider updating to:

-    std::string system_prompt = "You are Qwen, created by Alibaba Clound. You are a helpful assistant.";
+    std::string system_prompt = "You are MiniCPM-o, a helpful multimodal assistant.";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::string system_prompt = "You are Qwen, created by Alibaba Clound. You are a helpful assistant.";
std::string system_prompt = "You are MiniCPM-o, a helpful multimodal assistant.";
🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp around line 156, the
system_prompt incorrectly brands the model as "Qwen, created by Alibaba Clound"
and contains a typo; update the prompt to identify the model as MiniCPMO and
remove the incorrect Alibaba/Qwen branding and typo — for example, set the
prompt text to a clear MiniCPMO identity like: You are MiniCPMO, a helpful
assistant. Ensure spelling ("Cloud") is not present and the prompt accurately
reflects the MiniCPMO model.

Comment on lines +439 to +444
if(hw > img_tensors[b].shape()[2]){
pixel_values_ptr[dst_idx]=0;
}
else{
pixel_values_ptr[dst_idx]=img_tensors[b].at<float>({c, p, hw});
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix off-by-one error in boundary check.

Line 439 uses > but should use >=. When hw equals img_tensors[b].shape()[2], the current condition is false, causing line 443 to access an out-of-bounds index.

Apply this diff:

-                    if(hw > img_tensors[b].shape()[2]){
+                    if(hw >= img_tensors[b].shape()[2]){
                       pixel_values_ptr[dst_idx]=0;
                     }
🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp around lines 439 to 444,
the boundary check for hw is off-by-one: change the condition from `>` to `>=`
so that when hw equals img_tensors[b].shape()[2] it treats it as out-of-bounds
and writes 0 instead of accessing img_tensors[b].at<float> out of range; update
the if condition accordingly and keep the else branch unchanged.

@KKkai0315 KKkai0315 requested a review from oreomaker October 30, 2025 09:35
Removed multiple executable definitions and their associated libraries and include directories from CMakeLists.txt.
@oreomaker oreomaker merged commit b17b316 into UbiquitousLearning:v2 Oct 31, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants