[None][fix] InputProcessor config naming convention fix #8705
base: main
Conversation
📝 Walkthrough

Systematic refactoring across nine model input processor files and the input registry to rename the input processors' `model_config` attribute to `config`.

Changes
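For orientation, a schematic sketch of the rename pattern described above; the classes and attribute usages below are illustrative only and are not taken from the diff:

```python
# Hypothetical "before" shape of a multimodal input processor.
class ExampleInputProcessorBefore:
    def __init__(self, model_path, config, tokenizer, trust_remote_code=True):
        self.model_config = config                 # old attribute name
        self.dtype = self.model_config.torch_dtype

# "After": the same HF config object is stored as `config`, matching the
# naming convention used on the LLM side.
class ExampleInputProcessorAfter:
    def __init__(self, model_path, config, tokenizer, trust_remote_code=True):
        self.config = config                       # new attribute name
        self.dtype = self.config.torch_dtype
```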
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring extra attention:
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
 ✅ Passed checks (1 passed)
 ✨ Finishing touches
 🧪 Generate unit tests (beta)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (10)
tensorrt_llm/_torch/models/modeling_hyperclovax.py (4)
679-706: Fix tensor index usage from nonzero; Python slicing requires ints

mask.nonzero() yields a 2D tensor; token_len becomes a Tensor and breaks slices. Use as_tuple=True and ints.
Apply:
```diff
-        mask = (sample == self.config.img_start_id)
-        img_start_ids = mask.nonzero()
+        mask = (sample == self.config.img_start_id)
+        # 1D positions as Python ints
+        img_start_ids = torch.nonzero(mask, as_tuple=True)[0].tolist()
@@
-        for multi_img_idx, img_start_idx in enumerate(img_start_ids):
-            token_len = img_start_idx - temp_start
+        for multi_img_idx, img_start_idx in enumerate(img_start_ids):
+            token_len = int(img_start_idx) - temp_start
```
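For reference, a standalone snippet (not part of the PR) illustrating the behavior difference this comment relies on:

```python
import torch

sample = torch.tensor([5, 7, 9, 7, 11])
mask = sample == 7

# Default nonzero() returns a 2-D tensor of shape (num_matches, 1); iterating
# over it yields Tensors, so values derived from them (e.g. token_len) stay
# Tensors instead of plain Python ints.
print(mask.nonzero())          # tensor([[1], [3]])

# as_tuple=True returns a tuple of 1-D index tensors; .tolist() converts the
# positions to plain ints that are safe to use in slicing arithmetic.
positions = torch.nonzero(mask, as_tuple=True)[0].tolist()
print(positions)               # [1, 3]
print(sample[:positions[0]])   # tensor([5])
```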
726-729: Use tokenizer(...) instead of encode(...) with return_tensors

encode() doesn’t accept return_tensors; this will error at runtime.
```diff
-        input_ids = self.tokenizer.encode(text_prompt,
-                                          add_special_tokens=False,
-                                          return_tensors="pt")
+        encoded = self.tokenizer(
+            text_prompt, add_special_tokens=False, return_tensors="pt"
+        )
+        input_ids = encoded["input_ids"]
```
650-655: Correct type hints for Python 3.8 and actual types
- Use Dict/Any for 3.8.
- text_prompt is a str, not dict.

```diff
-    def _post_process(self,
-                      input_ids: torch.Tensor,
-                      preprocessed_image: dict[str, any] = None):
+    def _post_process(self,
+                      input_ids: torch.Tensor,
+                      preprocessed_image: Optional[Dict[str, Any]] = None):
@@
-    def _preprocess(self, text_prompt: dict[str, any], images: List[Any],
-                    mm_processor_kwargs: Dict[str, Any]):
+    def _preprocess(self, text_prompt: str, images: Optional[List[Any]],
+                    mm_processor_kwargs: Dict[str, Any]):
```
As per coding guidelines.
Also applies to: 710-714
1-1: Add 2025 NVIDIA Apache-2.0 header

File lacks the required current-year header.
Please prepend the standard NVIDIA Apache-2.0 header with year 2025. As per coding guidelines.
tensorrt_llm/_torch/models/modeling_phi4mm.py (1)
1-1: Add 2025 NVIDIA Apache-2.0 header

Required license header is missing.
Please add the standard header with year 2025. As per coding guidelines.
tensorrt_llm/_torch/models/modeling_mistral.py (1)
1-1: Add 2025 NVIDIA Apache-2.0 header

Header is required by guidelines.
Please add the standard header with year 2025. As per coding guidelines.
tensorrt_llm/_torch/models/modeling_gemma3vl.py (2)
63-69: Pass device via BatchFeature.to(), not as processor kwarg

AutoProcessor generally doesn’t accept device=; use .to(device=..., dtype=...).
```diff
-        processor_output = self.processor(
+        processor_output = self.processor(
             text=text_prompt,
             images=images,
             do_rescale=do_rescale,
             return_tensors="pt",
-            device=self.device).to(dtype=self.dtype)
+        ).to(device=self.device, dtype=self.dtype)
```
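As a side note, a minimal standalone sketch of the suggested pattern, assuming a recent transformers release where BatchFeature.to accepts device and dtype keywords; the tensors below are stand-ins for real processor output:

```python
import torch
from transformers import BatchFeature

# Stand-in for what an AutoProcessor call would return.
features = BatchFeature(data={
    "input_ids": torch.tensor([[1, 2, 3]]),
    "pixel_values": torch.rand(1, 3, 224, 224),
})

# Move/cast after processing instead of passing device= into the processor.
# BatchFeature.to only casts floating-point tensors when a dtype is given,
# so integer fields such as input_ids keep their dtype.
features = features.to(device="cpu", dtype=torch.float16)
print(features["pixel_values"].dtype)  # torch.float16
print(features["input_ids"].dtype)     # torch.int64
```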
1-1: Add 2025 NVIDIA Apache-2.0 header

Missing required header.
Add standard header with year 2025. As per coding guidelines.
tensorrt_llm/_torch/models/modeling_llava_next.py (1)
1-1: Add 2025 NVIDIA Apache-2.0 header

Please prepend the required header.
As per coding guidelines.
tensorrt_llm/_torch/models/modeling_vila.py (1)
1-18: Update header year to 2025

Header present but year is 2024; guidelines require current year.
Please update the copyright year to 2025.
🧹 Nitpick comments (9)
tensorrt_llm/_torch/models/modeling_hyperclovax.py (1)
668-669: Guard decoder_max_length when absent

Avoid TypeError if config.decoder_max_length is None/missing.
```diff
-        len_inputs_embeds = min(self.config.decoder_max_length,
-                                len_inputs_embeds)
+        if getattr(self.config, "decoder_max_length", None) is not None:
+            len_inputs_embeds = min(self.config.decoder_max_length,
+                                    len_inputs_embeds)
```

tensorrt_llm/_torch/models/modeling_phi4mm.py (1)
793-794: Fallback dtype if config.torch_dtype is None

Avoid downstream dtype issues when configs omit torch_dtype.
```diff
-        self.dtype = self.config.torch_dtype
+        self.dtype = getattr(self.config, "torch_dtype", None) or torch.bfloat16
```

As per coding guidelines.
tensorrt_llm/_torch/models/modeling_mistral.py (1)
239-246: Silence unused parameter warning

sampling_params is unused but required by interface; mark as intentionally unused.
```diff
     def __call__(
-        self, inputs: TextPrompt, sampling_params: SamplingParams
+        self, inputs: TextPrompt, sampling_params: SamplingParams  # unused
     ) -> Tuple[List[int], Optional[ExtraProcessedInputs]]:
```

Or rename to _sampling_params. Based on static analysis hints.
tensorrt_llm/_torch/models/modeling_gemma3vl.py (1)
49-51: dtype source OK, consider fallback

If config.torch_dtype can be None, add a fallback to torch.bfloat16 for stability.
```diff
-        self.dtype = self.config.torch_dtype
+        self.dtype = getattr(self.config, "torch_dtype", None) or torch.bfloat16
```

tensorrt_llm/_torch/models/modeling_vila.py (1)
73-76: Fix typo in error message

Minor text nit: “Unsupportede” → “Unsupported”.
```diff
-        raise ValueError(f"Unsupportede dtype for VILA: {dtype}")
+        raise ValueError(f"Unsupported dtype for VILA: {dtype}")
```

tensorrt_llm/_torch/models/modeling_llama.py (2)
1047-1065: Expose get_vocab_size for hashing paths.

Some infra (multimodal hashing) calls get_vocab_size on the processor. Add a small override to avoid relying on Base fallback to self.model_config.
```diff
 class Llama4InputProcessor(InputProcessor):
@@
     def __init__(self,
                  model_path: str,
-                 config: PretrainedConfig,
+                 config: PretrainedConfig,
                  tokenizer: AutoTokenizer,
                  trust_remote_code: bool = True):
@@
         self.image_token_end_index = self.config.eoi_token_index
+
+    def get_vocab_size(self) -> int:
+        # Prefer model config, fall back to tokenizer for robustness.
+        return int(self.config.text_config.vocab_size
+                   if getattr(self.config, "text_config", None)
+                   and getattr(self.config.text_config, "vocab_size", None) is not None
+                   else getattr(self.tokenizer, "vocab_size"))
```

Based on learnings.
1159-1167: Avoid decode in the inner loop (token perf/nit).

Use tokenizer.convert_ids_to_tokens([token_id])[0] to get the string token without running full decode.
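A small illustration of the difference (independent of the PR; the checkpoint name is only an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
token_id = tokenizer.encode("Hello world")[0]

# decode() runs the full detokenization path for a single id...
text = tokenizer.decode([token_id])

# ...while convert_ids_to_tokens() is a direct vocabulary lookup, which is
# cheaper inside a per-token loop. Note it returns the raw vocabulary token
# (possibly with BPE/whitespace markers), not cleaned-up text.
token = tokenizer.convert_ids_to_tokens([token_id])[0]
print(repr(text), repr(token))
```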
tensorrt_llm/_torch/models/modeling_qwen2vl.py (1)
287-294: get_dummy_text: guard when vocab_size is None.

Use same fallback as above to avoid ValueError in np.random.randint.
```diff
-        ids = np.random.randint(
-            low=0,
-            high=int(self.config.vocab_size),  # high is exclusive in NumPy
+        vocab = (self.config.vocab_size
+                 if getattr(self.config, "vocab_size", None) is not None
+                 else getattr(self.tokenizer, "vocab_size"))
+        ids = np.random.randint(
+            low=0,
+            high=int(vocab),  # high is exclusive in NumPy
             size=input_seq_len,
         ).tolist()
```
470-499: Registry now passes HF config; also update vocab-size resolution to check self.config.

Many processors now store self.config only. Extend BaseMultimodalInputProcessor.get_vocab_size to look there before falling back to tokenizer.
```diff
 class BaseMultimodalInputProcessor:
@@
-    def get_vocab_size(self) -> Optional[int]:
+    def get_vocab_size(self) -> Optional[int]:
@@
-        Resolution order:
-        1) self.model_config.vocab_size
-        2) self.tokenizer.vocab_size
+        Resolution order:
+        1) self.config.vocab_size (if present)
+        2) self.model_config.vocab_size (legacy processors)
+        3) self.tokenizer.vocab_size
@@
-        # 1) Model config
-        if hasattr(self, 'model_config') and getattr(
-                self.model_config, 'vocab_size', None) is not None:
-            return int(self.model_config.vocab_size)
+        # 1) HF config on processor
+        if hasattr(self, 'config') and getattr(self.config, 'vocab_size',
+                                               None) is not None:
+            return int(self.config.vocab_size)
+        # 2) Legacy model_config on processor
+        if hasattr(self, 'model_config') and getattr(self, 'model_config', None) is not None \
+                and getattr(self.model_config, 'vocab_size', None) is not None:
+            return int(self.model_config.vocab_size)
-        # 2) Direct tokenizer on self
+        # 3) Direct tokenizer on self
         if hasattr(self, 'tokenizer') and getattr(self.tokenizer,
                                                   'vocab_size',
                                                   None) is not None:
             return int(self.tokenizer.vocab_size)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- tensorrt_llm/_torch/models/modeling_gemma3vl.py (3 hunks)
- tensorrt_llm/_torch/models/modeling_hyperclovax.py (4 hunks)
- tensorrt_llm/_torch/models/modeling_llama.py (3 hunks)
- tensorrt_llm/_torch/models/modeling_llava_next.py (3 hunks)
- tensorrt_llm/_torch/models/modeling_mistral.py (3 hunks)
- tensorrt_llm/_torch/models/modeling_nanov2vlm.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_phi4mm.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_qwen2vl.py (11 hunks)
- tensorrt_llm/_torch/models/modeling_vila.py (5 hunks)
- tensorrt_llm/inputs/registry.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
- tensorrt_llm/_torch/models/modeling_mistral.py
- tensorrt_llm/_torch/models/modeling_vila.py
- tensorrt_llm/_torch/models/modeling_hyperclovax.py
- tensorrt_llm/_torch/models/modeling_llava_next.py
- tensorrt_llm/_torch/models/modeling_llama.py
- tensorrt_llm/_torch/models/modeling_phi4mm.py
- tensorrt_llm/inputs/registry.py
- tensorrt_llm/_torch/models/modeling_qwen2vl.py
- tensorrt_llm/_torch/models/modeling_gemma3vl.py
- tensorrt_llm/_torch/models/modeling_nanov2vlm.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
- tensorrt_llm/_torch/models/modeling_mistral.py
- tensorrt_llm/_torch/models/modeling_vila.py
- tensorrt_llm/_torch/models/modeling_hyperclovax.py
- tensorrt_llm/_torch/models/modeling_llava_next.py
- tensorrt_llm/_torch/models/modeling_llama.py
- tensorrt_llm/_torch/models/modeling_phi4mm.py
- tensorrt_llm/inputs/registry.py
- tensorrt_llm/_torch/models/modeling_qwen2vl.py
- tensorrt_llm/_torch/models/modeling_gemma3vl.py
- tensorrt_llm/_torch/models/modeling_nanov2vlm.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
- tensorrt_llm/_torch/models/modeling_mistral.py
- tensorrt_llm/_torch/models/modeling_vila.py
- tensorrt_llm/_torch/models/modeling_hyperclovax.py
- tensorrt_llm/_torch/models/modeling_llava_next.py
- tensorrt_llm/_torch/models/modeling_llama.py
- tensorrt_llm/_torch/models/modeling_phi4mm.py
- tensorrt_llm/inputs/registry.py
- tensorrt_llm/_torch/models/modeling_qwen2vl.py
- tensorrt_llm/_torch/models/modeling_gemma3vl.py
- tensorrt_llm/_torch/models/modeling_nanov2vlm.py
🧬 Code graph analysis (10)
tensorrt_llm/_torch/models/modeling_mistral.py (4)
  tensorrt_llm/_torch/models/modeling_bert.py (1)
    config (266-285)
  tensorrt_llm/models/modeling_utils.py (1)
    PretrainedConfig (369-570)
  tensorrt_llm/runtime/multimodal_model_runner.py (1)
    processor (680-683)
  tensorrt_llm/inputs/registry.py (2)
    get_mm_token_ids (96-110)
    get_mm_special_token_ids (112-123)
tensorrt_llm/_torch/models/modeling_vila.py (1)
  tensorrt_llm/models/modeling_utils.py (1)
    PretrainedConfig (369-570)
tensorrt_llm/_torch/models/modeling_hyperclovax.py (4)
  tensorrt_llm/models/modeling_utils.py (1)
    PretrainedConfig (369-570)
  tensorrt_llm/_torch/models/modeling_mistral.py (1)
    get_vocab_size (281-285)
  tensorrt_llm/_torch/models/modeling_nanov2vlm.py (1)
    get_vocab_size (298-299)
  tensorrt_llm/inputs/registry.py (1)
    get_vocab_size (74-94)
tensorrt_llm/_torch/models/modeling_llava_next.py (2)
  tensorrt_llm/_torch/models/modeling_utils.py (1)
    config (522-523)
  tensorrt_llm/models/modeling_utils.py (1)
    PretrainedConfig (369-570)
tensorrt_llm/_torch/models/modeling_llama.py (3)
  tensorrt_llm/models/modeling_utils.py (1)
    PretrainedConfig (369-570)
  tensorrt_llm/llmapi/llm.py (2)
    tokenizer (743-747)
    tokenizer (750-751)
  tensorrt_llm/_torch/model_config.py (1)
    from_pretrained (425-516)
tensorrt_llm/_torch/models/modeling_phi4mm.py (2)
  tensorrt_llm/models/modeling_utils.py (1)
    PretrainedConfig (369-570)
  tensorrt_llm/_torch/model_config.py (1)
    torch_dtype (217-222)
tensorrt_llm/inputs/registry.py (1)
  tensorrt_llm/_torch/models/modeling_utils.py (1)
    get_model_architecture (711-723)
tensorrt_llm/_torch/models/modeling_qwen2vl.py (2)
  tensorrt_llm/_torch/models/modeling_utils.py (1)
    config (522-523)
  tensorrt_llm/models/modeling_utils.py (1)
    PretrainedConfig (369-570)
tensorrt_llm/_torch/models/modeling_gemma3vl.py (4)
  tensorrt_llm/models/modeling_utils.py (1)
    PretrainedConfig (369-570)
  tensorrt_llm/llmapi/llm.py (2)
    tokenizer (743-747)
    tokenizer (750-751)
  tensorrt_llm/runtime/multimodal_model_runner.py (1)
    processor (680-683)
  tensorrt_llm/_torch/model_config.py (1)
    torch_dtype (217-222)
tensorrt_llm/_torch/models/modeling_nanov2vlm.py (5)
  tensorrt_llm/models/modeling_utils.py (1)
    PretrainedConfig (369-570)
  tensorrt_llm/_torch/model_config.py (1)
    torch_dtype (217-222)
  tensorrt_llm/_torch/models/modeling_hyperclovax.py (1)
    get_vocab_size (592-593)
  tensorrt_llm/_torch/models/modeling_mistral.py (1)
    get_vocab_size (281-285)
  tensorrt_llm/inputs/registry.py (1)
    get_vocab_size (74-94)
🪛 Ruff (0.14.1)
tensorrt_llm/_torch/models/modeling_mistral.py
240-240: Unused method argument: sampling_params
(ARG002)
tensorrt_llm/_torch/models/modeling_vila.py
870-870: Unused method argument: model_path
(ARG002)
873-873: Unused method argument: trust_remote_code
(ARG002)
tensorrt_llm/_torch/models/modeling_phi4mm.py
766-766: Avoid specifying long messages outside the exception class
(TRY003)
tensorrt_llm/_torch/models/modeling_nanov2vlm.py
265-265: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tensorrt_llm/_torch/models/modeling_mistral.py (1)
281-286: Config lookup for vocab size is correct

Using self.config.text_config.vocab_size aligns with Mistral3’s config layout.
tensorrt_llm/_torch/models/modeling_llava_next.py (1)
65-69: Hidden size/vocab lookups correctly moved to config

Using self.config.text_config.hidden_size and self.config.image_token_index is consistent with the model’s config structure.
Ensure all call sites constructing LlavaNextInputProcessor pass a PretrainedConfig with text_config/image_token_index populated.
tensorrt_llm/_torch/models/modeling_llama.py (2)
8-9: LGTM on import adjustments.

Types align with constructor changes; no issues.
1124-1129: LGTM on hidden-size validation.

Clear error on mismatch; the safeguard is correct.
tensorrt_llm/_torch/models/modeling_nanov2vlm.py (1)
259-273: Constructor rename and config-sourced fields look good.

Consistent with repo-wide pattern; dtype/ids sourced from self.config correctly.
Also applies to: 288-293
tensorrt_llm/_torch/models/modeling_qwen2vl.py (1)
399-401: LGTM on get_rope_index call-site.

Signature and usage now match the classmethod.
/bot run
PR_Github #22743 [ run ] triggered by Bot. Commit:
This PR renames model_config to config, giving a consistent naming convention between VLMs and LLMs. Thanks @yechank-nvidia for this fix.
PR_Github #22743 [ run ] completed with state
Approving mistral changes.
438cf23 to 4ca83ea (Compare)
/bot run
PR_Github #22993 [ run ] triggered by Bot. Commit:
PR_Github #22993 [ run ] completed with state
Signed-off-by: yechank <[email protected]>
408ff32 to 49e403f (Compare)
/bot run
PR_Github #23130 [ run ] triggered by Bot. Commit:
PR_Github #23130 [ run ] completed with state
/bot run
PR_Github #23137 [ run ] triggered by Bot. Commit:
PR_Github #23137 [ run ] completed with state
/bot run
PR_Github #23164 [ run ] triggered by Bot. Commit:
PR_Github #23164 [ run ] completed with state
/bot run
PR_Github #23196 [ run ] triggered by Bot. Commit:
PR_Github #23196 [ run ] completed with state
Summary by CodeRabbit
Release Notes