Conversation

@Barry-Delaney
Collaborator

@Barry-Delaney Barry-Delaney commented Oct 28, 2025

Summary by CodeRabbit

  • Tests
    • Enhanced test coverage for Blackwell GPU support with dynamic configuration based on hardware capabilities.
    • Tests now conditionally adjust performance parameters based on detected GPU version, enabling broader hardware compatibility testing.

@Barry-Delaney Barry-Delaney self-assigned this Oct 28, 2025
@Barry-Delaney Barry-Delaney requested a review from a team as a code owner October 28, 2025 05:27
@Barry-Delaney Barry-Delaney changed the title from "[https://nvbugspro.nvidia.com/bug/5325296][fix] Enable relaxed acceptance test on Blackwell" to "[https://nvbugs/5325296][fix] Enable relaxed acceptance test on Blackwell" Oct 28, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 28, 2025

📝 Walkthrough

Walkthrough

The changes modify integration tests to dynamically adjust configuration based on GPU compute capability (SM version). A decorator is removed and conditional logic is introduced to set KV cache fraction and MOE backend differently for Blackwell-generation GPUs versus earlier generations.

Changes

Cohort: Test Configuration Updates
File(s): tests/integration/defs/test_e2e.py
Summary: Import get_sm_version from conftest; remove @skip_post_blackwell decorator from test_ptp_quickstart_advanced_deepseek_r1_8gpus; introduce is_blackwell variable in test_relaxed_acceptance_quickstart_advanced_deepseek_r1_8gpus to conditionally set kv_cache_fraction (_MEM_FRACTION_80 for Blackwell, _MEM_FRACTION_95 otherwise) and moe_backend ("DEEPGEMM" for Blackwell, "CUTLASS" otherwise).
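
A minimal sketch of the conditional logic this summary describes, assuming Blackwell is detected as SM version >= 100 and assuming the numeric values behind _MEM_FRACTION_80/_MEM_FRACTION_95; the real test imports get_sm_version from conftest rather than stubbing it:

```python
# Stand-in for tests/integration/defs/conftest.py::get_sm_version, stubbed here
# so the sketch runs standalone; the real test imports it from conftest.
def get_sm_version():
    return 100  # pretend we are on a Blackwell (SM 100) GPU


# Memory-fraction constants mirroring those in test_e2e.py (values assumed).
_MEM_FRACTION_80 = 0.8
_MEM_FRACTION_95 = 0.95

# Blackwell-generation GPUs report compute capability 10.x, i.e. SM version >= 100.
is_blackwell = get_sm_version() >= 100

# Blackwell gets a smaller KV cache fraction and the DEEPGEMM MOE backend;
# earlier architectures keep the previous settings.
kv_cache_fraction = _MEM_FRACTION_80 if is_blackwell else _MEM_FRACTION_95
moe_backend = "DEEPGEMM" if is_blackwell else "CUTLASS"
```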

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • Straightforward import addition and conditional configuration logic
  • Single file with homogeneous, repetitive pattern
  • No complex logic changes, only test parameter adjustments

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description Check ⚠️ Warning: The pull request description is incomplete and does not follow the required template structure. Instead of providing the mandatory sections (Description, Test Coverage, and PR Checklist), the author has only included "@coderabbitai summary", which appears to be a placeholder or command rather than actual content. The description lacks an explanation of the issue being fixed, the solution implemented, relevant test coverage information, and completion of the checklist items required by the repository template. Resolution: provide a complete pull request description by filling in all required sections: (1) a clear explanation of the issue (why the test was skipped on Blackwell) and the solution (how SM version detection enables the test), (2) identification of the specific test cases affected (test_relaxed_acceptance_quickstart_advanced_deepseek_r1_8gpus and test_ptp_quickstart_advanced_deepseek_r1_8gpus), and (3) completion of the PR checklist verifying compliance with coding guidelines, testing requirements, and documentation standards.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title Check ✅ Passed: The pull request title follows the required template format with a valid NVBugs ticket identifier, appropriate type classification, and a clear summary. The title "Enable relaxed acceptance test on Blackwell" accurately reflects the main changes: removing the skip decorator from the test and introducing dynamic behavior based on SM version detection to support Blackwell hardware. The title is concise, specific, and provides sufficient context for understanding the primary change in the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 28c9a51 and 34cd672.

📒 Files selected for processing (1)
  • tests/integration/defs/test_e2e.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tests/integration/defs/test_e2e.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py:

  • Python code must target Python 3.8+.
  • Indent Python code with 4 spaces; do not use tabs.
  • Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
  • Python filenames should be snake_case (e.g., some_file.py).
  • Python classes use PascalCase names.
  • Functions and methods use snake_case names.
  • Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
  • Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
  • Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
  • Avoid shadowing variables from an outer scope.
  • Initialize all externally visible members of a class in the constructor.
  • Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
  • Use Google-style docstrings for classes and functions (Sphinx-parsable).
  • Document attributes and variables inline so they render under the class/function docstring.
  • Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
  • In try/except, catch the most specific exceptions possible.
  • For duck-typing try/except, keep the try body minimal and use else for the main logic.
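
A purely illustrative snippet (hypothetical names and values) showing several of these conventions together: a snake_case function, an upper SNAKE_CASE constant, a Google-style docstring, a minimal try body, and the main logic in the else branch:

```python
DEFAULT_SM_VERSION = 80  # constant: upper SNAKE_CASE


def parse_sm_version(raw_value):
    """Parse a compute-capability string into an integer SM version.

    Args:
        raw_value: Text such as "100" read from an external source.

    Returns:
        The SM version as an int, or DEFAULT_SM_VERSION if the value is not numeric.
    """
    try:
        sm_version = int(raw_value)  # keep the try body minimal
    except ValueError:  # catch the most specific exception possible
        return DEFAULT_SM_VERSION
    else:
        return sm_version  # main logic lives in the else branch
```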

Files:

  • tests/integration/defs/test_e2e.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tests/integration/defs/test_e2e.py
🧬 Code graph analysis (1)
tests/integration/defs/test_e2e.py (1)
tests/integration/defs/conftest.py (1)
  • get_sm_version (1870-1873)
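
The body of get_sm_version is not part of this diff; a functionally equivalent sketch (assuming a CUDA-capable torch build and that the helper returns the compute capability of GPU 0 as a single integer) could look like:

```python
import torch


def get_sm_version():
    """Return the SM version of GPU 0, e.g. 90 for Hopper, 100 for Blackwell."""
    prop = torch.cuda.get_device_properties(0)
    return prop.major * 10 + prop.minor
```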
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/test_e2e.py (2)

34-37: LGTM! Import added correctly.

The get_sm_version import is properly added to support the dynamic GPU detection logic introduced in this PR.


2210-2210: Conditional configuration logic looks correct.

The dynamic configuration based on GPU architecture is well-implemented:

  • KV cache fraction adjusts for Blackwell's memory characteristics
  • MOE backend selection matches the appropriate compute capability

However, these depend on the correct is_blackwell detection (see comment on line 2196).

Also applies to: 2223-2223

Signed-off-by: Barry Kang <[email protected]>
@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #22738 [ run ] triggered by Bot. Commit: 57f5ba5

@tensorrt-cicd
Collaborator

PR_Github #22738 [ run ] completed with state SUCCESS. Commit: 57f5ba5
/LLM/release-1.1/L0_MergeRequest_PR pipeline #287 completed with status: 'SUCCESS'

@Barry-Delaney Barry-Delaney force-pushed the user/barry/fix_sm100_r1_ci branch from 6744876 to 1778fef Compare October 29, 2025 01:42
@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #22822 [ run ] triggered by Bot. Commit: 43c4ea1

@tensorrt-cicd
Collaborator

PR_Github #22822 [ run ] completed with state SUCCESS. Commit: 43c4ea1
/LLM/release-1.1/L0_MergeRequest_PR pipeline #295 completed with status: 'FAILURE'

@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #22854 [ run ] triggered by Bot. Commit: f0363ca

@tensorrt-cicd
Collaborator

PR_Github #22854 [ run ] completed with state SUCCESS. Commit: f0363ca
/LLM/release-1.1/L0_MergeRequest_PR pipeline #300 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@Barry-Delaney Barry-Delaney requested a review from kaiyux October 30, 2025 05:47
@Barry-Delaney Barry-Delaney enabled auto-merge (squash) October 31, 2025 08:32
@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23170 [ run ] triggered by Bot. Commit: 04de7da

@tensorrt-cicd
Collaborator

PR_Github #23170 [ run ] completed with state FAILURE. Commit: 04de7da
/LLM/release-1.1/L0_MergeRequest_PR pipeline #336 completed with status: 'FAILURE'

@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23218 [ run ] triggered by Bot. Commit: f077b6d

@tensorrt-cicd
Collaborator

PR_Github #23218 [ run ] completed with state SUCCESS. Commit: f077b6d
/LLM/release-1.1/L0_MergeRequest_PR pipeline #346 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@Barry-Delaney Barry-Delaney merged commit f22a87f into NVIDIA:release/1.1 Oct 31, 2025
5 checks passed
mikeiovine pushed commits to mikeiovine/TensorRT-LLM that referenced this pull request between Nov 4 and Nov 6, 2025