Conversation

@Barry-Delaney
Collaborator

@Barry-Delaney Barry-Delaney commented Oct 28, 2025

Summary by CodeRabbit

  • Tests
    • Enhanced test coverage for Blackwell GPU support with dynamic configuration based on hardware capabilities.
    • Tests now conditionally adjust performance parameters based on detected GPU version, enabling broader hardware compatibility testing.

@Barry-Delaney Barry-Delaney self-assigned this Oct 28, 2025
@Barry-Delaney Barry-Delaney requested a review from a team as a code owner October 28, 2025 05:27
@Barry-Delaney Barry-Delaney changed the title from "[https://nvbugspro.nvidia.com/bug/5325296][fix] Enable relaxed acceptance test on Blackwell" to "[https://nvbugs/5325296][fix] Enable relaxed acceptance test on Blackwell" Oct 28, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 28, 2025

📝 Walkthrough

Walkthrough

The changes modify integration tests to dynamically adjust configuration based on GPU compute capability (SM version). A decorator is removed and conditional logic is introduced to set KV cache fraction and MOE backend differently for Blackwell-generation GPUs versus earlier generations.

Changes

Cohort: Test Configuration Updates
File(s): tests/integration/defs/test_e2e.py
Summary: Import get_sm_version from conftest; remove @skip_post_blackwell decorator from test_ptp_quickstart_advanced_deepseek_r1_8gpus; introduce is_blackwell variable in test_relaxed_acceptance_quickstart_advanced_deepseek_r1_8gpus to conditionally set kv_cache_fraction (_MEM_FRACTION_80 for Blackwell, _MEM_FRACTION_95 otherwise) and moe_backend ("DEEPGEMM" for Blackwell, "CUTLASS" otherwise).
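
A minimal sketch of the conditional logic this summary describes, assuming Blackwell is detected as SM version >= 100 and assuming the numeric values behind _MEM_FRACTION_80/_MEM_FRACTION_95; the real test imports get_sm_version from conftest rather than stubbing it:

```python
# Stand-in for tests/integration/defs/conftest.py::get_sm_version, stubbed here
# so the sketch runs standalone; the real test imports it from conftest.
def get_sm_version():
    return 100  # pretend we are on a Blackwell (SM 100) GPU


# Memory-fraction constants mirroring those in test_e2e.py (values assumed).
_MEM_FRACTION_80 = 0.8
_MEM_FRACTION_95 = 0.95

# Blackwell-generation GPUs report compute capability 10.x, i.e. SM version >= 100.
is_blackwell = get_sm_version() >= 100

# Blackwell gets a smaller KV cache fraction and the DEEPGEMM MOE backend;
# earlier architectures keep the previous settings.
kv_cache_fraction = _MEM_FRACTION_80 if is_blackwell else _MEM_FRACTION_95
moe_backend = "DEEPGEMM" if is_blackwell else "CUTLASS"
```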

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • Straightforward import addition and conditional configuration logic
  • Single file with homogeneous, repetitive pattern
  • No complex logic changes, only test parameter adjustments

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description Check ⚠️ Warning: The pull request description is incomplete and does not follow the required template structure. Instead of providing the mandatory sections (Description, Test Coverage, and PR Checklist), the author has only included "@coderabbitai summary", which appears to be a placeholder or command rather than actual content. The description lacks an explanation of the issue being fixed, the solution implemented, relevant test coverage information, and completion of the checklist items required by the repository template. Resolution: provide a complete pull request description by filling in all required sections: (1) a clear explanation of the issue (why the test was skipped on Blackwell) and the solution (how SM version detection enables the test), (2) identification of the specific test cases affected (test_relaxed_acceptance_quickstart_advanced_deepseek_r1_8gpus and test_ptp_quickstart_advanced_deepseek_r1_8gpus), and (3) completion of the PR checklist verifying compliance with coding guidelines, testing requirements, and documentation standards.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title Check ✅ Passed: The pull request title follows the required template format with a valid NVBugs ticket identifier, appropriate type classification, and a clear summary. The title "Enable relaxed acceptance test on Blackwell" accurately reflects the main changes: removing the skip decorator from the test and introducing dynamic behavior based on SM version detection to support Blackwell hardware. The title is concise, specific, and provides sufficient context for understanding the primary change in the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 28c9a51 and 34cd672.

📒 Files selected for processing (1)
  • tests/integration/defs/test_e2e.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tests/integration/defs/test_e2e.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py:

  • Python code must target Python 3.8+.
  • Indent Python code with 4 spaces; do not use tabs.
  • Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
  • Python filenames should be snake_case (e.g., some_file.py).
  • Python classes use PascalCase names.
  • Functions and methods use snake_case names.
  • Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
  • Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
  • Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
  • Avoid shadowing variables from an outer scope.
  • Initialize all externally visible members of a class in the constructor.
  • Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
  • Use Google-style docstrings for classes and functions (Sphinx-parsable).
  • Document attributes and variables inline so they render under the class/function docstring.
  • Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
  • In try/except, catch the most specific exceptions possible.
  • For duck-typing try/except, keep the try body minimal and use else for the main logic.
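
A purely illustrative snippet (hypothetical names and values) showing several of these conventions together: a snake_case function, an upper SNAKE_CASE constant, a Google-style docstring, a minimal try body, and the main logic in the else branch:

```python
DEFAULT_SM_VERSION = 80  # constant: upper SNAKE_CASE


def parse_sm_version(raw_value):
    """Parse a compute-capability string into an integer SM version.

    Args:
        raw_value: Text such as "100" read from an external source.

    Returns:
        The SM version as an int, or DEFAULT_SM_VERSION if the value is not numeric.
    """
    try:
        sm_version = int(raw_value)  # keep the try body minimal
    except ValueError:  # catch the most specific exception possible
        return DEFAULT_SM_VERSION
    else:
        return sm_version  # main logic lives in the else branch
```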

Files:

  • tests/integration/defs/test_e2e.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tests/integration/defs/test_e2e.py
🧬 Code graph analysis (1)
tests/integration/defs/test_e2e.py (1)
tests/integration/defs/conftest.py (1)
  • get_sm_version (1870-1873)
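
The body of get_sm_version is not part of this diff; a functionally equivalent sketch (assuming a CUDA-capable torch build and that the helper returns the compute capability of GPU 0 as a single integer) could look like:

```python
import torch


def get_sm_version():
    """Return the SM version of GPU 0, e.g. 90 for Hopper, 100 for Blackwell."""
    prop = torch.cuda.get_device_properties(0)
    return prop.major * 10 + prop.minor
```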
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/test_e2e.py (2)

34-37: LGTM! Import added correctly.

The get_sm_version import is properly added to support the dynamic GPU detection logic introduced in this PR.


2210-2210: Conditional configuration logic looks correct.

The dynamic configuration based on GPU architecture is well-implemented:

  • KV cache fraction adjusts for Blackwell's memory characteristics
  • MOE backend selection matches the appropriate compute capability

However, these depend on the correct is_blackwell detection (see comment on line 2196).

Also applies to: 2223-2223

Signed-off-by: Barry Kang <[email protected]>
@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #22738 [ run ] triggered by Bot. Commit: 57f5ba5

@tensorrt-cicd
Collaborator

PR_Github #22738 [ run ] completed with state SUCCESS. Commit: 57f5ba5
/LLM/release-1.1/L0_MergeRequest_PR pipeline #287 completed with status: 'SUCCESS'

@Barry-Delaney Barry-Delaney force-pushed the user/barry/fix_sm100_r1_ci branch from 6744876 to 1778fef Compare October 29, 2025 01:42
@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #22822 [ run ] triggered by Bot. Commit: 43c4ea1

@tensorrt-cicd
Collaborator

PR_Github #22822 [ run ] completed with state SUCCESS. Commit: 43c4ea1
/LLM/release-1.1/L0_MergeRequest_PR pipeline #295 completed with status: 'FAILURE'

@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #22854 [ run ] triggered by Bot. Commit: f0363ca

@tensorrt-cicd
Collaborator

PR_Github #22854 [ run ] completed with state SUCCESS. Commit: f0363ca
/LLM/release-1.1/L0_MergeRequest_PR pipeline #300 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@Barry-Delaney Barry-Delaney requested a review from kaiyux October 30, 2025 05:47
@Barry-Delaney Barry-Delaney enabled auto-merge (squash) October 31, 2025 08:32
@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23170 [ run ] triggered by Bot. Commit: 04de7da

@tensorrt-cicd
Collaborator

PR_Github #23170 [ run ] completed with state FAILURE. Commit: 04de7da
/LLM/release-1.1/L0_MergeRequest_PR pipeline #336 completed with status: 'FAILURE'

@Barry-Delaney
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23218 [ run ] triggered by Bot. Commit: f077b6d

@tensorrt-cicd
Collaborator

PR_Github #23218 [ run ] completed with state SUCCESS. Commit: f077b6d
/LLM/release-1.1/L0_MergeRequest_PR pipeline #346 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@Barry-Delaney Barry-Delaney merged commit f22a87f into NVIDIA:release/1.1 Oct 31, 2025
5 checks passed
mikeiovine pushed commits to mikeiovine/TensorRT-LLM that referenced this pull request between Nov 4 and Nov 6, 2025