
@lamikr commented Jul 25, 2025

Original patch from saienduri <[email protected]>

This PR consists of all the changes required to enable PyTorch ROCm CI on MI355X nodes.

  • Rework the aotriton CMake configuration to rely on HIP_VERSION instead of ROCM_VERSION, because aotriton depends on HIP. HIP loosely tracks the ROCm major version, but the two are not actually synchronized, as observed in the ROCm 7 alpha build.
  • Bump the composable-kernel submodule to df6023e305f389bbf7249b0c4414e649f3ad6598 for MI350 compatibility.
  • Extend the "change docker permissions" step to the MI355X runners as well. This step applies the permission change to the test folder that is required for artifacts to upload successfully from k8s Docker containers.
  • Create a new rocm-mi355 workflow that triggers the core PyTorch tests nightly at 2:30 am PST.
  • Successfully ran the test suites listed in rocm-mi355.yml on MI355 runners by temporarily hacking rocm-mi300.yml: https://hud.pytorch.org/pytorch/pytorch/commit/ca7d5fae112558ee3dde7ec3ce32e94b13f877fd#rocm-mi300
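
The first bullet matters because the "major.minor" version string decides which prebuilt aotriton archive the build picks. The following is a minimal Python sketch, not the actual aotriton.cmake logic. It assumes the key is built the same way as the `set(__AOTRITON_SYSTEM_ROCM ...)` line in the review diff. It also assumes the build then chooses the newest entry from a list of prebuilt ROCm versions that is not newer than that key. `PREBUILT_ROCM`, `version_key`, `as_tuple`, `pick_prebuilt`, and the 7.0/6.5 version values are all hypothetical.

```python
# Hypothetical list of ROCm versions with prebuilt aotriton archives,
# ordered from lowest to highest (illustrative values only).
PREBUILT_ROCM = ["6.2", "6.3", "6.4", "7.0"]


def version_key(major: int, minor: int) -> str:
    # Same shape as set(__AOTRITON_SYSTEM_ROCM "${X_MAJOR}.${X_MINOR}")
    return f"{major}.{minor}"


def as_tuple(v: str) -> tuple:
    # Compare numerically so that "6.10" sorts after "6.9".
    return tuple(int(part) for part in v.split("."))


def pick_prebuilt(system: str, prebuilt: list) -> str:
    """Return the newest prebuilt version not newer than `system`.

    Assumption: falls back to the lowest entry if every prebuilt is newer.
    """
    chosen = prebuilt[0]
    for candidate in prebuilt:
        if as_tuple(candidate) > as_tuple(system):
            break
        chosen = candidate
    return chosen


# Hypothetical mismatch between the HIP and ROCm version sources.
hip_key = version_key(7, 0)
rocm_key = version_key(6, 5)
print(pick_prebuilt(hip_key, PREBUILT_ROCM))   # prints 7.0
print(pick_prebuilt(rocm_key, PREBUILT_ROCM))  # prints 6.4
```

With these hypothetical values, the HIP-derived key selects the 7.0 archive while the ROCm-derived key falls back to 6.4. The choice of version source can therefore change which archive is downloaded. Comparing numeric tuples instead of strings mirrors CMake's VERSION_LESS / VERSION_GREATER comparisons.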

Unlike the original patch, this version does not change __AOTRITON_SHA256_LIST for ROCm 6.5. (Changing it would cause a SHA-256 mismatch error at build time, after the aotriton archive is downloaded.)
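
The SHA-256 error can be illustrated with a minimal Python sketch. It assumes the build pins one expected digest per ROCm version, as __AOTRITON_SHA256_LIST does, and compares that digest with the hash of the downloaded archive. CMake's `ExternalProject_Add` `URL_HASH` option is one mechanism that performs such a check, but this sketch does not claim that is exactly how aotriton.cmake is wired. `verify_sha256` and the archive bytes are hypothetical.

```python
import hashlib


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise ValueError if the SHA-256 digest of `data` differs from `expected_hex`."""
    actual = hashlib.sha256(data).hexdigest()  # lowercase hex string
    if actual != expected_hex.lower():
        raise ValueError(f"SHA256 mismatch: expected {expected_hex}, got {actual}")


# Hypothetical stand-in for a downloaded aotriton archive.
archive = b"example aotriton archive bytes"

# The pinned hash matches the archive that is actually served: the check passes.
pinned = hashlib.sha256(archive).hexdigest()
verify_sha256(archive, pinned)

# The list entry is edited but the downloaded archive is unchanged: the check fails.
edited = "0" * 64
try:
    verify_sha256(archive, edited)
except ValueError as err:
    print(err)
```

The sketch shows why a hash-list entry and the archive it describes have to change together.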

Fixes #2411

Fixes ROCm/TheRock#1119

Signed-off-by: Mika Laitio <[email protected]>
rocm-repo-management-api bot commented Jul 25, 2025

Jenkins build for a777f31d8b922de52af9a2a55075ca517d901846 commit finished as FAILURE

@pruthvistony (Collaborator) left a comment

I don't think this PR needs to be merged into the internal fork.

```diff
 message(STATUS "Using AOTriton compiled from source directory ${__AOTRITON_EXTERN_PREFIX}")
 else()
-set(__AOTRITON_SYSTEM_ROCM "${ROCM_VERSION_DEV_MAJOR}.${ROCM_VERSION_DEV_MINOR}")
+set(__AOTRITON_SYSTEM_ROCM "${HIP_VERSION_MAJOR}.${HIP_VERSION_MINOR}")
```
Collaborator comment:

Not sure why this change is required. We have been moving all code away from HIP_VERSION checks and onto ROCM_VERSION checks.
