ROCP_SDK: Enable multi-GPU functionality.#562

Merged
Treece-Burgess merged 1 commit into icl-utk-edu:master from adanalis-amd:2026.02.rocp_sdk-multi_device
Feb 13, 2026

Conversation

@adanalis-amd
Collaborator

ROCm-7.2 encodes the device_id in the upper 32 bits of the counter id, so this comparison would fail for all GPUs except zero. Masking off the device_id by comparing only the lower 32 bits solves the problem.
Earlier versions of ROCm do not encode the device_id this way, so the problem does not appear there, and the encoding will also be removed from future versions. This fix therefore addresses only the narrow range of versions in which the problem is present.
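
The masking idea described above can be illustrated with a minimal sketch. This is not the code from the patch: the helper names (`counter_base_id`, `counter_device_id`, `same_counter`) and the `COUNTER_ID_MASK` macro are hypothetical, and the sketch assumes only what the description states, namely that the upper 32 bits of a 64-bit counter id carry the device_id and the lower 32 bits identify the counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask: keeps the lower 32 bits, which (per the description
 * above) identify the counter independently of the device. */
#define COUNTER_ID_MASK UINT64_C(0xFFFFFFFF)

/* Strip the device_id that ROCm 7.2 places in the upper 32 bits. */
static inline uint64_t counter_base_id(uint64_t id)
{
    return id & COUNTER_ID_MASK;
}

/* Extract the device_id from the upper 32 bits (illustrative only). */
static inline uint32_t counter_device_id(uint64_t id)
{
    return (uint32_t)(id >> 32);
}

/* Compare two counter ids while ignoring the device_id bits. A plain
 * `recorded == requested` would only match when the upper bits are equal,
 * which is why an unmasked comparison succeeds for device 0 alone. */
static inline bool same_counter(uint64_t recorded, uint64_t requested)
{
    return counter_base_id(recorded) == counter_base_id(requested);
}
```

With this sketch, an id whose upper half is 1 and whose lower half is 42 compares equal to a plain 42, whereas a direct 64-bit equality test would reject it.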

Pull Request Description

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

@adanalis-amd adanalis-amd force-pushed the 2026.02.rocp_sdk-multi_device branch from 660779f to e872b19 Compare February 13, 2026 00:27
Contributor

@Treece-Burgess Treece-Burgess left a comment

I tested this PR on Odyssey at Oregon with ROCm 7.2.0 pre-release (and ROCm 7.1.1 as a sanity test) with the following configure:

./configure --prefix=$PWD/test-install --with-components="rocp_sdk" --with-debug=yes

The results were:

  • PAPI build: ✅
  • PAPI utilities*: ✅
  • rocp_sdk component tests: ✅

* papi_component_avail, papi_native_avail, and papi_command_line

Note that this PR fixes the two_eventsets.c test, which began returning all-zero counter values in ROCm 7.2.0:

# two_eventsets.c with ROCm 7.1.1 (master branch)
==================== FIRST EVENTSET - DEVICE 1 ====================
---------------------  PAPI_read()
rocp_sdk:::SQ_BUSY_CYCLES:device=0: 0 (0.00)
rocp_sdk:::SQ_BUSY_CYCLES:device=1: 2433452209 (1.87)
rocp_sdk:::TCC_CYCLE:device=1: 236585347099 (4.30)
rocp_sdk:::SQ_WAVES:device=0: 0 (0.00)
rocp_sdk:::SQ_WAVES:device=1: 5 (5.00)
---------------------  PAPI_read()
rocp_sdk:::SQ_BUSY_CYCLES:device=0: 0 (0.00)
rocp_sdk:::SQ_BUSY_CYCLES:device=1: 5068995890 (1.95)
rocp_sdk:::TCC_CYCLE:device=1: 490618431337 (4.46)
rocp_sdk:::SQ_WAVES:device=0: 0 (0.00)
rocp_sdk:::SQ_WAVES:device=1: 10 (5.00)
.
.
.

# two_eventsets.c with ROCm 7.2.0 (master branch)
==================== FIRST EVENTSET - DEVICE 1 ====================
---------------------  PAPI_read()
rocp_sdk:::SQ_BUSY_CYCLES:device=0: 0 (0.00)
rocp_sdk:::SQ_BUSY_CYCLES:device=1: 0 (0.00)
rocp_sdk:::TCC_CYCLE:device=1: 0 (0.00)
rocp_sdk:::SQ_WAVES:device=0: 0 (0.00)
rocp_sdk:::SQ_WAVES:device=1: 0 (0.00)
---------------------  PAPI_read()
rocp_sdk:::SQ_BUSY_CYCLES:device=0: 0 (0.00)
rocp_sdk:::SQ_BUSY_CYCLES:device=1: 0 (0.00)
rocp_sdk:::TCC_CYCLE:device=1: 0 (0.00)
rocp_sdk:::SQ_WAVES:device=0: 0 (0.00)
rocp_sdk:::SQ_WAVES:device=1: 0 (0.00)
.
.
.

# two_eventsets.c with ROCm 7.2.0 (this branch)
==================== FIRST EVENTSET - DEVICE 1 ====================
---------------------  PAPI_read()
rocp_sdk:::SQ_BUSY_CYCLES:device=0: 0 (0.00)
rocp_sdk:::SQ_BUSY_CYCLES:device=1: 2529921993 (1.95)
rocp_sdk:::TCC_CYCLE:device=1: 245688656588 (4.47)
rocp_sdk:::SQ_WAVES:device=0: 0 (0.00)
rocp_sdk:::SQ_WAVES:device=1: 5 (5.00)
---------------------  PAPI_read()
rocp_sdk:::SQ_BUSY_CYCLES:device=0: 0 (0.00)
rocp_sdk:::SQ_BUSY_CYCLES:device=1: 5294180824 (2.04)
rocp_sdk:::TCC_CYCLE:device=1: 512793339818 (4.66)
rocp_sdk:::SQ_WAVES:device=0: 0 (0.00)
rocp_sdk:::SQ_WAVES:device=1: 12 (6.00)
.
.
.

ROCm-7.2 encodes the device_id in the upper 32 bits of the counter id, so this comparison would fail for all GPUs except zero. Masking off the device_id by comparing only the lower 32 bits solves the problem.
@Treece-Burgess Treece-Burgess force-pushed the 2026.02.rocp_sdk-multi_device branch from e872b19 to 9e948bb Compare February 13, 2026 23:20
@Treece-Burgess Treece-Burgess merged commit 525920d into icl-utk-edu:master Feb 13, 2026
4 checks passed

2 participants