Skip to content

Comments

2026.01.22 initial rocm presets#541

Open
dbarry9 wants to merge 2 commits intoicl-utk-edu:masterfrom
dbarry9:2026.01.22_initial-rocm-presets
Open

2026.01.22 initial rocm presets#541
dbarry9 wants to merge 2 commits intoicl-utk-edu:masterfrom
dbarry9:2026.01.22_initial-rocm-presets

Conversation

@dbarry9
Copy link
Contributor

@dbarry9 dbarry9 commented Jan 23, 2026

Pull Request Description

This pull request adds the following to the rocm component:

  • functions to facilitate ROCm presets, and
  • presets for FMA instructions of various precisions.

These changes have been tested on the AMD MI250X architecture.

Here is the relevant output from papi_avail on the Frontier supercomputer:

================================================================================
  PAPI Component Preset Events
================================================================================
       Name           Code    Avail Deriv Description (Note)
PAPI_ROCM_FP16_FMA 0x80000080  Yes   No   ROCM Half precision (FP16) FMA instructions
    :device=0
        Mandatory device qualifier [0,1,2,3,4,5,6,7]
--------------------------------------------------------------------------------
PAPI_ROCM_BF16_FMA 0x80000081  No    No   ROCM Half precision (BF16) FMA instructions
--------------------------------------------------------------------------------
PAPI_ROCM_FP32_FMA 0x80000082  Yes   No   ROCM Single precision (FP32) FMA instructions
    :device=0
        Mandatory device qualifier [0,1,2,3,4,5,6,7]
--------------------------------------------------------------------------------
PAPI_ROCM_FP64_FMA 0x80000083  Yes   No   ROCM Double precision (FP64) FMA instructions
    :device=0
        Mandatory device qualifier [0,1,2,3,4,5,6,7]
--------------------------------------------------------------------------------
PAPI_ROCM_FP_FMA   0x80000084  Yes   Yes  ROCM floating-point FMA instructions
    :device=0
        Mandatory device qualifier [0,1,2,3,4,5,6,7]
--------------------------------------------------------------------------------
PAPI_ROCM_FP8_OPS  0x80000085  No    No   ROCM 8-bit precision floating-point operations
================================================================================

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

@dbarry9 dbarry9 added component-rocm PRs and Issues related to the rocm component update-presets PRs related to updating the PAPI presets labels Jan 23, 2026
Add functions to facilitate ROCm presets.

These changes have been tested on the AMD MI250X architecture.
Add presets for FMA instructions of various precisions.

These changes have been tested on the AMD MI250X architecture.
@dbarry9 dbarry9 force-pushed the 2026.01.22_initial-rocm-presets branch from df51463 to a0800b1 Compare January 23, 2026 01:59
# AMD MI250X architecture
rocm,gfx90a
#
PRESET,PAPI_ROCM_FP16_FMA,NOT_DERIVED,rocm:::SQ_INSTS_VALU_FMA_F16:device=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: In papi_rocm_presets.h you have 6 listed events, but here only 4 are defined. Are the other two for a separate architecture?

#
PRESET,PAPI_ROCM_FP16_FMA,NOT_DERIVED,rocm:::SQ_INSTS_VALU_FMA_F16:device=0
PRESET,PAPI_ROCM_FP32_FMA,NOT_DERIVED,rocm:::SQ_INSTS_VALU_FMA_F32:device=0
PRESET,PAPI_ROCM_FP64_FMA,NOT_DERIVED,rocm:::SQ_INSTS_VALU_FMA_F64:device=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: It appears that none of the events listed will need an instance qualifier appended to them. For the cuda component, you show the stats qualifier as well in the papi_avail output as those events do need the stats qualifier.

Say for a future case you add an event which does require an instance value, is that already handled with this update or would more work need to be done?

hsa_status_t
get_chip_name( hsa_agent_t agent, void *data ) {

char *name = (char *) data;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would personally be more descriptive with the variable naming not just here, but in other instances as well.

name should correspond to the arch_name, but it is not immediately clear this is what we actually mean. Long term, I believe it would help others that come along and work on the component that may not be as familiar with it.

}

if (type == HSA_DEVICE_TYPE_GPU) {
hsa_errno = hsa_agent_get_info_p(agent, HSA_AGENT_INFO_NAME, name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the docs HSA_AGENT_INFO_NAME is a null terminated char[64]. The variable you are passing is arch_name[PAPI_2MAX_STR_LEN] ([256]), I would swap to PAPI_MIN_STR_LEN.

* be available. */
int retval = rocd_get_chip_name(arch_name);
if ( retval != PAPI_OK ) {
SUBDBG("EXIT: Failed to get ROCm device name.\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should ROCm not be AMD here?

int
rocc_get_chip_name(char *name) {

hsa_status_t hsa_errno = hsa_iterate_agents_p(&get_chip_name, name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: From my understanding, hsa_iterate_agents will enumerate over the available agents and invoke the callback you have defined. Therefore, if we have 7 AMD devices this will iterate over all 7 calling your callback function. What if there are multiple mixed devices, i.e. MI250x and MI300A? From my testing you would only capture the very last one.

/* Setup presets. */
char arch_name[PAPI_2MAX_STR_LEN];

/* Load preset table for every device type available on the system.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would change the comments from /* to // as then it becomes easier to comment blocks of code out later if you are debugging.

However, you can use #if 0 #endif.

This is something we should probably discuss internally as a team though.

#ifndef __PAPI_ROCM_PRESETS_H__
#define __PAPI_ROCM_PRESETS_H__

hwi_presets_t _rocm_presets[PAPI_MAX_rocm_PRESETS] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I use PAPI_get_event_info with either the preset rocm define or the preset rocm event code the member variable component_index from the structure PAPI_event_info_t shows a value of 0. This should not be the case for the rocm component.

/* 0 */ {"PAPI_ROCM_FP16_FMA",
"ROCM FP16 FMA instr",
"ROCM Half precision (FP16) FMA instructions", 0,
0, PAPI_PRESET_BIT_MSC,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Frontier, when I call PAPI_enum_event with the modifier PAPI_PRESET_ENUM_AVAIL I do not see PAPI_ROCM_FP16_FMA output even though this preset is available on the architecture. When I used the modifier PAPI_ENUM_EVENTS it did appear in the output.

}

// Setup the presets.
papi_errno = rocm_init_comp_presets();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If you add a SUBDBG to this I could see the nested approach; however, one option would be to:

papi_errno = rocm_init_comp_presets();
if (papi_errno == PAPI_ENOIMPL) {
    papi_errno = PAPI_OK;
}

return papi_errno;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component-rocm PRs and Issues related to the rocm component update-presets PRs related to updating the PAPI presets

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants