Conversation
Add functions to facilitate ROCm presets. These changes have been tested on the AMD MI250X architecture.
Add presets for FMA instructions of various precisions. These changes have been tested on the AMD MI250X architecture.
df51463 to
a0800b1
Compare
| # AMD MI250X architecture | ||
| rocm,gfx90a | ||
| # | ||
| PRESET,PAPI_ROCM_FP16_FMA,NOT_DERIVED,rocm:::SQ_INSTS_VALU_FMA_F16:device=0 |
There was a problem hiding this comment.
Q: In papi_rocm_presets.h you have 6 listed events, but here only 4 are defined. Are the other two for a separate architecture?
| # | ||
| PRESET,PAPI_ROCM_FP16_FMA,NOT_DERIVED,rocm:::SQ_INSTS_VALU_FMA_F16:device=0 | ||
| PRESET,PAPI_ROCM_FP32_FMA,NOT_DERIVED,rocm:::SQ_INSTS_VALU_FMA_F32:device=0 | ||
| PRESET,PAPI_ROCM_FP64_FMA,NOT_DERIVED,rocm:::SQ_INSTS_VALU_FMA_F64:device=0 |
There was a problem hiding this comment.
Q: It appears that none of the events listed will need an instance qualifier appended to them. For the cuda component, you show the stats qualifier as well in the papi_avail output as those events do need the stats qualifier.
Say for a future case you add an event which does require an instance value, is that already handled with this update or would more work need to be done?
| hsa_status_t | ||
| get_chip_name( hsa_agent_t agent, void *data ) { | ||
|
|
||
| char *name = (char *) data; |
There was a problem hiding this comment.
nit: I would personally be more descriptive with the variable naming not just here, but in other instances as well.
name should correspond to the arch_name, but it is not immediately clear this is what we actually mean. Long term, I believe it would help others that come along and work on the component that may not be as familiar with it.
| } | ||
|
|
||
| if (type == HSA_DEVICE_TYPE_GPU) { | ||
| hsa_errno = hsa_agent_get_info_p(agent, HSA_AGENT_INFO_NAME, name); |
There was a problem hiding this comment.
From the docs HSA_AGENT_INFO_NAME is a null terminated char[64]. The variable you are passing is arch_name[PAPI_2MAX_STR_LEN] ([256]), I would swap to PAPI_MIN_STR_LEN.
| * be available. */ | ||
| int retval = rocd_get_chip_name(arch_name); | ||
| if ( retval != PAPI_OK ) { | ||
| SUBDBG("EXIT: Failed to get ROCm device name.\n"); |
There was a problem hiding this comment.
nit: Should ROCm not be AMD here?
| int | ||
| rocc_get_chip_name(char *name) { | ||
|
|
||
| hsa_status_t hsa_errno = hsa_iterate_agents_p(&get_chip_name, name); |
There was a problem hiding this comment.
Q: From my understanding, hsa_iterate_agents will enumerate over the available agents and invoke the callback you have defined. Therefore, if we have 7 AMD devices this will iterate over all 7 calling your callback function. What if there are multiple mixed devices, i.e. MI250x and MI300A? From my testing you would only capture the very last one.
| /* Setup presets. */ | ||
| char arch_name[PAPI_2MAX_STR_LEN]; | ||
|
|
||
| /* Load preset table for every device type available on the system. |
There was a problem hiding this comment.
nit: I would change the comments from /* to // as then it becomes easier to comment blocks of code out later if you are debugging.
However, you can use #if 0 #endif.
This is something we should probably discuss internally as a team though.
| #ifndef __PAPI_ROCM_PRESETS_H__ | ||
| #define __PAPI_ROCM_PRESETS_H__ | ||
|
|
||
| hwi_presets_t _rocm_presets[PAPI_MAX_rocm_PRESETS] = { |
There was a problem hiding this comment.
When I use PAPI_get_event_info with either the preset rocm define or the preset rocm event code the member variable component_index from the structure PAPI_event_info_t shows a value of 0. This should not be the case for the rocm component.
| /* 0 */ {"PAPI_ROCM_FP16_FMA", | ||
| "ROCM FP16 FMA instr", | ||
| "ROCM Half precision (FP16) FMA instructions", 0, | ||
| 0, PAPI_PRESET_BIT_MSC, |
There was a problem hiding this comment.
On Frontier, when I call PAPI_enum_event with the modifier PAPI_PRESET_ENUM_AVAIL I do not see PAPI_ROCM_FP16_FMA output even though this preset is available on the architecture. When I used the modifier PAPI_ENUM_EVENTS it did appear in the output.
| } | ||
|
|
||
| // Setup the presets. | ||
| papi_errno = rocm_init_comp_presets(); |
There was a problem hiding this comment.
nit: If you add a SUBDBG to this I could see the nested approach; however, one option would be to:
papi_errno = rocm_init_comp_presets();
if (papi_errno == PAPI_ENOIMPL) {
papi_errno = PAPI_OK;
}
return papi_errno;
Pull Request Description
This pull request adds the following to the
rocmcomponent:These changes have been tested on the AMD MI250X architecture.
Here is the relevant output from
papi_availon the Frontier supercomputer:Author Checklist
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
Commits are self contained and only do one thing
Commits have a header of the form:
module: short descriptionCommits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
The PR needs to pass all the tests