rocp_sdk: enable delayed initialization #533
dbarry9 wants to merge 1 commit into icl-utk-edu:master
Revert the specific change in commit 258caab that removed delayed initialization from the rocp_sdk component. This functionality is convenient for tools and applications which use PAPI internally. These changes have been tested on the Frontier supercomputer, which contains the AMD MI250X architecture.
I am reviewing this PR.
// This component needs to be fully initialized from the beginning,
// because interleaving hip calls and PAPI calls leads to errors.
return check_n_initialize();
sprintf(_rocp_sdk_vector.cmp_info.disabled_reason, "Not initialized. Access component events to initialize it.");
Can you change sprintf to snprintf and then check the returned value, i.e.:
int strLen = snprintf(_rocp_sdk_vector.cmp_info.disabled_reason, PAPI_HUGE_STR_LEN, "%s", "Not initialized. Access component events to initialize it.");
if (strLen < 0 || strLen >= PAPI_HUGE_STR_LEN) {
    SUBDBG(...);
    return PAPI_EBUF;
}
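The check suggested above can be tried in isolation. The following is a minimal sketch, not the component's code: set_reason, DEMO_STR_LEN, DEMO_OK, and DEMO_EBUF are hypothetical stand-ins for the PAPI buffer size and return codes. It relies only on the standard behavior of snprintf, which returns the length the full string would have had, so a result greater than or equal to the buffer size signals truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; PAPI defines its own constants and values. */
#define DEMO_STR_LEN 64
#define DEMO_OK       0
#define DEMO_EBUF    -1

/* Copy msg into dst (capacity cap) and report an encoding error or truncation. */
int set_reason(char *dst, size_t cap, const char *msg)
{
    assert(dst != NULL && msg != NULL && cap > 0);

    /* snprintf returns the number of characters the full string needs,
       excluding the terminating NUL, or a negative value on error. */
    int len = snprintf(dst, cap, "%s", msg);
    if (len < 0 || (size_t)len >= cap) {
        return DEMO_EBUF; /* dst is still NUL-terminated, but the text was cut */
    }
    return DEMO_OK;
}
```

Passing the message through "%s" rather than as the format string also keeps any stray % characters in the text from being interpreted as conversions.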
* In dispatch mode, PAPI may read zeros if reading takes place immediately after the return of a GPU kernel. This is not a PAPI bug. It may occur because calls such as hipDeviceSynchronize() do not guarantee that ROCprofiler has been called and all counter buffers have been flushed. Therefore, it is recommended that the user code adds a delay between the return of a kernel and calls to PAPI_read(), PAPI_stop(), etc.
* If an application is linked against the static PAPI library libpapi.a, then the application must call PAPI_library_init() before calling any hip routines (e.g. hipInit(), hipGetDeviceCount(), hipLaunchKernelGGL(), etc). If the application is linked against the dynamic library libpapi.so, then the order of operations does not matter.
* If an application is linked against the static PAPI library libpapi.a, then the application must call PAPI_library_init() through PAPI_add_named_event()/PAPI_add_event() before calling any hip routines (e.g. hipInit(), hipGetDeviceCount(), hipLaunchKernelGGL(), etc). If the application is linked against the dynamic library libpapi.so, then the order of operations does not matter.
I assume "through PAPI_add_named_event()/PAPI_add_event()" was added because of the PAPI_EDELAY_INIT support, since following this workflow initializes the component.
However, you can also call PAPI_enum_cmp_event(int *EventCode, PAPI_ENUM_FIRST, int cidx) to initialize the component. I would make users aware of that as well.
sprintf(_rocp_sdk_vector.cmp_info.disabled_reason, "Not initialized. Access component events to initialize it.");
_rocp_sdk_vector.cmp_info.disabled = PAPI_EDELAY_INIT;
return PAPI_EDELAY_INIT;
}
From my testing, if I have the following workflow:
PAPI_library_init(PAPI_VER_CURRENT);
int cidx = PAPI_get_component_index("rocp_sdk");
int EventCode = 0 | PAPI_NATIVE_MASK;
PAPI_enum_cmp_event(&EventCode, PAPI_ENUM_FIRST, cidx);
const PAPI_component_info_t *cmpinfo = PAPI_get_component_info(cidx);
The disabled_reason member of PAPI_component_info_t will still show "Not initialized. Access component events to initialize it." This should no longer be the case, because rocp_sdk has been initialized at this point.
To resolve this, set disabled_reason to an empty string inside rocp_sdk_init_private (this is what I did in the cuda component, see here).
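The behavior requested here can be sketched without PAPI or ROCm installed. This is an illustrative mock, not the rocp_sdk source: demo_cmp_info_t, demo_init_component, demo_init_private, and the DEMO_* constants are all hypothetical names, and the numeric values are arbitrary. It shows the two halves of delayed initialization: the first call defers setup and records a reason, and the second does the real work on first event access and clears the stale reason.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for PAPI's constants; the values are illustrative only. */
#define DEMO_OK            0
#define DEMO_EDELAY_INIT  -1
#define DEMO_STR_LEN     128

/* Hypothetical, reduced version of a component info record. */
typedef struct {
    int  disabled;
    char disabled_reason[DEMO_STR_LEN];
} demo_cmp_info_t;

/* Runs at library initialization: defer the expensive setup and say why. */
int demo_init_component(demo_cmp_info_t *info)
{
    assert(info != NULL);
    snprintf(info->disabled_reason, DEMO_STR_LEN, "%s",
             "Not initialized. Access component events to initialize it.");
    info->disabled = DEMO_EDELAY_INIT;
    return DEMO_EDELAY_INIT;
}

/* Runs on first event access: perform the real setup, then clear stale state. */
int demo_init_private(demo_cmp_info_t *info)
{
    assert(info != NULL);
    if (info->disabled != DEMO_EDELAY_INIT) {
        return info->disabled; /* already initialized, or disabled for another reason */
    }
    /* ... real device and profiler setup would go here ... */
    info->disabled = DEMO_OK;
    info->disabled_reason[0] = '\0'; /* the step the review asks for */
    return DEMO_OK;
}
```

In this sketch a caller that queries the record after demo_init_private sees an empty reason string, which matches what the reviewer expects after PAPI_enum_cmp_event has triggered initialization.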
In The call to |
Author Checklist
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
Commits are self contained and only do one thing
Commits have a header of the form: module: short description
Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
The PR needs to pass all the tests