-
Notifications
You must be signed in to change notification settings - Fork 767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UR][CUDA][HIP] Allow adapter objects to block full adapter teardown. #17571
base: sycl
Are you sure you want to change the base?
[UR][CUDA][HIP] Allow adapter objects to block full adapter teardown. #17571
Conversation
In the cuda adapter the adapter struct itself is currently an extern global defined in adapter.cpp. This means fully tearing down the adapter is subject to the same destructor ordering as all other static and global variables, it's first in last out. This presents a problem because an application can declare a static sycl object like a buffer right up top before doing anything else, which results in the sycl object being destroyed after the cuda adapter struct. The UR spec doesn't put the onus on users to keep their parent object lifetimes in order, i.e. there is no statement about "the context you use to create a ur_mem_handle_t must not be released until after the mem_handle". It's assumed (by omission rather than explicitly) that adapters will have their objects keep a reference to any parent objects alive for the duration of their own lifetime. This change moves the cuda adapter structs ownership into a global shared_ptr, which allows child objects of the adapter to keep their own references to it alive past the point where its initial definition goes out of scope. Also adjusts how some other objects track parent object references so that the destructors correctly cascade back to the top: mem handle releases its context, which releases its adapter, which releases the platform + devices, etc. Fixes intel#17450
this is a wider problem with the adapters (and spec to some extent) which I plan to address, but I think it's fine for this to go through as a stand-alone patch for now |
@@ -21,9 +21,6 @@ | |||
// UNSUPPORTED: windows && arch-intel_gpu_bmg_g21 | |||
// UNSUPPORTED-TRACKER: https://github.com/intel/llvm/issues/17255 | |||
|
|||
// UNSUPPORTED: cuda | |||
// UNSUPPORTED-TRACKER: https://github.com/intel/llvm/issues/17450 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this test is also failing on HIP, I suspect it might require the exact same patch for the HIP adapter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just getting a hip environment spun up to see if the fail is indeed this problem as the symptoms in the ticket are slightly different (hang rather than crash)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to push the equivalent hip adapter patch but sadly it doesn't fix the fail. Even with everything on the UR side still alive during the static buffer destructor we get a segfault in libamdhip64.so.6 during this call https://github.com/intel/llvm/blob/sycl/unified-runtime/source/adapters/hip/device.hpp#L116. I tried sticking a hipDeviceGet
call in just prior to that and it seems to work fine, but SetDevice still segfaults. I even tried adding a hipInit
call there to no effect.
I can only assume that hip itself has some static
or otherwise globally managed state that hipSetDevice
relies on, as you can imagine there isn't much online about static lifetimes and hip objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl changes LGTM
In the cuda adapter the adapter struct itself is currently an extern global defined in adapter.cpp. This means fully tearing down the adapter is subject to the same destructor ordering as all other static and global variables, it's first in last out. This presents a problem because an application can declare a static sycl object like a buffer right up top before doing anything else, which results in the sycl object being destroyed after the cuda adapter struct.
The UR spec doesn't put the onus on users to keep their parent object lifetimes in order, i.e. there is no statement about "the context you use to create a ur_mem_handle_t must not be released until after the mem_handle". It's assumed (by omission rather than explicitly) that adapters will have their objects keep a reference to any parent objects alive for the duration of their own lifetime.
This change moves the cuda adapter structs ownership into a global shared_ptr, which allows child objects of the adapter to keep their own references to it alive past the point where its initial definition goes out of scope. Also adjusts how some other objects track parent object references so that the destructors correctly cascade back to the top: mem handle releases its context, which releases its adapter, which releases the platform + devices, etc.
All of this also applies to the hip adapter, although it seems something in hip itself prevents this change from fixing the static_buffer_dtor test - see #17571 (comment)
Fixes #17450