Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UR][CUDA][HIP] Allow adapter objects to block full adapter teardown. #17571

Open
wants to merge 11 commits into
base: sycl
Choose a base branch
from

Conversation

aarongreig
Copy link
Contributor

@aarongreig aarongreig commented Mar 21, 2025

In the cuda adapter the adapter struct itself is currently an extern global defined in adapter.cpp. This means fully tearing down the adapter is subject to the same destructor ordering as all other static and global variables, it's first in last out. This presents a problem because an application can declare a static sycl object like a buffer right up top before doing anything else, which results in the sycl object being destroyed after the cuda adapter struct.

The UR spec doesn't put the onus on users to keep their parent object lifetimes in order, i.e. there is no statement about "the context you use to create a ur_mem_handle_t must not be released until after the mem_handle". It's assumed (by omission rather than explicitly) that adapters will have their objects keep a reference to any parent objects alive for the duration of their own lifetime.

This change moves the cuda adapter structs ownership into a global shared_ptr, which allows child objects of the adapter to keep their own references to it alive past the point where its initial definition goes out of scope. Also adjusts how some other objects track parent object references so that the destructors correctly cascade back to the top: mem handle releases its context, which releases its adapter, which releases the platform + devices, etc.

All of this also applies to the hip adapter, although it seems something in hip itself prevents this change from fixing the static_buffer_dtor test - see #17571 (comment)

Fixes #17450

In the cuda adapter the adapter struct itself is currently an extern
global defined in adapter.cpp. This means fully tearing down the adapter
is subject to the same destructor ordering as all other static and
global variables, it's first in last out. This presents a problem
because an application can declare a static sycl object like a buffer
right up top before doing anything else, which results in the sycl
object being destroyed after the cuda adapter struct.

The UR spec doesn't put the onus on users to keep their parent object
lifetimes in order, i.e. there is no statement about "the context you
use to create a ur_mem_handle_t must not be released until after the
mem_handle". It's assumed (by omission rather than explicitly) that
adapters will have their objects keep a reference to any parent objects
alive for the duration of their own lifetime.

This change moves the cuda adapter structs ownership into a global
shared_ptr, which allows child objects of the adapter to keep their own
references to it alive past the point where its initial definition goes
out of scope. Also adjusts how some other objects track parent object
references so that the destructors correctly cascade back to the top:
mem handle releases its context, which releases its adapter, which
releases the platform + devices, etc.

Fixes intel#17450
@aarongreig aarongreig marked this pull request as ready for review March 21, 2025 16:58
@aarongreig aarongreig requested review from a team as code owners March 21, 2025 16:58
@aarongreig aarongreig requested review from Seanst98 and againull March 21, 2025 16:58
@aarongreig
Copy link
Contributor Author

this is a wider problem with the adapters (and spec to some extent) which I plan to address, but I think it's fine for this to go through as a stand-alone patch for now

@@ -21,9 +21,6 @@
// UNSUPPORTED: windows && arch-intel_gpu_bmg_g21
// UNSUPPORTED-TRACKER: https://github.com/intel/llvm/issues/17255

// UNSUPPORTED: cuda
// UNSUPPORTED-TRACKER: https://github.com/intel/llvm/issues/17450
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this test is also failing on HIP, I suspect it might require the exact same patch for the HIP adapter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just getting a hip environment spun up to see if the fail is indeed this problem as the symptoms in the ticket are slightly different (hang rather than crash)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to push the equivalent hip adapter patch but sadly it doesn't fix the fail. Even with everything on the UR side still alive during the static buffer destructor we get a segfault in libamdhip64.so.6 during this call https://github.com/intel/llvm/blob/sycl/unified-runtime/source/adapters/hip/device.hpp#L116. I tried sticking a hipDeviceGet call in just prior to that and it seems to work fine, but SetDevice still segfaults. I even tried adding a hipInit call there to no effect.

I can only assume that hip itself has some static or otherwise globally managed state that hipSetDevice relies on, as you can imagine there isn't much online about static lifetimes and hip objects

@aarongreig aarongreig changed the title [UR][CUDA] Allow cuda adapter objects to block full adapter teardown. [UR][CUDA][HIP] Allow adapter objects to block full adapter teardown. Mar 25, 2025
Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl changes LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SYCL][UR][CUDA] wrong order of the ~ur_device_handle_t_() destructor and a user-app static buffer destructor
3 participants