Skip to content

[SYCL][Graph] Modified the adapters such that it is valid to call release on... #18619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: sycl
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 0 additions & 5 deletions sycl/source/detail/graph_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -977,11 +977,6 @@ exec_graph_impl::~exec_graph_impl() {
const sycl::detail::AdapterPtr &Adapter =
sycl::detail::getSyclObjImpl(MContext)->getAdapter();
MSchedule.clear();
// We need to wait on all command buffer executions before we can release
// them.
for (auto &Event : MExecutionEvents) {
Event->wait(Event);
}

// Clean up any graph-owned allocations that were allocated
MGraphImpl->getMemPool().deallocateAndUnmapAll();
Expand Down
1 change: 1 addition & 0 deletions unified-runtime/scripts/core/EXP-COMMAND-BUFFER.rst
Original file line number Diff line number Diff line change
Expand Up @@ -563,3 +563,4 @@ Contributors
* Maxime France-Pillois `[email protected] <[email protected]>`_
* Aaron Greig `[email protected] <[email protected]>`_
* Fábio Mestre `[email protected] <[email protected]>`_
* Konrad Kusiak `[email protected] <[email protected]>`_
2 changes: 1 addition & 1 deletion unified-runtime/scripts/core/exp-command-buffer.yml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to call the generate build target to regenerate the doyxgen in the headers.

Original file line number Diff line number Diff line change
Expand Up @@ -310,7 +310,7 @@ returns:
- $X_RESULT_ERROR_OUT_OF_HOST_MEMORY
--- #--------------------------------------------------------------------------
type: function
desc: "Decrement the command-buffer object's reference count and delete the command-buffer object if the reference count becomes zero."
desc: "Decrement the command-buffer object's reference count and delete the command-buffer object if the reference count becomes zero. It will try synchronizing the command-buffer, hence it is legal to call it while command-buffer is still executing."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"It will try synchronizing the command-buffer," makes it sound like this will happen even if the reference count isn't zero. So would flip round the two phrases, something like

It is legal to call the entry-point while hCommandBuffer is still executing, which will block on completion if the reference count of hCommandBuffer becomes zero.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds better, thanks 👍

class: $xCommandBuffer
name: ReleaseExp
params:
Expand Down
4 changes: 4 additions & 0 deletions unified-runtime/source/adapters/cuda/command_buffer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -388,6 +388,9 @@ urCommandBufferRetainExp(ur_exp_command_buffer_handle_t hCommandBuffer) {
UR_APIEXPORT ur_result_t UR_APICALL
urCommandBufferReleaseExp(ur_exp_command_buffer_handle_t hCommandBuffer) {
if (hCommandBuffer->decrementReferenceCount() == 0) {
if (hCommandBuffer->CurrentExecution) {
UR_CHECK_ERROR(hCommandBuffer->CurrentExecution->wait());
}
// Ref count has reached zero, release of created commands
for (auto &Command : hCommandBuffer->CommandHandles) {
commandHandleDestroy(Command);
Expand Down Expand Up @@ -1175,6 +1178,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urEnqueueCommandBufferExp(
if (phEvent) {
UR_CHECK_ERROR(RetImplEvent->record());
*phEvent = RetImplEvent.release();
hCommandBuffer->CurrentExecution = *phEvent;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this solution work if the UR caller doesn't pass a phEvent to urEnqueueCommandBufferExp, and then does a command-buffer release? Note, this is what your CTS test does.

}
return UR_RESULT_SUCCESS;
} catch (ur_result_t Err) {
Expand Down
2 changes: 2 additions & 0 deletions unified-runtime/source/adapters/cuda/command_buffer.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -192,6 +192,8 @@ struct ur_exp_command_buffer_handle_t_ : ur::cuda::handle_base {
// Atomic variable counting the number of reference to this command_buffer
// using std::atomic prevents data race when incrementing/decrementing.
std::atomic_uint32_t RefCount;
// The event of current graph execution.
ur_event_handle_t CurrentExecution = nullptr;

// Ordered map of sync_points to ur_events, so that we can find the last
// node added to an in-order command-buffer.
Expand Down
4 changes: 4 additions & 0 deletions unified-runtime/source/adapters/hip/command_buffer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,9 @@ urCommandBufferRetainExp(ur_exp_command_buffer_handle_t hCommandBuffer) {
UR_APIEXPORT ur_result_t UR_APICALL
urCommandBufferReleaseExp(ur_exp_command_buffer_handle_t hCommandBuffer) {
if (hCommandBuffer->decrementReferenceCount() == 0) {
if (hCommandBuffer->CurrentExecution) {
UR_CHECK_ERROR(hCommandBuffer->CurrentExecution->wait());
}
delete hCommandBuffer;
}
return UR_RESULT_SUCCESS;
Expand Down Expand Up @@ -811,6 +814,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urEnqueueCommandBufferExp(
if (phEvent) {
UR_CHECK_ERROR(RetImplEvent->record());
*phEvent = RetImplEvent.release();
hCommandBuffer->CurrentExecution = *phEvent;
}
} catch (ur_result_t Err) {
return Err;
Expand Down
2 changes: 2 additions & 0 deletions unified-runtime/source/adapters/hip/command_buffer.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,8 @@ struct ur_exp_command_buffer_handle_t_ : ur::hip::handle_base {
// Atomic variable counting the number of reference to this command_buffer
// using std::atomic prevents data race when incrementing/decrementing.
std::atomic_uint32_t RefCount;
// The event of current graph execution.
ur_event_handle_t CurrentExecution = nullptr;

// Ordered map of sync_points to ur_events
std::map<ur_exp_command_buffer_sync_point_t, hipGraphNode_t> SyncPoints;
Expand Down
39 changes: 20 additions & 19 deletions unified-runtime/source/adapters/level_zero/command_buffer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -671,6 +671,25 @@ ur_result_t createMainCommandList(ur_context_handle_t Context,
return UR_RESULT_SUCCESS;
}

/**
* Waits for any ongoing executions of the command-buffer to finish.
* @param CommandBuffer The command-buffer to wait for.
* @return UR_RESULT_SUCCESS or an error code on failure
*/
ur_result_t
waitForOngoingExecution(ur_exp_command_buffer_handle_t CommandBuffer) {

if (ur_event_handle_t &CurrentSubmissionEvent =
CommandBuffer->CurrentSubmissionEvent) {
ZE2UR_CALL(zeEventHostSynchronize,
(CurrentSubmissionEvent->ZeEvent, UINT64_MAX));
UR_CALL(urEventReleaseInternal(CurrentSubmissionEvent));
CurrentSubmissionEvent = nullptr;
}

return UR_RESULT_SUCCESS;
}

/**
* Checks whether the command-buffer can be constructed using in order
* command-lists.
Expand Down Expand Up @@ -830,6 +849,7 @@ urCommandBufferReleaseExp(ur_exp_command_buffer_handle_t CommandBuffer) {
if (!CommandBuffer->RefCount.decrementAndTest())
return UR_RESULT_SUCCESS;

UR_CALL(waitForOngoingExecution(CommandBuffer));
CommandBuffer->cleanupCommandBufferResources();
delete CommandBuffer;
return UR_RESULT_SUCCESS;
Expand Down Expand Up @@ -1442,25 +1462,6 @@ ur_result_t getZeCommandQueue(ur_queue_handle_t Queue, bool UseCopyEngine,
return UR_RESULT_SUCCESS;
}

/**
* Waits for any ongoing executions of the command-buffer to finish.
* @param CommandBuffer The command-buffer to wait for.
* @return UR_RESULT_SUCCESS or an error code on failure
*/
ur_result_t
waitForOngoingExecution(ur_exp_command_buffer_handle_t CommandBuffer) {

if (ur_event_handle_t &CurrentSubmissionEvent =
CommandBuffer->CurrentSubmissionEvent) {
ZE2UR_CALL(zeEventHostSynchronize,
(CurrentSubmissionEvent->ZeEvent, UINT64_MAX));
UR_CALL(urEventReleaseInternal(CurrentSubmissionEvent));
CurrentSubmissionEvent = nullptr;
}

return UR_RESULT_SUCCESS;
}

/**
* Waits for the all the dependencies of the command-buffer
* @param[in] CommandBuffer The command-buffer.
Expand Down
4 changes: 4 additions & 0 deletions unified-runtime/source/adapters/opencl/command_buffer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,10 @@ urCommandBufferRetainExp(ur_exp_command_buffer_handle_t hCommandBuffer) {
UR_APIEXPORT ur_result_t UR_APICALL
urCommandBufferReleaseExp(ur_exp_command_buffer_handle_t hCommandBuffer) {
if (hCommandBuffer->decrementReferenceCount() == 0) {
if (hCommandBuffer->LastSubmission) {
cl_int RetErr = clWaitForEvents(1, &(hCommandBuffer->LastSubmission));
CL_RETURN_ON_FAILURE(RetErr);
}
delete hCommandBuffer;
}

Expand Down
10 changes: 10 additions & 0 deletions unified-runtime/test/conformance/exp_command_buffer/enqueue.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -132,3 +132,13 @@ TEST_P(urEnqueueCommandBufferExpTest, SerializeOutofOrderQueue) {
ASSERT_EQ(reference, Output[i]);
}
}

// Tests releasing command-buffer while it is still executing relying
// on synchronization during urCommandBufferReleaseExp call.
TEST_P(urEnqueueCommandBufferExpTest, EnqueueAndRelease) {
ASSERT_SUCCESS(urEnqueueCommandBufferExp(out_of_order_queue, cmd_buf_handle,
0, nullptr, nullptr));

// Release the command buffer
ASSERT_SUCCESS(urCommandBufferReleaseExp(cmd_buf_handle));
}
Loading