Skip to content

Correctly synchronise all _sync allocate/deallocations#2449

Open
wence- wants to merge 2 commits into
rapidsai:mainfrom
wence-:wence/fix/2448
Open

Correctly synchronise all _sync allocate/deallocations#2449
wence- wants to merge 2 commits into
rapidsai:mainfrom
wence-:wence/fix/2448

Conversation

@wence-

@wence- wence- commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wence- wence- requested a review from a team as a code owner June 17, 2026 15:50
@wence- wence- requested review from bdice and ttnghia June 17, 2026 15:50
@wence- wence- added bug Something isn't working non-breaking Non-breaking change labels Jun 17, 2026
Comment on lines +39 to +40
// TODO: Do we need this one?
RMM_CUDA_TRY(cudaStreamSynchronize(stream.get()));

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. If we throw, we're not returning a pointer to the caller, so I don't think we need any stream-sync guarantee.

But why did we delete the logger messages?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't, we defer to the async allocate on this class that emits log messages. Rather than previously where we deferred to upstream_mr_.allocate and therefore were on the hook to do the logging ourselves.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b29d31f5-dcfc-40a3-b7d7-4252671c7d9c

📥 Commits

Reviewing files that changed from the base of the PR and between d001223 and db44906.

📒 Files selected for processing (1)
  • cpp/src/mr/detail/limiting_resource_adaptor_impl.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp/src/mr/detail/limiting_resource_adaptor_impl.cpp

📝 Walkthrough

Summary by CodeRabbit

  • Refactor
    • Improved CUDA synchronous allocation/deallocation handling across multiple memory resource implementations by using explicit stream synchronization with consistent error/safe-shutdown checks.
    • Updated internal synchronization logic to wrap results more reliably, without changing public APIs.
  • Bug Fixes
    • Improved the “memory limit exceeded” error message to include allocated bytes and the requested size for clearer diagnostics.

Walkthrough

All allocate_sync and deallocate_sync implementations across 13 memory resource adaptor files are corrected to use cuda::stream_ref{cudaStream_t{nullptr}} and explicitly call cudaStreamSynchronize after the underlying allocation or deallocation, replacing the previously broken pattern of forwarding directly to async allocate/deallocate via cuda_stream_view{} without synchronization.

Changes

Fix sync allocation/deallocation contract across all resource adaptors

Layer / File(s) Summary
Sync pattern in header-only implementations
cpp/include/rmm/mr/detail/failure_callback_resource_adaptor_impl.hpp, cpp/include/rmm/mr/detail/stream_ordered_memory_resource.hpp
Both inline implementations add CUDA/RMM error headers and switch from cuda_stream_view{} to cuda::stream_ref{cudaStream_t{nullptr}}, then add explicit cudaStreamSynchronize calls: RMM_CUDA_TRY for allocate_sync and RMM_ASSERT_CUDA_SUCCESS_SAFE_SHUTDOWN for deallocate_sync.
binning_memory_resource_impl dispatch fix
cpp/src/mr/detail/binning_memory_resource_impl.cpp
allocate_sync/deallocate_sync now call get_resource_ref(bytes).allocate_sync(...)/deallocate_sync(...) directly instead of routing through the async allocate/deallocate APIs with a default stream view.
sam_headroom_memory_resource_impl deallocate_sync
cpp/src/mr/detail/sam_headroom_memory_resource_impl.cpp
Adds stream ref header; deallocate_sync introduces a named stream variable so stream.get() can be passed to cudaStreamSynchronize under the safe-shutdown assert macro.
logging_resource_adaptor_impl: try/catch on allocate_sync
cpp/src/mr/detail/logging_resource_adaptor_impl.cpp
Adds CUDA/RMM headers. allocate_sync wraps allocation plus synchronization in a try/catch, performing an extra cudaStreamSynchronize before rethrowing on failure. deallocate_sync gains explicit stream synchronization.
aligned_resource_adaptor_impl
cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp
Adds CUDA stream headers; allocate_sync/deallocate_sync now use cuda::stream_ref{cudaStream_t{nullptr}} with explicit cudaStreamSynchronize calls wrapped in RMM error macros.
arena_memory_resource_impl
cpp/src/mr/detail/arena_memory_resource_impl.cpp
Adds stream ref header; allocate_sync/deallocate_sync now use cuda::stream_ref{cudaStream_t{nullptr}} with explicit cudaStreamSynchronize calls.
callback_memory_resource_impl
cpp/src/mr/detail/callback_memory_resource_impl.cpp
Adds CUDA stream and RMM error headers; allocate_sync/deallocate_sync now use cuda::stream_ref{cudaStream_t{nullptr}} with explicit cudaStreamSynchronize wrapped in RMM macros.
limiting_resource_adaptor_impl
cpp/src/mr/detail/limiting_resource_adaptor_impl.cpp
Adds CUDA stream headers; error message for memory limit exceeded now uses std::stringstream with allocation and request details; allocate_sync/deallocate_sync now use cuda::stream_ref{cudaStream_t{nullptr}} with explicit cudaStreamSynchronize.
prefetch_resource_adaptor_impl
cpp/src/mr/detail/prefetch_resource_adaptor_impl.cpp
Adds CUDA stream and RMM error headers; allocate_sync/deallocate_sync now use cuda::stream_ref{cudaStream_t{nullptr}} with explicit cudaStreamSynchronize wrapped in RMM macros.
statistics_resource_adaptor_impl
cpp/src/mr/detail/statistics_resource_adaptor_impl.cpp
Adds RMM error and CUDA stream headers; allocate_sync/deallocate_sync now use cuda::stream_ref{cudaStream_t{nullptr}} with explicit cudaStreamSynchronize wrapped in RMM error/assert macros.
thread_safe_resource_adaptor_impl
cpp/src/mr/detail/thread_safe_resource_adaptor_impl.cpp
Adds CUDA stream and RMM error headers; allocate_sync and deallocate_sync now use cuda::stream_ref{cudaStream_t{nullptr}} with explicit cudaStreamSynchronize wrapped in RMM macros.
tracking_resource_adaptor_impl
cpp/src/mr/detail/tracking_resource_adaptor_impl.cpp
Adds RMM error and CUDA stream headers; allocate_sync and deallocate_sync now use cuda::stream_ref{cudaStream_t{nullptr}} with explicit cudaStreamSynchronize wrapped in RMM error/assert macros.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • rapidsai/rmm#2361: Touches the same allocate_sync/deallocate_sync paths in *_resource_adaptor_impl files and stream_ordered_memory_resource, which is the exact code modified by this PR.
  • rapidsai/rmm#2401: Also updates deallocate_sync to explicitly synchronize the default/null CUDA stream using cuda::stream_ref{nullptr} and cudaStreamSynchronize.
  • rapidsai/rmm#2403: Modifies binning_memory_resource_impl.cpp's allocate_sync/deallocate_sync paths, the same function this PR corrects.

Suggested reviewers

  • vyasr
  • davidwendt
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Correctly synchronise all _sync allocate/deallocations' directly describes the main change: fixing synchronization in sync allocation/deallocation functions across multiple memory resources.
Description check ✅ Passed The description is related to the changeset by referencing issue #2448 and noting documentation updates, though it provides limited implementation details.
Linked Issues check ✅ Passed The PR fully addresses issue #2448 by implementing the correct synchronization pattern across all identified memory resources [#2448], ensuring allocate_sync and deallocate_sync properly synchronize CUDA streams.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing synchronization in allocate_sync/deallocate_sync methods as specified in #2448, with no extraneous modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

std::stringstream msg;
msg << "Exceeded memory limit " << allocation_limit_ << "; Allocated bytes " << allocated_bytes_
<< "; Requested bytes " << bytes << "\n";
RMM_FAIL(msg.str(), rmm::out_of_memory);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdice bdice left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial feedback. I'll look at the CI error soon.

edit: Is the CI error fixed?

std::size_t alignment) noexcept
{
get_resource_ref(bytes).deallocate(cuda_stream_view{}, ptr, bytes, alignment);
return get_resource_ref(bytes).deallocate_sync(ptr, bytes, alignment);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is void, and same for the upstream deallocation. Why return?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for it, just habit.

Comment on lines +39 to +40
// TODO: Do we need this one?
RMM_CUDA_TRY(cudaStreamSynchronize(stream.get()));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. If we throw, we're not returning a pointer to the caller, so I don't think we need any stream-sync guarantee.

But why did we delete the logger messages?

std::size_t alignment) noexcept
{
auto const stream = cuda_stream_view{};
logger_->info("free,%p,%zu,%s", ptr, bytes, rmm::detail::format_stream(stream));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we delete the logging?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for allocate, we're calling this->deallocate that emits logging now. Rather than upstream.deallocate that does not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Non-breaking change

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[BUG] Almost all of the recently refactored memory resource synchronous allocate/deallocate functions aren't

2 participants