Correctly synchronise all _sync allocate/deallocations by wence- · Pull Request #2449 · rapidsai/rmm

wence- · 2026-06-17T15:50:49Z

Description

Closes [BUG] Almost all of the recently refactored memory resource synchronous allocate/deallocate functions aren't #2448

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

wence- · 2026-06-17T15:51:17Z

+    // TODO: Do we need this one?
+    RMM_CUDA_TRY(cudaStreamSynchronize(stream.get()));


I don't think so. If we throw, we're not returning a pointer to the caller, so I don't think we need any stream-sync guarantee.

But why did we delete the logger messages?

We didn't, we defer to the async allocate on this class that emits log messages. Rather than previously where we deferred to upstream_mr_.allocate and therefore were on the hook to do the logging ourselves.

coderabbitai · 2026-06-17T15:56:53Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b29d31f5-dcfc-40a3-b7d7-4252671c7d9c

📥 Commits

Reviewing files that changed from the base of the PR and between d001223 and db44906.

📒 Files selected for processing (1)

cpp/src/mr/detail/limiting_resource_adaptor_impl.cpp

🚧 Files skipped from review as they are similar to previous changes (1)

cpp/src/mr/detail/limiting_resource_adaptor_impl.cpp

📝 Walkthrough

Summary by CodeRabbit

Refactor
- Improved CUDA synchronous allocation/deallocation handling across multiple memory resource implementations by using explicit stream synchronization with consistent error/safe-shutdown checks.
- Updated internal synchronization logic to wrap results more reliably, without changing public APIs.
Bug Fixes
- Improved the “memory limit exceeded” error message to include allocated bytes and the requested size for clearer diagnostics.

Walkthrough

All allocate_sync and deallocate_sync implementations across 13 memory resource adaptor files are corrected to use cuda::stream_ref{cudaStream_t{nullptr}} and explicitly call cudaStreamSynchronize after the underlying allocation or deallocation, replacing the previously broken pattern of forwarding directly to async allocate/deallocate via cuda_stream_view{} without synchronization.

Changes

Fix sync allocation/deallocation contract across all resource adaptors

Layer / File(s)	Summary
Sync pattern in header-only implementations `cpp/include/rmm/mr/detail/failure_callback_resource_adaptor_impl.hpp`, `cpp/include/rmm/mr/detail/stream_ordered_memory_resource.hpp`	Both inline implementations add CUDA/RMM error headers and switch from `cuda_stream_view{}` to `cuda::stream_ref{cudaStream_t{nullptr}}`, then add explicit `cudaStreamSynchronize` calls: `RMM_CUDA_TRY` for `allocate_sync` and `RMM_ASSERT_CUDA_SUCCESS_SAFE_SHUTDOWN` for `deallocate_sync`.
binning_memory_resource_impl dispatch fix `cpp/src/mr/detail/binning_memory_resource_impl.cpp`	`allocate_sync`/`deallocate_sync` now call `get_resource_ref(bytes).allocate_sync(...)`/`deallocate_sync(...)` directly instead of routing through the async `allocate`/`deallocate` APIs with a default stream view.
sam_headroom_memory_resource_impl deallocate_sync `cpp/src/mr/detail/sam_headroom_memory_resource_impl.cpp`	Adds stream ref header; `deallocate_sync` introduces a named `stream` variable so `stream.get()` can be passed to `cudaStreamSynchronize` under the safe-shutdown assert macro.
logging_resource_adaptor_impl: try/catch on allocate_sync `cpp/src/mr/detail/logging_resource_adaptor_impl.cpp`	Adds CUDA/RMM headers. `allocate_sync` wraps allocation plus synchronization in a try/catch, performing an extra `cudaStreamSynchronize` before rethrowing on failure. `deallocate_sync` gains explicit stream synchronization.
aligned_resource_adaptor_impl `cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp`	Adds CUDA stream headers; `allocate_sync`/`deallocate_sync` now use `cuda::stream_ref{cudaStream_t{nullptr}}` with explicit `cudaStreamSynchronize` calls wrapped in RMM error macros.
arena_memory_resource_impl `cpp/src/mr/detail/arena_memory_resource_impl.cpp`	Adds stream ref header; `allocate_sync`/`deallocate_sync` now use `cuda::stream_ref{cudaStream_t{nullptr}}` with explicit `cudaStreamSynchronize` calls.
callback_memory_resource_impl `cpp/src/mr/detail/callback_memory_resource_impl.cpp`	Adds CUDA stream and RMM error headers; `allocate_sync`/`deallocate_sync` now use `cuda::stream_ref{cudaStream_t{nullptr}}` with explicit `cudaStreamSynchronize` wrapped in RMM macros.
limiting_resource_adaptor_impl `cpp/src/mr/detail/limiting_resource_adaptor_impl.cpp`	Adds CUDA stream headers; error message for memory limit exceeded now uses `std::stringstream` with allocation and request details; `allocate_sync`/`deallocate_sync` now use `cuda::stream_ref{cudaStream_t{nullptr}}` with explicit `cudaStreamSynchronize`.
prefetch_resource_adaptor_impl `cpp/src/mr/detail/prefetch_resource_adaptor_impl.cpp`	Adds CUDA stream and RMM error headers; `allocate_sync`/`deallocate_sync` now use `cuda::stream_ref{cudaStream_t{nullptr}}` with explicit `cudaStreamSynchronize` wrapped in RMM macros.
statistics_resource_adaptor_impl `cpp/src/mr/detail/statistics_resource_adaptor_impl.cpp`	Adds RMM error and CUDA stream headers; `allocate_sync`/`deallocate_sync` now use `cuda::stream_ref{cudaStream_t{nullptr}}` with explicit `cudaStreamSynchronize` wrapped in RMM error/assert macros.
thread_safe_resource_adaptor_impl `cpp/src/mr/detail/thread_safe_resource_adaptor_impl.cpp`	Adds CUDA stream and RMM error headers; `allocate_sync` and `deallocate_sync` now use `cuda::stream_ref{cudaStream_t{nullptr}}` with explicit `cudaStreamSynchronize` wrapped in RMM macros.
tracking_resource_adaptor_impl `cpp/src/mr/detail/tracking_resource_adaptor_impl.cpp`	Adds RMM error and CUDA stream headers; `allocate_sync` and `deallocate_sync` now use `cuda::stream_ref{cudaStream_t{nullptr}}` with explicit `cudaStreamSynchronize` wrapped in RMM error/assert macros.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

rapidsai/rmm#2361: Touches the same allocate_sync/deallocate_sync paths in *_resource_adaptor_impl files and stream_ordered_memory_resource, which is the exact code modified by this PR.
rapidsai/rmm#2401: Also updates deallocate_sync to explicitly synchronize the default/null CUDA stream using cuda::stream_ref{nullptr} and cudaStreamSynchronize.
rapidsai/rmm#2403: Modifies binning_memory_resource_impl.cpp's allocate_sync/deallocate_sync paths, the same function this PR corrects.

Suggested reviewers

vyasr
davidwendt

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Correctly synchronise all _sync allocate/deallocations' directly describes the main change: fixing synchronization in sync allocation/deallocation functions across multiple memory resources.
Description check	✅ Passed	The description is related to the changeset by referencing issue `#2448` and noting documentation updates, though it provides limited implementation details.
Linked Issues check	✅ Passed	The PR fully addresses issue `#2448` by implementing the correct synchronization pattern across all identified memory resources [`#2448`], ensuring allocate_sync and deallocate_sync properly synchronize CUDA streams.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing synchronization in allocate_sync/deallocate_sync methods as specified in `#2448`, with no extraneous modifications.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

wence- · 2026-06-22T08:40:03Z

+  std::stringstream msg;
+  msg << "Exceeded memory limit " << allocation_limit_ << "; Allocated bytes " << allocated_bytes_
+      << "; Requested bytes " << bytes << "\n";
+  RMM_FAIL(msg.str(), rmm::out_of_memory);


CI had this error https://github.com/rapidsai/rmm/actions/runs/27701750774/job/81941320252?pr=2449#step:12:704

that I am trying to understand.

bdice

Initial feedback. I'll look at the CI error soon.

edit: Is the CI error fixed?

bdice · 2026-06-26T16:12:48Z

                                                   std::size_t alignment) noexcept
 {
-  get_resource_ref(bytes).deallocate(cuda_stream_view{}, ptr, bytes, alignment);
+  return get_resource_ref(bytes).deallocate_sync(ptr, bytes, alignment);


This is void, and same for the upstream deallocation. Why return?

No need for it, just habit.

bdice · 2026-06-26T16:17:11Z

+    // TODO: Do we need this one?
+    RMM_CUDA_TRY(cudaStreamSynchronize(stream.get()));


I don't think so. If we throw, we're not returning a pointer to the caller, so I don't think we need any stream-sync guarantee.

But why did we delete the logger messages?

bdice · 2026-06-26T16:18:16Z

                                                    std::size_t alignment) noexcept
 {
-  auto const stream = cuda_stream_view{};
-  logger_->info("free,%p,%zu,%s", ptr, bytes, rmm::detail::format_stream(stream));


Why did we delete the logging?

As for allocate, we're calling this->deallocate that emits logging now. Rather than upstream.deallocate that does not.

Correctly synchronise all _sync allocate/deallocations

d001223

- Closes rapidsai#2448

wence- requested a review from a team as a code owner June 17, 2026 15:50

wence- requested review from bdice and ttnghia June 17, 2026 15:50

github-project-automation Bot added this to RMM Project Board Jun 17, 2026

wence- added bug Something isn't working non-breaking Non-breaking change labels Jun 17, 2026

wence- commented Jun 17, 2026

View reviewed changes

wence- mentioned this pull request Jun 17, 2026

Delegate logging_resource_adaptor_impl sync methods to their stream-ordered counterparts #2445

Closed

3 tasks

Better limiting resource allocation breach error

db44906

wence- commented Jun 22, 2026

View reviewed changes

bdice reviewed Jun 26, 2026

View reviewed changes

		// TODO: Do we need this one?
		RMM_CUDA_TRY(cudaStreamSynchronize(stream.get()));

Uh oh!

Conversation

wence- commented Jun 17, 2026

Description

Checklist

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bdice left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading

bdice left a comment •

edited

Loading