Add RAJA once #2009

Open
artv3 wants to merge 2 commits into develop from artv3/raja-once

Conversation

@artv3 artv3 commented Apr 2, 2026

Summary

This PR adds the RAJA::once function, which removes the need to mask out threads by hand for single-thread operations by returning a RAJA::RangeSegment(0, 1). This comes up when we want only one thread to perform a given operation in a GPU kernel.
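To make the idea concrete, here is a minimal sketch in plain C++ with no RAJA dependency. `Range`, `once`, `thread_loop`, and `count_executions` are hypothetical stand-ins invented for this illustration, not RAJA's types or implementation. The sketch assumes a direct thread-to-index mapping, where a thread runs the body only if its index is smaller than the range length.

```cpp
#include <cassert>

// Hypothetical stand-in for a half-open index range [begin, end).
// This is NOT RAJA::RangeSegment, only a minimal model of one.
struct Range {
  int begin;
  int end;
};

// Hypothetical analogue of the proposed helper: a range of length one.
constexpr Range once() { return Range{0, 1}; }

// Models a thread-mapped loop as seen by ONE thread (assumed direct mapping):
// the thread runs the body only if its index is inside the range length;
// otherwise it is effectively masked out.
template <typename Body>
void thread_loop(int thread_id, Range r, Body&& body) {
  assert(thread_id >= 0);
  const int len = r.end - r.begin;
  if (thread_id < len) {
    body(r.begin + thread_id);
  }
}

// Simulates a team of `num_threads` threads reaching the same loop and
// returns how many of them executed the body.
inline int count_executions(int num_threads, Range r) {
  int executed = 0;
  for (int tid = 0; tid < num_threads; ++tid) {
    thread_loop(tid, r, [&](int) { ++executed; });
  }
  return executed;
}
```

With a team of 64 simulated threads, `count_executions(64, once())` reports a single execution, which is the masking effect the summary describes.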

@artv3 artv3 requested a review from a team April 2, 2026 13:28
RAJA::loop<threads_x>(ctx, RAJA::RangeSegment(0, 1), [&](int c) {
// __once_loop_start
// Use a single logical thread per team for shared initialization.
RAJA::loop<threads_x>(ctx, RAJA::once(), [&](int c) {
Member

Does it make more sense to have a once policy?

Member Author

And have loop not take an iteration space?

Member

Sure, is that consistent with the use case?

Member

@MrBurmark MrBurmark Apr 2, 2026

Or have a non-loop function like loop?

RAJA::non_loop(ctx, [&]() {...});

Member

I missed that you only want one thread doing this so you really do need the proper policy. Do you need index c?

Member

The less to maintain, the better. With the range PR, you could just have the range be range(1), which is pretty short.

Member Author

What's holding me back is that range(1) is kind of a trick; something more explicit would be nice. @tomstitt what do you think?

Member

Using range(1) on a worksharing loop would give you the equivalent of single: one thread runs it, but all other threads wait until it's done. If that's the intent, then this is a reasonable way to implement it or to have people do it. If you don't want the other threads to wait, though, I don't think that gets you there, at least not without an appropriate policy or some way to request non-waiting behavior. It's also only sort of a loop, right? Doing masked might be reasonable: have it default to a mask of 0, but allow an argument that lets you say either how many or which threads should execute it.
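The masked idea in this comment could look roughly like the sketch below, written in plain C++ rather than RAJA. `masked`, `masked_first`, and the two team-simulation helpers are hypothetical names introduced only for illustration. The sketch implies no synchronization: threads that are filtered out skip the body and continue, which corresponds to the non-waiting behavior discussed above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: run `body` only on the thread whose index equals
// `which` (default 0). Other threads skip it without waiting.
template <typename Body>
bool masked(int thread_id, Body&& body, int which = 0) {
  if (thread_id != which) return false;
  body();
  return true;
}

// Hypothetical variant: run `body` on the first `count` threads.
template <typename Body>
bool masked_first(int thread_id, int count, Body&& body) {
  assert(count >= 0);
  if (thread_id >= count) return false;
  body();
  return true;
}

// Simulates a team and records which thread ids ran the masked body.
inline std::vector<int> run_team_masked(int num_threads, int which = 0) {
  std::vector<int> ran;
  for (int tid = 0; tid < num_threads; ++tid) {
    masked(tid, [&] { ran.push_back(tid); }, which);
  }
  return ran;
}

// Simulates a team and counts how many threads ran the "first N" body.
inline int count_team_masked_first(int num_threads, int count) {
  int executed = 0;
  for (int tid = 0; tid < num_threads; ++tid) {
    masked_first(tid, count, [&] { ++executed; });
  }
  return executed;
}
```

A waiting variant, in the spirit of single, would additionally need a team-wide barrier after the body, which this sketch deliberately leaves out.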

Member

I like having a wrapper around range(1). We do want something that can act as range(1) for any combination of threads in x, y, z; not sure if that helps or hurts the ideas here. The main use cases I can think of are "once across x,y,z" and "once in z", where we nest some x and y work under that, including a "once across x,y".
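The per-axis use cases listed here can be modeled with a small plain C++ sketch. `Dim3`, `OnceAxes`, `runs_body`, and `count_running` are hypothetical names made up for this illustration and do not correspond to any existing RAJA interface. The assumption is that "once" along an axis means only the thread at index 0 on that axis participates.

```cpp
#include <cassert>

// Hypothetical 3D thread index or block extent.
struct Dim3 {
  int x, y, z;
};

// Hypothetical selector: for each axis, true means "only index 0 on this
// axis runs the body"; false means "every index on this axis runs it".
struct OnceAxes {
  bool x, y, z;
};

// Decides whether the thread at `tid` runs the body under the selector.
constexpr bool runs_body(Dim3 tid, OnceAxes once) {
  return (!once.x || tid.x == 0) &&
         (!once.y || tid.y == 0) &&
         (!once.z || tid.z == 0);
}

// Simulates every thread in a block and counts those that run the body.
inline int count_running(Dim3 block, OnceAxes once) {
  assert(block.x > 0 && block.y > 0 && block.z > 0);
  int running = 0;
  for (int z = 0; z < block.z; ++z)
    for (int y = 0; y < block.y; ++y)
      for (int x = 0; x < block.x; ++x)
        if (runs_body(Dim3{x, y, z}, once)) ++running;
  return running;
}
```

For a 4 x 4 x 2 block, "once across x,y,z" selects 1 thread, "once in z" leaves the 16 x and y threads of the z = 0 plane available for nested work, and "once across x,y" selects one thread per z index, so 2 in total.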

Member Author

@artv3 artv3 Apr 3, 2026

I like the wrapper too; when reading the code, it makes it easier to identify what is going on.

trws commented Apr 2, 2026

As a stylistic note, in OpenMP we do this with single if the other threads should wait, or with masked (or the deprecated master) if they shouldn't. Meanwhile, once is used by C++, C, and POSIX to mean "run exactly one time, no matter how many threads encounter this, and block all encountering threads until that one run is done".

It might be worth using an alternate name to disambiguate it from that "once" behavior, possibly also to indicate the blocking or non-blocking behavior.
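For comparison, the standard C++ meaning of "once" referred to above can be seen with `std::call_once`. The sketch below uses only the standard library; `OnceResult` and `run_call_once` are hypothetical names for this illustration. It shows that the callable runs a single time however many threads reach it, and that each thread gets past `std::call_once` only after that run has finished.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

struct OnceResult {
  int initializer_runs;    // how many times the callable actually executed
  int threads_saw_result;  // threads that observed the finished result
};

// Launches `num_threads` threads that all call std::call_once on one flag.
inline OnceResult run_call_once(int num_threads) {
  std::once_flag flag;
  std::atomic<int> runs{0};
  std::atomic<int> saw{0};
  int shared_value = 0;  // written only inside the once-callable

  std::vector<std::thread> team;
  for (int t = 0; t < num_threads; ++t) {
    team.emplace_back([&] {
      std::call_once(flag, [&] {
        shared_value = 42;
        ++runs;
      });
      // call_once returns only after the single execution has completed,
      // so every thread can safely read shared_value here.
      if (shared_value == 42) ++saw;
    });
  }
  for (auto& th : team) th.join();
  return OnceResult{runs.load(), saw.load()};
}
```

This blocking, run-exactly-once contract differs from a thread mask on a range of length one, where each team that reaches the loop runs the body on its own thread 0, which is the ambiguity the naming suggestion is meant to avoid.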

4 participants