[Backend] Emit `bar.warp.sync` for barriers of 1 warp #7336
In warp-specialized regions with only 1 warp, we can emit `bar.warp.sync` instead of barriers with a thread count. This is slightly more efficient.
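As a rough illustration of the selection described above, here is a minimal Python sketch that picks a PTX barrier string for a partition. The helper name `select_barrier_ptx`, its parameters, and the exact operand formatting are assumptions for illustration only; the actual change lives in the backend lowering and may look quite different.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def select_barrier_ptx(barrier_id: int, num_warps: int) -> str:
    """Hypothetical helper: choose a PTX barrier for a partition of `num_warps` warps."""
    if num_warps < 1:
        raise ValueError("num_warps must be at least 1")
    if not 0 <= barrier_id < 16:
        # PTX exposes 16 named barriers per CTA, numbered 0 through 15.
        raise ValueError("barrier_id must be in [0, 16)")

    if num_warps == 1:
        # Only one warp participates, so a warp-level sync over all 32 lanes
        # is enough; no named barrier or thread count is needed.
        return "bar.warp.sync 0xffffffff;"

    # Otherwise use a named barrier with an explicit thread count, which
    # must be a multiple of the warp size.
    return f"bar.sync {barrier_id}, {num_warps * WARP_SIZE};"


if __name__ == "__main__":
    print(select_barrier_ptx(1, 1))
    print(select_barrier_ptx(1, 4))
```

The single-warp branch never references `barrier_id`, which mirrors the point of the PR: a 1-warp partition does not need a named barrier at all.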
LGTM
Clearly something is wrong with this PR. It's not that important to dig into at the moment.
I have seen the same issue on other PRs. I think the worker is flaky :/
GB200 has a lot of instability, sorry about that
Oh, well I guess this PR got really unlucky? It didn't happen on other PRs I opened.
Let's try again...
I wonder why it's more efficient using
Or is it just measurement noise?