[Branch Hints] Add a utility to compare with metadata, and use it in merging opt passes #7733
Closed
@@ -0,0 +1,107 @@
;; NOTE: Assertions have been generated by update_lit_checks.py --all-items and should not be edited.
;; NOTE: This test was ported using port_passes_tests_to_lit.py and could be cleaned up.

;; RUN: wasm-opt %s -all --code-folding -S -o - | filecheck %s

(module
  ;; CHECK:      (type $0 (func (param i32 i32) (result f32)))

  ;; CHECK:      (func $different (type $0) (param $x i32) (param $y i32) (result f32)
  ;; CHECK-NEXT:  (if
  ;; CHECK-NEXT:   (local.get $x)
  ;; CHECK-NEXT:   (then
  ;; CHECK-NEXT:    (@metadata.code.branch_hint "\00")
  ;; CHECK-NEXT:    (if
  ;; CHECK-NEXT:     (local.get $y)
  ;; CHECK-NEXT:     (then
  ;; CHECK-NEXT:      (nop)
  ;; CHECK-NEXT:     )
  ;; CHECK-NEXT:    )
  ;; CHECK-NEXT:   )
  ;; CHECK-NEXT:   (else
  ;; CHECK-NEXT:    (@metadata.code.branch_hint "\01")
  ;; CHECK-NEXT:    (if
  ;; CHECK-NEXT:     (local.get $y)
  ;; CHECK-NEXT:     (then
  ;; CHECK-NEXT:      (nop)
  ;; CHECK-NEXT:     )
  ;; CHECK-NEXT:    )
  ;; CHECK-NEXT:   )
  ;; CHECK-NEXT:  )
  ;; CHECK-NEXT:  (f32.const 0)
  ;; CHECK-NEXT: )
  (func $different (param $x i32) (param $y i32) (result f32)
    ;; The branch hints differ, so we do not optimize.
    (if (result f32)
      (local.get $x)
      (then
        (block (result f32)
          (@metadata.code.branch_hint "\00")
          (if
            (local.get $y)
            (then
              (nop)
            )
          )
          (f32.const 0)
        )
      )
      (else
        (block (result f32)
          (@metadata.code.branch_hint "\01")
          (if
            (local.get $y)
            (then
              (nop)
            )
          )
          (f32.const 0)
        )
      )
    )
  )

  ;; CHECK:      (func $same (type $0) (param $x i32) (param $y i32) (result f32)
  ;; CHECK-NEXT:  (drop
  ;; CHECK-NEXT:   (local.get $x)
  ;; CHECK-NEXT:  )
  ;; CHECK-NEXT:  (@metadata.code.branch_hint "\00")
  ;; CHECK-NEXT:  (if
  ;; CHECK-NEXT:   (local.get $y)
  ;; CHECK-NEXT:   (then
  ;; CHECK-NEXT:    (nop)
  ;; CHECK-NEXT:   )
  ;; CHECK-NEXT:  )
  ;; CHECK-NEXT:  (f32.const 0)
  ;; CHECK-NEXT: )
  (func $same (param $x i32) (param $y i32) (result f32)
    ;; The branch hints are the same, so we optimize.
    (if (result f32)
      (local.get $x)
      (then
        (block (result f32)
          (@metadata.code.branch_hint "\00")
          (if
            (local.get $y)
            (then
              (nop)
            )
          )
          (f32.const 0)
        )
      )
      (else
        (block (result f32)
          (@metadata.code.branch_hint "\00")
          (if
            (local.get $y)
            (then
              (nop)
            )
          )
          (f32.const 0)
        )
      )
    )
  )
)
It might be better to merge and drop the hint in this case. If the only benefit of branch hints is that cold code can be placed far away, then surely just deduplicating the cold code into some warmer code is at least as good.
But the issue may be cold code inside each of the arms. Imagine that under some condition, matching `$x`, the first arm runs 30% faster (because the internal `if` is almost never entered, depending on `$y`), and that in the reverse condition, the second arm runs 30% faster (because the internal `if` is almost always entered). Merging the arms and removing the branch hint would make us 30% slower.
But wouldn't the hypothetical speedups of 30% be achieved via better code cache locality when the cold blocks are split out? Then merging the code would achieve the same speedups.
Maybe the simple testcase here is misleading - imagine there are loops in each of the arms here, so we spend a lot of time in one arm, and the hints matter.
The problem is that which blocks are cold depends on what data we are running on:
In more detail, imagine we have a function `calc_mostly_0` which processes some huge buffer of data, and assumes most of the data is 0. We would want a hint there that whenever it checks if an item is 0, that is likely.
And imagine we also have `calc_mostly_1` which processes similar data, but assumes most of the data is 1. The hint there would say items of 0 are unlikely, the reverse of before.
Now assume those two functions are identical in all but the branch hints. Then they get compiled to very different machine code: in the first, blocks that handle 1 are cold and moved out, while in the latter, blocks that handle 0 are cold and moved out.
And, imagine we can tell what the data properties are, at runtime: perhaps we know that data from certain sources is mostly 0s, and from others, mostly 1s, so we might call `calc_mostly_0` for the former and `calc_mostly_1` for the latter.
Merging code here, or even just removing the branch hints, would be slower.
This is a bit unlikely to happen in reality - likely there would be other differences somewhere - but it is realistic, I think, to collect separate PGO profiles and compile multiple versions of a function, if you have a way to know which profile is more relevant. Like imagine a physics engine can be tuned for many small objects or a few big and complicated ones.
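The scenario in this comment could look roughly like the following C++ sketch. It is an illustration only, not code from the PR: the function bodies, the `Source` enum, and the `calc` dispatcher are hypothetical, and `__builtin_expect` is a GCC/Clang builtin used here as a stand-in for a wasm branch hint.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical: tuned for buffers that are mostly 0, so the
// "item == 0" branch is hinted as likely.
std::uint64_t calc_mostly_0(const std::vector<std::uint8_t>& data) {
  std::uint64_t sum = 0;
  for (std::uint8_t item : data) {
    if (__builtin_expect(item == 0, 1)) {
      sum += 1;
    } else {
      sum += 3;
    }
  }
  return sum;
}

// Hypothetical: identical code, but the same branch is hinted as
// unlikely because the buffer is assumed to be mostly 1.
std::uint64_t calc_mostly_1(const std::vector<std::uint8_t>& data) {
  std::uint64_t sum = 0;
  for (std::uint8_t item : data) {
    if (__builtin_expect(item == 0, 0)) {
      sum += 1;
    } else {
      sum += 3;
    }
  }
  return sum;
}

// Hypothetical runtime knowledge about where the data came from.
enum class Source { MostlyZeros, MostlyOnes };

// Pick the version whose hints match the expected data.
std::uint64_t calc(Source source, const std::vector<std::uint8_t>& data) {
  return source == Source::MostlyZeros ? calc_mostly_0(data)
                                       : calc_mostly_1(data);
}
```

Both functions compute the same result for any input; only the hints differ, which is exactly the case where a pass that ignores hints would consider them duplicates.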
Right, if you merge the functions in that situation, then the merged function has more co-located code than either of the unmerged functions would have. But there would be less code overall, so potentially less code cache pressure and better performance, although that would depend on the placement of the functions and the behavior of the cache. I just don't think there's as clear a benefit to keeping the functions separate as you are arguing, and allowing optimization hints to inhibit optimizations seems like a priority inversion to me.
There is more code overall without merging, but the hot-loop parts would remain hot loops regardless of how much code is around them. We have to assume some effect of locality like that, I think? Otherwise, if I follow your logic, we'd need to consider erasing hints after inlining (as the extra code might render them ineffective), but nothing I have read suggests that.
Also, not merging code here is necessary to preserve the fuzzing invariant that optimizations do not cause bad branch hints: If we merge, we must pick one hint and it might be wrong (causing large slowdowns). Or, if we merge and erase the hints, we may also cause a slowdown.
(I do think maybe when optimizing for size, that we may want to always merge, though.)
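As a rough illustration of the policy argued for at this point in the thread, here is a toy C++ model. It is not Binaryen's IR or API: `Hint`, `Arm`, and `canFold` are hypothetical names, and "identical code" is reduced to comparing an integer id.

```cpp
// Toy stand-in for a branch hint on an inner `if`. In the test above,
// "\00" would correspond to Unlikely and "\01" to Likely (assumption
// based on the branch hinting proposal's encoding).
enum class Hint { None, Unlikely, Likely };

// Toy stand-in for one arm of the outer `if`.
struct Arm {
  int codeId;  // equal ids model structurally identical code
  Hint hint;   // hint attached to the code inside the arm
};

// Hypothetical policy: fold two arms only when both the code and the
// hints match, except when optimizing for size, where identical code
// is always folded.
bool canFold(const Arm& a, const Arm& b, bool optimizeForSize) {
  if (a.codeId != b.codeId) {
    return false;
  }
  if (optimizeForSize) {
    return true;
  }
  return a.hint == b.hint;
}
```

Under this model, the `$different` test corresponds to `canFold` returning false for the hinted code and `$same` to it returning true.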
After talking offline, it sounds like we just had different expectations about how useful the hints would be and therefore how hard we should try to preserve them. I hadn't been considering register allocation effects in particular. This approach sgtm, especially if it turns out that LLVM makes similar decisions.
Hmm, you raised a very good point with following LLVM. I had trouble finding a clear answer in the source, but here is a testcase with a select:
and here is one with an if:
In both cases LLVM is perfectly happy to pick one of the two arbitrarily. It makes no effort to preserve the hint.
I am surprised by this but I think we can trust LLVM on it, so let's not land this, and I'll also revert #7715. I will also need to figure out some other mechanism for how to fuzz this.
Is the plan to drop the hint on merging? If so, I wouldn't expect that to need special fuzzer support. Or is the plan to pick one arbitrarily and keep its hint?
I'm considering a few possible plans for the fuzzer, either skipping relevant passes, or a flag.
Yes, I think picking one hint arbitrarily, as LLVM does, is good enough - it's simplest and presumably LLVM found it ok to do (maybe even optimal).
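The two options raised in the last exchange (drop the hint on merging, or keep one arbitrarily) can be contrasted in a small C++ sketch. This is a toy model under stated assumptions, not how Binaryen or LLVM implement it: `Hint`, `MergePolicy`, and `mergeHints` are hypothetical names.

```cpp
// Toy stand-in for a branch hint.
enum class Hint { None, Unlikely, Likely };

// The two merge strategies discussed in the thread.
enum class MergePolicy { DropHint, KeepFirst };

// Decide which hint the merged code carries. When the hints agree
// there is nothing to decide; when they differ, either erase the hint
// or arbitrarily keep the one from the first copy.
Hint mergeHints(Hint first, Hint second, MergePolicy policy) {
  if (first == second) {
    return first;
  }
  if (policy == MergePolicy::DropHint) {
    return Hint::None;
  }
  return first;  // arbitrary choice; may be wrong for the second copy
}
```

With `KeepFirst`, the surviving hint can be wrong for code paths that came from the second copy, which is why a fuzzer that checks hints for accuracy would need to skip the relevant passes or be told about this behavior.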