Record "proper" loops. #128
Merged
[Needs https://github.com/ykjit/yk/pull/1772 to be merged first.]
This commit, in essence, makes `control_point` the very first thing in the interpreter loop. This allows yk to record more "natural" traces that, mostly, the optimiser can do a better job on. For example, it means that traces no longer start with an "it should be optimised away, surely" `load_inst` call -- because the `promote` is now also after the `control_point`, we naturally optimise this away.

I experimented with this way back when, but at that point it always led to worse results. It turns out that's because on big_loop -- which is all I had as a benchmark then! -- the newly produced JIT IR, though smaller, happens to cause the register allocator to do a silly spill. I can, and will, fix that -- but I don't think it's worth holding up this commit anymore, because it's clear that, overall, this is a win.
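To make the reordering concrete, here is a minimal sketch of a toy bytecode loop with the control point as the first statement of each iteration, and the promotion and instruction load after it. Everything in it is an assumption for illustration: `control_point`, `promote`, `load_inst`, the `Inst` type and the opcodes are hypothetical stand-ins, not yk's real API and not the interpreter code touched by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: these are NOT yk's real functions. They only
 * mark where the real calls would sit in the loop. */
static int control_point_calls = 0;
static void control_point(size_t pc) { (void)pc; control_point_calls++; }
static size_t promote(size_t v) { return v; }

/* A made-up three-opcode instruction set for the toy interpreter. */
enum { OP_DEC, OP_JNZ, OP_HALT };

typedef struct {
    uint8_t op;
    size_t target; /* only used by OP_JNZ */
} Inst;

static Inst load_inst(const Inst *prog, size_t pc) { return prog[pc]; }

/* Runs `prog` and returns the final counter value. */
long run(const Inst *prog, size_t len, long counter) {
    size_t pc = 0;
    for (;;) {
        /* The control point is the very first thing in the loop... */
        control_point(pc);
        assert(pc < len);
        /* ...so the promote and the instruction load both come after it. */
        Inst inst = load_inst(prog, promote(pc));
        switch (inst.op) {
        case OP_DEC:
            counter--;
            pc++;
            break;
        case OP_JNZ:
            pc = (counter != 0) ? inst.target : pc + 1;
            break;
        case OP_HALT:
            return counter;
        default:
            return counter; /* unknown opcode: stop */
        }
    }
}
```

In this sketch, every iteration begins at the control point, so anything recorded from there onwards sees the `promote` before the `load_inst` rather than starting part-way through an iteration.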
From a `haste` run I get this: HashIds, CD, and (very obviously!) BigLoop get worse; many benchmarks aren't really affected much one way or the other (e.g. Storage and Mandelbrot are completely within the noise; Havlak probably is too; Json I'm somewhat unsure about).
A number of benchmarks (Permute, Queens, Towers, List, DeltaBlue, NBody, Heightmap, Richards, and Bounce) get better, in some cases quite significantly (LuLPeg is too noisy for me to read much into it).
Overall, I think the wins outweigh the losses, particularly as I know what ails BigLoop (in essence: we spill a register before a guard instead of inside a guard). Fixing that may well partly or wholly fix the other benchmarks that slow down. But I think that will be interesting to do as a separate commit in yk.