Record "proper" loops. #128

ltratt · 2025-05-24T22:17:24Z

[Needs https://github.com/ykjit/yk/pull/1772 to be merged first.]

This commit, in essence, makes control_point the very first thing in the interpreter loop. This allows yk to record more "natural" traces that, mostly, the optimiser can do a better job on. For example, it means that traces no longer start with a "it should be optimised away, surely" load_inst call -- because the promote is now also after the control_point, we naturally optimise this away.

I experimented with this way back when, but at that point it always led to worse results. It turns out that's because on big_loop -- which is all I had as a benchmark then! -- the newly produced JIT IR, though smaller, happens to cause the register allocator to do a silly spill. I can, and will, fix that -- but I don't think it's worth holding up this commit anymore, because it's clear that, overall, this is a win.

From a haste run I get this:

Permute/YkLua/1000        8.48% faster
Queens/YkLua/1000         7.71% faster
Towers/YkLua/600          4.90% faster
List/YkLua/1500           4.80% faster
LuLPeg/YkLua/default      3.99% faster
DeltaBlue/YkLua/12000     3.78% faster
NBody/YkLua/250000        3.26% faster
Heightmap/YkLua/2000      2.84% faster
Richards/YkLua/100        2.53% faster
Bounce/YkLua/1500         2.02% faster
Sieve/YkLua/3000          1.20% faster
Storage/YkLua/1000        0.20% slower
Mandelbrot/YkLua/500      0.25% slower
Havlak/YkLua/1500         0.57% slower
Json/YkLua/100            0.92% slower
HashIds/YkLua/6000        1.48% slower
CD/YkLua/250              3.42% slower
BigLoop/YkLua/1000000000  14.88% slower

HashIds, CD, and (very obviously!) BigLoop get worse; many benchmarks aren't really effected much one way or the other (e.g. Storage and Mandelbrot are completely within the noise; Havlak probably is too; Json I'm somewhat unsure about).

A number of benchmarks (Permute, Queens, Towers, List, DeltaBlue, NBody, Heightmap, Richards, and Bounce) benefit (LuLPeg is too noisy for me to read much into it) get better, in some cases quite significantly.

Overall, I think the wins outweigh the benfits, particularly as I know what ails BigLoop (in essence: we spill a register before a guard instead of inside a guard). Fixing that may well partly or wholly fix the other benchmarks that slow down. But I think that will be interesting to do as a separate commit in yk.

vext01 · 2025-05-27T08:36:52Z

Once the dependent, and this, are merged, do we need to sync yklua in yk?

ltratt · 2025-05-27T09:11:20Z

We don't have to... but it would probably be a good idea.

ltratt · 2025-05-27T09:12:15Z

I think this one can now be merged.

This commit, in essence, makes `control_point` the very first thing in the interpreter loop. This allows yk to record more "natural" traces that, mostly, the optimiser can do a better job on. For example, it means that traces no longer start with a "it should be optimised away, surely" `load_inst` call -- because the `promote` is now also after the `control_point`, we naturally optimise this away. I experimented with this way back when, but at that point it always led to worse results. It turns out that's because on big_loop -- which is all I had as a benchmark then! -- the newly produced JIT IR, though smaller, happens to cause the register allocator to do a silly spill. I can, and will, fix that -- but I don't think it's worth holding up this commit anymore, because it's clear that, overall, this is a win. From a `haste` run I get this: ``` Permute/YkLua/1000 8.48% faster Queens/YkLua/1000 7.71% faster Towers/YkLua/600 4.90% faster List/YkLua/1500 4.80% faster LuLPeg/YkLua/default 3.99% faster DeltaBlue/YkLua/12000 3.78% faster NBody/YkLua/250000 3.26% faster Heightmap/YkLua/2000 2.84% faster Richards/YkLua/100 2.53% faster Bounce/YkLua/1500 2.02% faster Sieve/YkLua/3000 1.20% faster Storage/YkLua/1000 0.20% slower Mandelbrot/YkLua/500 0.25% slower Havlak/YkLua/1500 0.57% slower Json/YkLua/100 0.92% slower HashIds/YkLua/6000 1.48% slower CD/YkLua/250 3.42% slower BigLoop/YkLua/1000000000 14.88% slower ``` HashIds, CD, and (very obviously!) BigLoop get worse; many benchmarks aren't really effected much one way or the other (e.g. Storage and Mandelbrot are completely within the noise; Havlak probably is too; Json I'm somewhat unsure about). A number of benchmarks (Permute, Queens, Towers, List, DeltaBlue, NBody, Heightmap, Richards, and Bounce) benefit (LuLPeg is too noisy for me to read much into it) get better, in some cases quite significantly. Overall, I think the wins outweigh the benfits, particularly as I know what ails BigLoop (in essence: we spill a register *before* a guard instead of *inside* a guard). Fixing that may well partly or wholly fix the other benchmarks that slow down. But I think that will be interesting to do as a separate commit in yk.

ltratt · 2025-05-27T09:26:26Z

Hopefully the force push unbreaks the "nony-ykllvm" case.

ltratt assigned ptersilie and vext01 May 24, 2025

vext01 added this pull request to the merge queue May 27, 2025

vext01 removed this pull request from the merge queue due to a manual request May 27, 2025

vext01 added this pull request to the merge queue May 27, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 27, 2025

ltratt force-pushed the proper_loop branch from 04b7ff1 to 6567ca4 Compare May 27, 2025 09:26

vext01 added this pull request to the merge queue May 27, 2025

Merged via the queue into ykjit:main with commit 07d8857 May 27, 2025
2 checks passed

ltratt deleted the proper_loop branch May 30, 2025 10:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Record "proper" loops. #128

Record "proper" loops. #128

Uh oh!

ltratt commented May 24, 2025

Uh oh!

vext01 commented May 27, 2025

Uh oh!

ltratt commented May 27, 2025

Uh oh!

ltratt commented May 27, 2025

Uh oh!

Uh oh!

ltratt commented May 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Record "proper" loops. #128

Record "proper" loops. #128

Uh oh!

Conversation

ltratt commented May 24, 2025

Uh oh!

vext01 commented May 27, 2025

Uh oh!

ltratt commented May 27, 2025

Uh oh!

ltratt commented May 27, 2025

Uh oh!

Uh oh!

ltratt commented May 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants