Skip to content

Conversation

@ltratt
Copy link
Contributor

@ltratt ltratt commented May 24, 2025

[Needs https://github.com/ykjit/yk/pull/1772 to be merged first.]

This commit, in essence, makes control_point the very first thing in the interpreter loop. This allows yk to record more "natural" traces that, mostly, the optimiser can do a better job on. For example, it means that traces no longer start with a "it should be optimised away, surely" load_inst call -- because the promote is now also after the control_point, we naturally optimise this away.

I experimented with this way back when, but at that point it always led to worse results. It turns out that's because on big_loop -- which is all I had as a benchmark then! -- the newly produced JIT IR, though smaller, happens to cause the register allocator to do a silly spill. I can, and will, fix that -- but I don't think it's worth holding up this commit anymore, because it's clear that, overall, this is a win.

From a haste run I get this:

Permute/YkLua/1000        8.48% faster
Queens/YkLua/1000         7.71% faster
Towers/YkLua/600          4.90% faster
List/YkLua/1500           4.80% faster
LuLPeg/YkLua/default      3.99% faster
DeltaBlue/YkLua/12000     3.78% faster
NBody/YkLua/250000        3.26% faster
Heightmap/YkLua/2000      2.84% faster
Richards/YkLua/100        2.53% faster
Bounce/YkLua/1500         2.02% faster
Sieve/YkLua/3000          1.20% faster
Storage/YkLua/1000        0.20% slower
Mandelbrot/YkLua/500      0.25% slower
Havlak/YkLua/1500         0.57% slower
Json/YkLua/100            0.92% slower
HashIds/YkLua/6000        1.48% slower
CD/YkLua/250              3.42% slower
BigLoop/YkLua/1000000000  14.88% slower

HashIds, CD, and (very obviously!) BigLoop get worse; many benchmarks aren't really effected much one way or the other (e.g. Storage and Mandelbrot are completely within the noise; Havlak probably is too; Json I'm somewhat unsure about).

A number of benchmarks (Permute, Queens, Towers, List, DeltaBlue, NBody, Heightmap, Richards, and Bounce) benefit (LuLPeg is too noisy for me to read much into it) get better, in some cases quite significantly.

Overall, I think the wins outweigh the benfits, particularly as I know what ails BigLoop (in essence: we spill a register before a guard instead of inside a guard). Fixing that may well partly or wholly fix the other benchmarks that slow down. But I think that will be interesting to do as a separate commit in yk.

@vext01 vext01 added this pull request to the merge queue May 27, 2025
@vext01 vext01 removed this pull request from the merge queue due to a manual request May 27, 2025
@vext01
Copy link
Contributor

vext01 commented May 27, 2025

Once the dependent, and this, are merged, do we need to sync yklua in yk?

@ltratt
Copy link
Contributor Author

ltratt commented May 27, 2025

We don't have to... but it would probably be a good idea.

@ltratt
Copy link
Contributor Author

ltratt commented May 27, 2025

I think this one can now be merged.

@vext01 vext01 added this pull request to the merge queue May 27, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 27, 2025
This commit, in essence, makes `control_point` the very first thing in
the interpreter loop. This allows yk to record more "natural" traces
that, mostly, the optimiser can do a better job on. For example, it
means that traces no longer start with a "it should be optimised away,
surely" `load_inst` call -- because the `promote` is now also after the
`control_point`, we naturally optimise this away.

I experimented with this way back when, but at that point it always led
to worse results. It turns out that's because on big_loop -- which is
all I had as a benchmark then! -- the newly produced JIT IR, though
smaller, happens to cause the register allocator to do a silly spill. I
can, and will, fix that -- but I don't think it's worth holding up this
commit anymore, because it's clear that, overall, this is a win.

From a `haste` run I get this:

```
Permute/YkLua/1000        8.48% faster
Queens/YkLua/1000         7.71% faster
Towers/YkLua/600          4.90% faster
List/YkLua/1500           4.80% faster
LuLPeg/YkLua/default      3.99% faster
DeltaBlue/YkLua/12000     3.78% faster
NBody/YkLua/250000        3.26% faster
Heightmap/YkLua/2000      2.84% faster
Richards/YkLua/100        2.53% faster
Bounce/YkLua/1500         2.02% faster
Sieve/YkLua/3000          1.20% faster
Storage/YkLua/1000        0.20% slower
Mandelbrot/YkLua/500      0.25% slower
Havlak/YkLua/1500         0.57% slower
Json/YkLua/100            0.92% slower
HashIds/YkLua/6000        1.48% slower
CD/YkLua/250              3.42% slower
BigLoop/YkLua/1000000000  14.88% slower
```

HashIds, CD, and (very obviously!) BigLoop get worse; many benchmarks
aren't really effected much one way or the other (e.g. Storage and
Mandelbrot are completely within the noise; Havlak probably is too; Json
I'm somewhat unsure about).

A number of benchmarks (Permute, Queens, Towers, List, DeltaBlue, NBody,
Heightmap, Richards, and Bounce) benefit (LuLPeg is too noisy for me to
read much into it) get better, in some cases quite significantly.

Overall, I think the wins outweigh the benfits, particularly as I know
what ails BigLoop (in essence: we spill a register *before* a guard
instead of *inside* a guard). Fixing that may well partly or wholly fix
the other benchmarks that slow down. But I think that will be
interesting to do as a separate commit in yk.
@ltratt
Copy link
Contributor Author

ltratt commented May 27, 2025

Hopefully the force push unbreaks the "nony-ykllvm" case.

@vext01 vext01 added this pull request to the merge queue May 27, 2025
Merged via the queue into ykjit:main with commit 07d8857 May 27, 2025
2 checks passed
@ltratt ltratt deleted the proper_loop branch May 30, 2025 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants