From 6567ca416307485e23cd82650b4ca1adc09e10ae Mon Sep 17 00:00:00 2001
From: Laurence Tratt
Date: Sat, 24 May 2025 20:50:21 +0100
Subject: [PATCH] Record "proper" loops.

This commit, in essence, makes `control_point` the very first thing in
the interpreter loop. This allows yk to record more "natural" traces
that, mostly, the optimiser can do a better job on. For example, it
means that traces no longer start with an "it should be optimised away,
surely" `load_inst` call -- because the `promote` is now also after the
`control_point`, we naturally optimise this away.

I experimented with this way back when, but at that point it always led
to worse results. It turns out that's because on BigLoop -- which is
all I had as a benchmark then! -- the newly produced JIT IR, though
smaller, happens to cause the register allocator to do a silly spill.
I can, and will, fix that -- but I don't think it's worth holding up
this commit any longer, because it's clear that, overall, this is a
win. From a `haste` run I get this:

```
Permute/YkLua/1000           8.48% faster
Queens/YkLua/1000            7.71% faster
Towers/YkLua/600             4.90% faster
List/YkLua/1500              4.80% faster
LuLPeg/YkLua/default         3.99% faster
DeltaBlue/YkLua/12000        3.78% faster
NBody/YkLua/250000           3.26% faster
Heightmap/YkLua/2000         2.84% faster
Richards/YkLua/100           2.53% faster
Bounce/YkLua/1500            2.02% faster
Sieve/YkLua/3000             1.20% faster
Storage/YkLua/1000           0.20% slower
Mandelbrot/YkLua/500         0.25% slower
Havlak/YkLua/1500            0.57% slower
Json/YkLua/100               0.92% slower
HashIds/YkLua/6000           1.48% slower
CD/YkLua/250                 3.42% slower
BigLoop/YkLua/1000000000    14.88% slower
```

HashIds, CD, and (very obviously!) BigLoop get worse; many benchmarks
aren't really affected much one way or the other (e.g. Storage and
Mandelbrot are completely within the noise; Havlak probably is too;
Json I'm somewhat unsure about). A number of benchmarks (Permute,
Queens, Towers, List, DeltaBlue, NBody, Heightmap, Richards, and
Bounce) get better, in some cases quite significantly (LuLPeg is too
noisy for me to read much into it).

Overall, I think the wins outweigh the losses, particularly as I know
what ails BigLoop (in essence: we spill a register *before* a guard
instead of *inside* a guard). Fixing that may well partly or wholly
fix the other benchmarks that slow down. But I think that will be
interesting to do as a separate commit in yk.
---
 src/lvm.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/lvm.c b/src/lvm.c
index e5dd3b0..4768b84 100644
--- a/src/lvm.c
+++ b/src/lvm.c
@@ -1234,13 +1234,15 @@ void luaV_execute (lua_State *L, CallInfo *ci) {
   /* main loop of interpreter */
   for (;;) {
     Instruction i;  /* instruction being executed */
-#ifdef YKLUA_DEBUG_STRS
-    yk_debug_str(cl->p->instdebugstrs[cast_int(pc - cl->p->code)]);
-#endif
-    vmfetch();
 #ifdef USE_YK
-    YkLocation *ykloc = &cl->p->yklocs[pcRel(pc, cl->p)];
+    YkLocation *ykloc = &cl->p->yklocs[cast_int(pc - cl->p->code)];
     yk_mt_control_point(G(L)->yk_mt, ykloc);
+    vmfetch();
+# ifdef YKLUA_DEBUG_STRS
+    yk_debug_str(cl->p->instdebugstrs[pcRel(pc, cl->p)]);
+# endif
+#else
+    vmfetch();
 #endif
 #if 0  /* low-level line tracing for debugging Lua */
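
For context, here is a minimal, self-contained sketch of the ordering
this patch establishes -- a toy bytecode loop, not the real lvm.c.
`control_point()` here is a hypothetical stand-in for
`yk_mt_control_point()`; in real yk it may jump into a compiled trace
rather than fall through to the fetch. The point is that the hook now
fires at the loop head, so the subsequent instruction fetch (the
analogue of `vmfetch()`) sits inside the recorded trace, where the
optimiser can see it:

```c
#include <stdio.h>

typedef enum { OP_INC, OP_LOOP, OP_HALT } Op;

/* Hypothetical stand-in for yk_mt_control_point(mt, loc). Because it
   runs before the fetch, a trace recorded from here starts at the loop
   head and includes the `code[pc]` load, which the trace optimiser can
   then constant-fold away. */
static void control_point(size_t pc) { (void)pc; }

int main(void) {
  const Op code[] = { OP_INC, OP_LOOP, OP_HALT };
  size_t pc = 0;
  long acc = 0;
  for (;;) {
    control_point(pc);   /* now the very first thing in the loop */
    Op i = code[pc++];   /* fetch happens after the control point */
    switch (i) {
      case OP_INC:  acc++; break;
      case OP_LOOP: if (acc < 5) pc = 0; break;
      case OP_HALT: printf("%ld\n", acc); return 0;
    }
  }
}
```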