-
Notifications
You must be signed in to change notification settings - Fork 8
Description
Now that we are free of hwtracer and its oddities, we can rethink where we insert __yk_trace_basicblock calls. I believe we can remove a substantial amount of them.
Outline of the idea
Here's the basic intuition: in many cases, we know for sure that block A is followed by block B and that block B is only ever reached by direct branches. In such cases, we do not need to have a __yk_trace_basicblock in B.
Here's example AOT code that shows a simple case of this (taken from luaV_execute, and this was pretty much the first thing I happened to look at):
bb0:
%0_0: ptr = arg(0)
%0_1: ptr = arg(1)
call __yk_trace_basicblock(426i32, 0i32)
%0_3: ptr = load @shadowstack_0
%0_4: ptr = ptr_add %0_3, 208
*@shadowstack_0 = %0_4
%0_6: ptr = ptr_add %0_3, 0
...
br bb4
bb1:
call __yk_trace_basicblock(426i32, 1i32)
br bb3
bb2:
call __yk_trace_basicblock(426i32, 2i32)
br bb3
bb3:
%3_0: ptr = phi bb1 -> %608_2, bb2 -> %6_1
call __yk_trace_basicblock(426i32, 3i32)
br bb4
bb4:
%4_0: ptr = phi bb0 -> %0_1, bb3 -> %3_0
call __yk_trace_basicblock(426i32, 4i32)
%4_2: i32 = load %0_19, volatile
br bb6
bb5:
call __yk_trace_basicblock(426i32, 5i32)
%5_1: ptr = ptr_add %6_1, 16
%5_2: ptr = load %5_1
br bb6
bb6:
bb1, bb2, and bb5 are the targets of conditional branches: bb3, bb4, and bb6 are only ever the targets of direct branches.
The __yk_trace_basicblock calls in bb3, bb4 and bb6 are thus redundant: if we're in bb1 or bb2 we know we're then guaranteed to go to bb3 then bb4 then bb6 in order.
Which __yk_trace_basicblocks can we remove?
[Assumption: we're never going to hack this into trace_builder, so let's only consider j2.]
I have an experimental branch in j2 which turns aot_to_hir into more-or-less an "interpreter over trace actions". In several cases I literally throw away trace actions without checking them, as they can only tell me something I already know: it would, I believe, not be difficult to throw a lot more trace actions away!
The obvious question is: how do we know which trace actions we can throw away? The basic intuition I gave above is a good start but we can't quite throw away every __yk_trace_basicblock that it suggests. In particular -- and I think this is the only case where we need to do this, but I can't promise I've thought through every possibility yet -- we need to make sure that outlining works correctly.
The general case is where we call a function f for which we don't know if we have IR or not: we also don't know if f calls other functions with IR or not. We thus have to make sure that the call is followed by a __yk_trace_basicblock so that we can tell when to stop outlining. It is always correct to add __yk_trace_basicblock in such cases.
Optimising outlining
The good news is that we can clearly optimise some cases. I think, in increasing order of efficiency, we have:
- If
fhas IR and doesn't call any other functions, there is no chance of outlining, we don't need to follow thecallby a__yk_trace_basicblock. - If
fhas IR and calls other functions that all have IR, there is also no chance of outlining, and we don't need to follow thecallby a__yk_trace_basicblock. - If
fhas IR and the last function call(s) infbeforereturns all have IR, we can always work out whenfis returning to us, and we don't need to follow thecallby a__yk_trace_basicblock
I believe the third of these subsumes the first two, but it is obviously trickier to implement.
Optimising function start
If a function f is only ever called via call (and not icall), we don't need to have __yk_trace_basicblock in the first basic block: we know, by definition, that calling f will start at (bbidx 0, iidx 0).