Skip to content

Optimising away many __yk_trace_basicblock calls #1955

@ltratt

Description

@ltratt

Now that we are free of hwtracer and its oddities, we can rethink where we insert __yk_trace_basicblock calls. I believe we can remove a substantial amount of them.

Outline of the idea

Here's the basic intuition: in many cases, we know for sure that block A is followed by block B and that block B is only ever reached by direct branches. In such cases, we do not need to have a __yk_trace_basicblock in B.

Here's example AOT code that shows a simple case of this (taken from luaV_execute, and this was pretty much the first thing I happened to look at):

  bb0:
    %0_0: ptr = arg(0)
    %0_1: ptr = arg(1)
    call __yk_trace_basicblock(426i32, 0i32)
    %0_3: ptr = load @shadowstack_0
    %0_4: ptr = ptr_add %0_3, 208
    *@shadowstack_0 = %0_4
    %0_6: ptr = ptr_add %0_3, 0
    ...
    br bb4
  bb1:
    call __yk_trace_basicblock(426i32, 1i32)
    br bb3
  bb2:
    call __yk_trace_basicblock(426i32, 2i32)
    br bb3
  bb3:
    %3_0: ptr = phi bb1 -> %608_2, bb2 -> %6_1
    call __yk_trace_basicblock(426i32, 3i32)
    br bb4
  bb4:
    %4_0: ptr = phi bb0 -> %0_1, bb3 -> %3_0
    call __yk_trace_basicblock(426i32, 4i32)
    %4_2: i32 = load %0_19, volatile
    br bb6
  bb5:
    call __yk_trace_basicblock(426i32, 5i32)
    %5_1: ptr = ptr_add %6_1, 16
    %5_2: ptr = load %5_1
    br bb6
  bb6:

bb1, bb2, and bb5 are the targets of conditional branches: bb3, bb4, and bb6 are only ever the targets of direct branches.

The __yk_trace_basicblock calls in bb3, bb4 and bb6 are thus redundant: if we're in bb1 or bb2 we know we're then guaranteed to go to bb3 then bb4 then bb6 in order.

Which __yk_trace_basicblocks can we remove?

[Assumption: we're never going to hack this into trace_builder, so let's only consider j2.]

I have an experimental branch in j2 which turns aot_to_hir into more-or-less an "interpreter over trace actions". In several cases I literally throw away trace actions without checking them, as they can only tell me something I already know: it would, I believe, not be difficult to throw a lot more trace actions away!

The obvious question is: how do we know which trace actions we can throw away? The basic intuition I gave above is a good start but we can't quite throw away every __yk_trace_basicblock that it suggests. In particular -- and I think this is the only case where we need to do this, but I can't promise I've thought through every possibility yet -- we need to make sure that outlining works correctly.

The general case is where we call a function f for which we don't know if we have IR or not: we also don't know if f calls other functions with IR or not. We thus have to make sure that the call is followed by a __yk_trace_basicblock so that we can tell when to stop outlining. It is always correct to add __yk_trace_basicblock in such cases.

Optimising outlining

The good news is that we can clearly optimise some cases. I think, in increasing order of efficiency, we have:

  1. If f has IR and doesn't call any other functions, there is no chance of outlining, we don't need to follow the call by a __yk_trace_basicblock.
  2. If f has IR and calls other functions that all have IR, there is also no chance of outlining, and we don't need to follow the call by a __yk_trace_basicblock.
  3. If f has IR and the last function call(s) in f before returns all have IR, we can always work out when f is returning to us, and we don't need to follow the call by a __yk_trace_basicblock

I believe the third of these subsumes the first two, but it is obviously trickier to implement.

Optimising function start

If a function f is only ever called via call (and not icall), we don't need to have __yk_trace_basicblock in the first basic block: we know, by definition, that calling f will start at (bbidx 0, iidx 0).

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions