Optimising away many `__yk_trace_basicblock` calls

Now that we are free of hwtracer and its oddities, we can rethink where we insert `__yk_trace_basicblock` calls. I believe we can remove a substantial amount of them.


## Outline of the idea

Here's the basic intuition: in many cases, we know for sure that block A is followed by block B and that block B is only ever reached by direct branches. In such cases, we do not need to have a `__yk_trace_basicblock` in B.

Here's example AOT code that shows a simple case of this (taken from luaV_execute, and this was pretty much the first thing I happened to look at):

```
  bb0:
    %0_0: ptr = arg(0)
    %0_1: ptr = arg(1)
    call __yk_trace_basicblock(426i32, 0i32)
    %0_3: ptr = load @shadowstack_0
    %0_4: ptr = ptr_add %0_3, 208
    *@shadowstack_0 = %0_4
    %0_6: ptr = ptr_add %0_3, 0
    ...
    br bb4
  bb1:
    call __yk_trace_basicblock(426i32, 1i32)
    br bb3
  bb2:
    call __yk_trace_basicblock(426i32, 2i32)
    br bb3
  bb3:
    %3_0: ptr = phi bb1 -> %608_2, bb2 -> %6_1
    call __yk_trace_basicblock(426i32, 3i32)
    br bb4
  bb4:
    %4_0: ptr = phi bb0 -> %0_1, bb3 -> %3_0
    call __yk_trace_basicblock(426i32, 4i32)
    %4_2: i32 = load %0_19, volatile
    br bb6
  bb5:
    call __yk_trace_basicblock(426i32, 5i32)
    %5_1: ptr = ptr_add %6_1, 16
    %5_2: ptr = load %5_1
    br bb6
  bb6:
```

`bb1`, `bb2`, and `bb5` are the targets of conditional branches: `bb3`, `bb4`, and `bb6` are only ever the targets of direct branches.

The `__yk_trace_basicblock` calls in `bb3`, `bb4` and `bb6` are thus redundant: if we're in `bb1` or `bb2` we know we're then guaranteed to go to `bb3` then `bb4` then `bb6` in order.


## Which `__yk_trace_basicblock`s can we remove?

[Assumption: we're never going to hack this into trace_builder, so let's only consider j2.]

I have an experimental branch in j2 which turns aot_to_hir into more-or-less an "interpreter over trace actions". In several cases I literally throw away trace actions without checking them, as they can only tell me something I already know: it would, I believe, not be difficult to throw a lot more trace actions away!

The obvious question is: how do we know which trace actions we can throw away? The basic intuition I gave above is a good start _but_ we can't quite throw away every `__yk_trace_basicblock` that it suggests. In particular -- and I think this is the only case where we need to do this, but I can't promise I've thought through every possibility yet -- we need to make sure that outlining works correctly.

The general case is where we call a function `f` for which we don't know if we have IR or not: we also don't know if `f` calls other functions with IR or not. We thus _have_ to make sure that the `call` is followed by a `__yk_trace_basicblock` so that we can tell when to stop outlining. It is always correct to add `__yk_trace_basicblock` in such cases.


### Optimising outlining

The good news is that we can clearly optimise some cases. I think, in increasing order of efficiency, we have:

1. If `f` has IR and doesn't call any other functions, there is no chance of outlining, we don't need to follow the `call` by a `__yk_trace_basicblock`.
2. If `f` has IR and calls other functions that all have IR, there is also no chance of outlining, and we don't need to follow the `call` by a `__yk_trace_basicblock`.
3. If `f` has IR and the last function call(s) in `f` before `return`s all have IR, we can always work out when `f` is returning to us, and we don't need to follow the `call` by a `__yk_trace_basicblock`

I believe the third of these subsumes the first two, but it is obviously trickier to implement.


### Optimising function start

If a function `f` is only ever called via `call` (and not `icall`), we don't need to have `__yk_trace_basicblock` in the first basic block: we know, by definition, that calling `f` will start at `(bbidx 0, iidx 0)`.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimising away many `__yk_trace_basicblock` calls #1955

Outline of the idea

Which `__yk_trace_basicblock`s can we remove?

Optimising outlining

Optimising function start

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Optimising away many __yk_trace_basicblock calls #1955

Description

Outline of the idea

Which __yk_trace_basicblocks can we remove?

Optimising outlining

Optimising function start

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Optimising away many `__yk_trace_basicblock` calls #1955

Which `__yk_trace_basicblock`s can we remove?