Inlining #to:do: #35

smarr · 2021-08-05T10:43:50Z

This PR inlines #to:do: and its block.

For the interpreters, this is generally a good win.
The SomSom benchmarks see run-time reductions of 5-8% for the AST interpreter, and 7-9% for the bytecode interpreter.

On the micro and macro benchmarks, the interpreters benefit by up to 37% on the Sum microbenchmark for the AST interpreter, and up to 30% on the Loop microbenchmark for the BC interpreter. Overall, though, the BC interpreter seems to benefit more.

However, steady-state JIT performance is a different picture. The run time of DeltaBlue increases by 9% and that of NBody by 24% after this change. For the AST interpreter, this already includes mitigations in the ToDoInlined node, which nils out the loop index variable so that the compiler can optimize it better. The key issue here is that the lifetime of the loop index variable is much less clearly defined than before, and the compiler struggles to optimize it.
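To make the mitigation concrete, here is a minimal sketch of an inlined `#to:do:` node that nils out the loop index once the loop is done. It is not the code from this PR: `Frame`, `Literal`, `LocalRead`, `AddToLocal`, and `sum_one_to` are hypothetical helpers for illustration, `None` stands in for SOM's nil, and returning the start value from `execute` is an assumption.

```python
class Frame(object):
    """Hypothetical frame: a flat list of local variable slots."""

    def __init__(self, num_locals):
        self.locals = [None] * num_locals  # None stands in for nil


class Literal(object):
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value


class LocalRead(object):
    def __init__(self, idx):
        self.idx = idx

    def execute(self, frame):
        return frame.locals[self.idx]


class AddToLocal(object):
    """Hypothetical body node: locals[target_idx] += value of expr."""

    def __init__(self, target_idx, expr):
        self.target_idx = target_idx
        self.expr = expr

    def execute(self, frame):
        frame.locals[self.target_idx] += self.expr.execute(frame)


class ToDoInlined(object):
    """Sketch of an inlined loop: the block argument became a frame local."""

    def __init__(self, from_expr, to_expr, body, loop_idx):
        self.from_expr = from_expr
        self.to_expr = to_expr
        self.body = body
        self.loop_idx = loop_idx  # slot of the former block argument

    def execute(self, frame):
        start = self.from_expr.execute(frame)
        end = self.to_expr.execute(frame)
        i = start
        while i <= end:
            frame.locals[self.loop_idx] = i  # index is now visible in the frame
            self.body.execute(frame)
            i += 1
        # Mitigation: nil out the index so its lifetime ends with the loop.
        frame.locals[self.loop_idx] = None
        return start  # assumption: the loop evaluates to its receiver


def sum_one_to(n):
    """Runs the equivalent of: 1 to: n do: [:i | sum := sum + i]."""
    frame = Frame(2)      # slot 0: loop index, slot 1: sum
    frame.locals[1] = 0
    loop = ToDoInlined(Literal(1), Literal(n), AddToLocal(1, LocalRead(0)), 0)
    loop.execute(frame)
    return frame
```

After `sum_one_to(10)`, slot 1 holds the sum and slot 0 is nil again, so nothing after the loop can observe a stale index value.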

For the bytecode interpreter things are worse, and at this point, I do not yet have a solution.

The issue is illustrated with the Dispatch microbenchmark. Trace: https://gist.github.com/smarr/08b234b741b9ffd2b77cee49836d1332

Because of the inlining, the loop index now lives both on the stack and in the frame.
This prolongs its lifetime, and because the compiler does not optimize well across guards, a lot of redundant operations remain in the trace.
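As an illustration of why the index ends up in two places, here is a small stack-machine sketch of an inlined loop. It uses the two bytecode names from the commit message, DUP_SECOND and JUMP_IF_GREATER, but the instruction encoding (tuples of opcode and argument), the other opcodes, the exact instruction sequence, and the helpers `run` and `to_do_sum` are all assumptions made for this example, not the PR's actual bytecode layout.

```python
(PUSH_CONST, PUSH_LOCAL, POP_LOCAL, DUP, DUP_SECOND,
 ADD, INC, JUMP_IF_GREATER, JUMP, HALT) = range(10)


def run(code, num_locals):
    """Interprets (opcode, argument) pairs; returns (stack, locals)."""
    stack = []
    local_vars = [0] * num_locals
    pc = 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == PUSH_CONST:
            stack.append(arg)
        elif op == PUSH_LOCAL:
            stack.append(local_vars[arg])
        elif op == POP_LOCAL:
            local_vars[arg] = stack.pop()
        elif op == DUP:
            stack.append(stack[-1])      # duplicate the top element
        elif op == DUP_SECOND:
            stack.append(stack[-2])      # duplicate the element below the top
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == INC:
            stack[-1] += 1
        elif op == JUMP_IF_GREATER:
            # Exit the loop when the counter (top) exceeds the limit (second).
            if stack[-1] > stack[-2]:
                stack.pop()
                stack.pop()
                pc = arg
        elif op == JUMP:
            pc = arg
        elif op == HALT:
            return stack, local_vars


def to_do_sum(start, end):
    """Code for: start to: end do: [:i | sum := sum + i] (local 0: i, 1: sum)."""
    return [
        (PUSH_CONST, start),    # 0: receiver
        (PUSH_CONST, end),      # 1: limit
        (DUP_SECOND, None),     # 2: counter starts as a copy of the receiver
        (JUMP_IF_GREATER, 12),  # 3: loop head
        (DUP, None),            # 4: counter stays on the stack ...
        (POP_LOCAL, 0),         # 5: ... and is also stored into the frame
        (PUSH_LOCAL, 1),        # 6: body starts
        (PUSH_LOCAL, 0),        # 7
        (ADD, None),            # 8
        (POP_LOCAL, 1),         # 9: body ends
        (INC, None),            # 10
        (JUMP, 3),              # 11: back to the loop head
        (HALT, None),           # 12
    ]
```

Instructions 4 and 5 are the relevant part: every iteration keeps the counter on the stack and writes a copy into the frame, which is the duplicated, long-lived state described above.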

At this point, PageRank's run time increases by 100%, NBody's by 27%, Dispatch's by 143%, and Loop's by 89%.
Thus, the change comes at a significant cost. There are also some run-time reductions, but the increases seem to be severe.

https://rebench.stefan-marr.de/compare/RPySOM/3eb75e51b4d02c327f03dd5f01f2cf3f77bec220/13435b759281863f6a3458e006f3e728fc51593f#macro-steady-RPySOM-ast-jit

…uring inlining Block arguments are turned into locals during inlining. This might not be perfect for the general case, but works well for to:do:. This commit also adapts the variable write nodes so that I can pass in the value directly and don't need a child node. This requires some adaptation of the node traversal methods in abstract_node.py to handle the case that child nodes may legitimately be None. This also moves the lexical scope into the abstract method. The drawback is that trivial methods also have the extra field. Signed-off-by: Stefan Marr <[email protected]>
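A rough sketch of what "child nodes may legitimately be None" can look like in a traversal. This is not the code from abstract_node.py: the `Node` base class, the `_child_nodes_` attribute list, `children`, `count_nodes`, and `IntLiteral` are hypothetical names for illustration. Only `write_value` is a name taken from the commit list, and its signature here is an assumption.

```python
class Node(object):
    # Names of the attributes that hold child nodes (hypothetical convention).
    _child_nodes_ = []

    def children(self):
        """Yields child nodes, skipping slots that are legitimately None."""
        for name in self._child_nodes_:
            child = getattr(self, name)
            if child is not None:
                yield child

    def count_nodes(self):
        """Example traversal that must tolerate missing children."""
        return 1 + sum(child.count_nodes() for child in self.children())


class IntLiteral(Node):
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value


class LocalWrite(Node):
    _child_nodes_ = ["value_expr"]

    def __init__(self, idx, value_expr=None):
        self.idx = idx
        self.value_expr = value_expr  # None when values are passed in directly

    def execute(self, frame):
        # Regular use: evaluate the child expression, then store the result.
        value = self.value_expr.execute(frame)
        self.write_value(frame, value)
        return value

    def write_value(self, frame, value):
        # Direct use, e.g. by an inlined loop storing its index: no child needed.
        frame[self.idx] = value
```

With this shape, an inlined loop can hold a `LocalWrite(idx)` without a child and call `write_value` on each iteration, while traversals such as `count_nodes` keep working for both variants.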


The biggest drawback of this design is that the lifetime of the frame is longer than the block activation in the loop, which is a problem for performance, because the compiler materializes the store. Using `if we_are_jitted()` prevents the unnecessary store in the trace, and at the same time, avoids much of the overhead in the interpreter. Signed-off-by: Stefan Marr <[email protected]>


smarr · 2023-02-09T20:27:02Z

Latest results: https://rebench.dev/RPySOM/compare/6ad9262edc431f9b2155b9f945263c9b255a41b3..ecdb4d8fadfc2374b22d711a670a150c8696e92c#micro-steady-RPySOM-bc-jit

This is based on SOM-st/PySOM#35, which was never merged because of the impact on the JIT compiler. However, since we have a pure interpreter here, it is a good win on benchmarks that use a `#to:do:` loop, although the median run time is reduced by only 2%. https://rebench.dev/SOMpp/compare/692cc6c47727a7fa76923daf92e0bb8c7d4be3b1..252923132894fff18b7d1364dfb2981f439f146c

smarr force-pushed the down-to-do-prim branch 2 times, most recently from dcdefbc to 8bb081f Compare August 6, 2021 15:12

smarr force-pushed the down-to-do-prim branch from 8bb081f to 3e1dc4f Compare August 19, 2021 22:50

smarr force-pushed the down-to-do-prim branch from 3e1dc4f to a0055e5 Compare February 8, 2023 16:56

smarr changed the title ~~Inlining #and:/#&&/#or:/#||/#to:do: and primitive for #downTo:do:~~ Inlining #to:do: Feb 8, 2023

smarr added 8 commits February 8, 2023 17:36

Added testing of printable locations, and fix issues

0c5eec3

There's a failing assertion on logging with PYPYLOG, which is hopefully fixed with this. Signed-off-by: Stefan Marr <[email protected]>

Remove return from write_value

fbe63a8

Signed-off-by: Stefan Marr <[email protected]>

Inline #to:do: in the bytecode interpreter

d6eb452

This requires two new bytecodes DUP_SECOND (in addition to DUP, which duplicates the first/top element of the stack), and JUMP_IF_GREATER Signed-off-by: Stefan Marr <[email protected]>

Added bytecodes that nils out the loop variable in jitted mode

11ab84b

Signed-off-by: Stefan Marr <[email protected]>

Nil var before loop

c5176b5

Signed-off-by: Stefan Marr <[email protected]>

Fix emitting new bytecodes

2e5e16d

Signed-off-by: Stefan Marr <[email protected]>

smarr force-pushed the down-to-do-prim branch from a0055e5 to 2e5e16d Compare February 8, 2023 17:36

smarr added 8 commits February 9, 2023 02:11

The trivial LiteralReturn methods need a source section to enable inl…

816559f

…ining for them Signed-off-by: Stefan Marr <[email protected]>

The literal to:do: as inlined in the AST interpreter needs to be able…

0d0557c

… to handle double receivers Signed-off-by: Stefan Marr <[email protected]>

Fix the scope merging for LiteralReturn in AST interpreter: TODO the …

1c6feb8

…other trivial methods needs similar fixes Signed-off-by: Stefan Marr <[email protected]>

Support doubles in the inlined #to:do: in the BC interpreter

b35b16a

Signed-off-by: Stefan Marr <[email protected]>

Missing source section [TODO] add test to find this

8607f06

Signed-off-by: Stefan Marr <[email protected]>

Fix formatting

4669a75

Signed-off-by: Stefan Marr <[email protected]>

Needs separate JitDrivers for int and double

796667e

Signed-off-by: Stefan Marr <[email protected]>

Added missing source section

ecdb4d8

Signed-off-by: Stefan Marr <[email protected]>

smarr mentioned this pull request Aug 2, 2024

Add inlining for #to:do: loops SOM-st/SOMpp#36

Merged
