Add `PrunedStore` and use it in Wasmi's executor #1449

Robbepop · 2025-03-30T11:21:00Z

Closes #1448
Unblocks #1433

This adds a PrunedStore that can be created from a Store<T>. It can safely be restored to a Store<T> and exposes certain Store<T>-related methods. However, this conversion is not entirely free. PrunedStore provides an efficient way to access the StoreInner parts of a Store<T>.

Note: the code to convert between Store<T> and PrunedStore uses unsafe Rust code. I ran the PrunedStore conversion tests with miri and it did not find any unsoundness while running them. The entire Wasmi testsuite passes using the PrunedStore.
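
To make the idea concrete, here is a minimal sketch of a type-erased store view guarded by a TypeId check. It is not the code from this PR: the field names, the `Pruned` placeholder type, the `#[repr(C)]` layout and the method signatures are all assumptions made for illustration.

```rust
use std::any::TypeId;

/// Hypothetical stand-in for the non-generic `StoreInner`.
#[derive(Debug, Default)]
pub struct StoreInner {
    pub fuel: u64,
}

/// Placeholder type that stands in for the erased `T` (assumed name).
pub struct Pruned;

/// Simplified store. `#[repr(C)]` plus the boxed `data` field keep the
/// layout identical for every sized `T` (an assumption of this sketch).
#[repr(C)]
pub struct Store<T> {
    pub inner: StoreInner,
    id: TypeId,
    pub data: Box<T>,
}

/// Non-generic view of a `Store<T>`; only ever handled by reference.
#[repr(transparent)]
pub struct PrunedStore {
    pruned: Store<Pruned>,
}

impl<T: 'static> Store<T> {
    pub fn new(data: T) -> Self {
        Self {
            inner: StoreInner::default(),
            id: TypeId::of::<T>(),
            data: Box::new(data),
        }
    }

    /// Erases `T` from the reference type.
    pub fn prune(&mut self) -> &mut PrunedStore {
        // SAFETY: `Store<T>` and `Store<Pruned>` share one layout (see above),
        // `PrunedStore` is a transparent wrapper, and the `data` field is
        // never touched through the pruned view.
        unsafe { &mut *(self as *mut Store<T> as *mut PrunedStore) }
    }
}

impl PrunedStore {
    /// Access to the non-generic parts needs no type information.
    pub fn inner_mut(&mut self) -> &mut StoreInner {
        &mut self.pruned.inner
    }

    /// Restores the typed store, or returns `None` if `T` does not match.
    pub fn restore<T: 'static>(&mut self) -> Option<&mut Store<T>> {
        if self.pruned.id != TypeId::of::<T>() {
            return None;
        }
        // SAFETY: the `TypeId` check proves this view was created from a `Store<T>`.
        Some(unsafe { &mut *(self as *mut PrunedStore as *mut Store<T>) })
    }
}
```

In this sketch the cost of `restore` is the TypeId comparison, which is one way to read "not entirely free" above.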

Downsides

- The T in Store<T> is now required to be 'static. Not a very big deal in common user code but not great either, especially since Wasmtime's Store<T> does not have this requirement. This is probably because Wasmtime simply does not perform the TypeId check at all.
- Performance tests indicate that Wasmi execution is heavily affected by the changes introduced by this PR. Some test cases perform similarly to before (e.g. tiny_keccak), others perform way worse (e.g. counter). This indicates that Wasmi performs about the same overall, but the performance of individual op-codes changed significantly.
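
The 'static requirement follows from how `std::any::TypeId` works: `TypeId::of::<T>()` is only available for `T: 'static`. A small sketch, where the helper name `same_type` is made up for illustration:

```rust
use std::any::TypeId;

/// Returns `true` if `A` and `B` are the same type.
/// The `'static` bounds are forced by `TypeId::of`.
pub fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

// A type that borrows non-'static data, such as a struct holding a
// `&'a str` tied to a local variable, cannot be passed to `same_type`,
// which is why a store relying on this check has to demand `T: 'static`.
```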

codecov · 2025-03-30T11:29:35Z

Codecov Report

Attention: Patch coverage is 89.47368% with 12 lines in your changes missing coverage. Please review.

Project coverage is 71.50%. Comparing base (53aa916) to head (601776d).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/wasmi/src/store.rs	86.27%	7 Missing ⚠️
crates/wasmi/src/engine/executor/instrs.rs	85.71%	2 Missing ⚠️
crates/wasmi/src/engine/executor/instrs/call.rs	94.87%	2 Missing ⚠️
crates/wasmi/src/engine/executor/mod.rs	83.33%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1449      +/-   ##
==========================================
+ Coverage   71.47%   71.50%   +0.02%     
==========================================
  Files         161      161              
  Lines       16352    16364      +12     
==========================================
+ Hits        11688    11701      +13     
+ Misses       4664     4663       -1

Robbepop · 2025-03-30T21:28:25Z

I just benchmarked the Wasmi executor that is now using the new PrunedStore type and is therefore fully non-generic.

Performance is affected a lot. It seems that Wasmi can still achieve the same performance, but op-codes are affected very differently. For example, counter dropped significantly whereas tiny_keccak remained stable and global_bump even improved ever so slightly.
This usually means that we can fix this performance rollercoaster with some inspection and #[inline(never)], #[inline(always)] or normal #[inline] annotations. However, this is super flaky and might change with every Rust version. This would be solved if Rust had explicit-tail-calls.

Robbepop · 2025-04-02T09:15:58Z

Things I have already tried out to fix the performance regressions of this PR:

- Putting #[inline], #[inline(never)] and #[inline(always)] and combinations thereof in various selected places in the Wasmi executor. While this might seem arbitrary, it helped a lot with past performance regressions. Unfortunately, so far this only yielded the same performance or regressed performance further.
- I tried to shift Wasmi bytecode instructions around, thus changing the structure of the jump table. This helped slightly for certain benchmarks (like tiny_keccak) but resulted in a regression overall.
- I tried to PGO compile Wasmi. This helped when used with a very limited set of benchmarks. For example, when running PGO with the tiny_keccak test case it yielded a performance improvement of ~15%. However, as soon as the benchmark set was expanded there were no performance gains anymore.
- I have tried out different LLVM flags via rustc -Cllvm-args to influence LLVM's optimization heuristics. Quite a few LLVM flags have been tried out, none of which actually succeeded.
- I have tried out cargo flamegraph to get a better understanding. However, flamegraphs are not very helpful for Wasmi performance profiling: one has to compile with maximum optimization flags in order to profile, but when doing so Wasmi collapses all the executor instructions into a single execute_instrs function. The entire flamegraph is then flat from there on and no useful information is gained. Having a tail-call based instruction dispatch would really help here.
- I have tried Xcode's Instruments profiler. It worked but yielded very similar results to cargo flamegraph and thus was not very helpful for me.
- I have tried to make Wasmi's state machine more explicit to Rust and LLVM in order to help heuristics, as I have read that LLVM should actually generate way better dispatch logic than it does for Wasmi: it currently generates a single jump table based dispatch, whereas ideally it should decentralize those jumps into each op-code handler block to help the branch predictor, especially on ARM hardware. But it doesn't.
  - PR: https://github.com/wasmi-labs/wasmi/tree/rf-add-continue-next-to-hot-loop
- I have tried to put store into the Executor struct to see what performance implications this has. It yielded no differences in performance.
  - PR: https://github.com/wasmi-labs/wasmi/tree/rf-add-pruned-store-v2
- I tried removing the safety checks in PrunedStore::restore to see whether store safety checks have any (major) impact on Wasmi's performance for weird reasons. But as expected it did not yield any performance changes.
- My suspicion is that in the generic Wasmi executor the store parameter was not put into a register and the non-generic Wasmi executor can put it in one. So I tried to use black_box and other techniques like wrapping store into yet another struct to prevent Rust from doing so, but with no success.
- I have tried out various different sets of optimization flags but, as expected, this yielded no performance changes compared to what was known before.
- I have tried enabling the simd crate feature to see whether enabling even more Wasmi op-codes would yield wildly different performance metrics. But, as expected, this just added roughly 10-15% on top of the known performance regressions.
- I tried applying #[cold] attributes to various sets of op-code handlers that make use of the store parameter.
- I have used cargo asm to have a look at the generated LLVM IR and aarch64 assembly for Wasmi's interpreter loop. This revealed that LLVM does a very poor job at optimizing Wasmi's jump table: it uses a central jump table dispatch but does not put branches into each op-code handler, which would significantly boost performance, especially on ARM hardware. However, the same is true for main, so this is not a performance issue specific to this PR.
  - aarch64: https://gist.github.com/Robbepop/1a7b0d7ffa614a44aff61b76727b329d
  - LLVM IR: https://gist.github.com/Robbepop/d9322dcdbcecc7353b7c8cf63ead7cc9
- Currently trying to figure out the differences in the generated LLVM IR and aarch64 assembly between main and the PR:
  - main.s: https://gist.github.com/Robbepop/10f34cd425d9d9cec6f9a87e7c2ce812
  - main.ll: https://gist.github.com/Robbepop/0039c70eb03b351c034b73c8f098472d
  - pr.s: https://gist.github.com/Robbepop/5b581c70c0c96e92ae3b26a5196eb549
  - pr.ll: https://gist.github.com/Robbepop/8dc01a95c0c10ff4ff05fe2389fe6153
  - Experiment 0: put #[inline(never)] on all store related op-code handlers:
    - Gist: https://gist.github.com/Robbepop/667d4037ee387a29d7247f067046f971
    - Outcome: drastic changes in locals usage but no change to performance.
- I even tried a walk in the park but with no success.
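
For readers unfamiliar with the dispatch shape discussed in the list above, here is a toy loop-and-match interpreter with inline annotations on its handlers. It is only a sketch under assumed names (`Instr`, `Executor`, `run_counter`) and is far simpler than Wasmi's actual executor; a `match` inside a `loop` like this is the kind of code that a compiler may lower to one central jump table.

```rust
/// Toy instruction set (hypothetical, not Wasmi bytecode).
#[derive(Clone, Copy)]
pub enum Instr {
    Add(i64),
    JumpIfLess { limit: i64, target: usize },
    Return,
}

pub struct Executor {
    acc: i64,
    ip: usize,
}

impl Executor {
    /// Small hot handler: ask the compiler to always inline it into the loop.
    #[inline(always)]
    fn exec_add(&mut self, value: i64) {
        self.acc += value;
        self.ip += 1;
    }

    /// Handler kept out of line, as one might try when tuning dispatch code.
    #[inline(never)]
    fn exec_jump_if_less(&mut self, limit: i64, target: usize) {
        self.ip = if self.acc < limit { target } else { self.ip + 1 };
    }

    /// Central dispatch: every instruction goes through this single `match`.
    pub fn execute(&mut self, code: &[Instr]) -> i64 {
        loop {
            match code[self.ip] {
                Instr::Add(value) => self.exec_add(value),
                Instr::JumpIfLess { limit, target } => self.exec_jump_if_less(limit, target),
                Instr::Return => return self.acc,
            }
        }
    }
}

/// Counts up to `limit` using the toy interpreter.
pub fn run_counter(limit: i64) -> i64 {
    let code = [
        Instr::Add(1),
        Instr::JumpIfLess { limit, target: 0 },
        Instr::Return,
    ];
    let mut exec = Executor { acc: 0, ip: 0 };
    exec.execute(&code)
}
```

Which annotation helps on which handler is exactly the part that has to be found by measurement; the choices in this sketch are arbitrary.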

Robbepop · 2025-04-03T12:24:35Z

After the most recent commits, performance is somewhat on par with main again.

Benchmarks

[Benchmark charts not preserved. Section headings, in order: Same or Better, Real Use Cases, Loads and Stores, Global Writes, Regressions.]

Robbepop · 2025-04-03T12:38:42Z

Despite the remaining performance issues (e.g. counter) I think this is good to go. The performance improvements roughly balance out the regressions. We can have follow-up PRs to fix the remaining performance issues.

Robbepop added 7 commits March 30, 2025 11:47

add TypedStore internal parts of Store

96ee4e7

put TypedStore's data into a Box

80c1487

This is important to enforce that all TypedStore<T> are of equal sizes.

rename TypedStore to TypedStoreInner

7f83380

add test asserting equal sizes for all TypedStoreInner<T>

a4e214d

move data field to last position

cb55ee6

add PrunedStore definitions and tests

e5e77b9

change expect -> allow

161fac1

Robbepop added 20 commits March 30, 2025 13:42

rename store field to pruned

d55b17f

move id field from PrunedStore to Store<T>

c374e2d

This way PrunedStore is pointer sized.

make Wasmi executor use Store::call_host_func

5cf8ece

This is an API that can more easily be wrapped for an indirect call via a PrunedStore.

rename FuncParams to FuncInOut

3fb3843

This is a better name since this type manages both function parameters and results.

move func_inout.rs into wasmi::func submodule

d0a37b0

fix intra doc links

a4cd674

add call_host_func trampoline to Store<T>

3c14228

This allows calling host functions via a PrunedStore.

improve Debug impl for ResourceLimiterQuery

f0ed76e

apply rustfmt

f284636

properly silence warning

04d068d

use proper signatures in PrunedStore::inner[_mut] methods

72a9bed

This way they cannot be used with foo.bar() syntax, which disambiguates them from their Deref[Mut] impls.

make PrunedStore::restore a static method

7d2f42b

add PrunedStore::call_host_func method

096e391

make PrunedStore a transparent type

c5b2a0b

It no longer holds a reference to a Store<Prune> but simply is the shape of a Store<Prune> and now has #[repr(transparent)] to make transmutation sound.

generalize PrunedStore restoration

6094ea7

refactor and clean-up new design

7c177a3

make use of PrunedStore in call

cc92fc2

This is just for demo purposes (it works) and not final code. Call hooks from Wasm -> Host are now handled at the TypedStore.

add TypedStore::store_inner_and_resource_limiter_ref

d5e73be

Also make RestorePrunedWrapper::restore infallible.

port Wasmi's executor to PrunedStore

0e2aa82

apply clippy suggestions

1affe19

clean-up PruneStore::inner[_mut] methods

151f106

Robbepop added 3 commits March 30, 2025 23:47

unsilence warnings

a9edf3c

put some inlines where it makes sense

3599657

make Store::prune method crate private

7900728

Robbepop changed the title ~~experiment: Add PrunedStore~~ Add PrunedStore and use it in Wasmi's executor Mar 31, 2025

Merge branch 'main' into rf-add-pruned-store

0189ade

Robbepop mentioned this pull request Mar 31, 2025

Move FuncInOut to func submodule #1451

Merged

Merge branch 'main' into rf-add-pruned-store

437c47d

Robbepop mentioned this pull request Mar 31, 2025

Add Store<T>::inner[_mut] getters #1452

Merged

Robbepop added 2 commits March 31, 2025 13:51

Merge branch 'main' into rf-add-pruned-store

7b72444

revert some unnecessary changes compared to main

333eb16

Robbepop mentioned this pull request Mar 31, 2025

Wasmi executor and store improvements #1453

Merged

Robbepop added 7 commits March 31, 2025 14:46

Merge branch 'main' into rf-add-pruned-store

62a5996

Merge branch 'main' into rf-add-pruned-store

dabe8a0

use imported type_name

e8c9f1a

fix docs

a0579c0

deduplicate code in PrunedStore

7be71f3

add missing docs to PrunedStore API

9fd2f14

apply rustfmt

21a0fb1

Robbepop added 4 commits April 2, 2025 18:38

apply inline(always) to execute_branch_binop op-handlers

a1f241e

fix most of the call instruction regressions

2462f0e

remove commented out inline annotations

82206c1

put inline on merge_call_frame

601776d

This prevents LLVM from generating very weird, gigantic phi nodes.

Robbepop merged commit 7b3e45c into main Apr 3, 2025
19 checks passed

Robbepop deleted the rf-add-pruned-store branch April 3, 2025 12:38

Robbepop mentioned this pull request Apr 3, 2025

Experiment: Try to share signatures for all Wasmi bytecode executors #1433

Open

This was referenced May 6, 2025

Bump wasmi_runtime_layer to v0.45 DouglasDwyer/wasm_runtime_layer#48

Closed

Store in Wasmi v0.45 with T:'static bound breaks use case #1503

Closed
