
@wucke13 (Collaborator) commented Jan 2, 2026

Pull Request Overview

After reading an insightful set of comments from @hoshinolina (again: thank you!) on Lobste.rs, I was convinced to rewrite the linear memory using AtomicU8 instead of UnsafeCell<u8>.
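For illustration, here is a minimal sketch of what an AtomicU8-backed linear memory could look like. It assumes a hypothetical LinearMemory type (not this project's actual API) and uses Ordering::Relaxed for every access. Because each byte is an AtomicU8, concurrent accesses through a shared reference are atomic byte operations instead of data races, as they would be with plain writes through UnsafeCell<u8>. Multi-byte Wasm loads and stores are decomposed into individual byte accesses.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical linear memory backed by atomic bytes (illustration only).
/// Racing accesses via `&LinearMemory` are defined behavior in Rust,
/// unlike unsynchronized writes through `UnsafeCell<u8>`.
struct LinearMemory {
    bytes: Box<[AtomicU8]>,
}

impl LinearMemory {
    fn new(len: usize) -> Self {
        let bytes = (0..len).map(|_| AtomicU8::new(0)).collect();
        LinearMemory { bytes }
    }

    /// Little-endian i32 store, performed as four relaxed byte stores.
    /// Returns None if the access is out of bounds.
    fn store_i32(&self, addr: usize, value: i32) -> Option<()> {
        let dst = self.bytes.get(addr..addr.checked_add(4)?)?;
        for (cell, byte) in dst.iter().zip(value.to_le_bytes()) {
            cell.store(byte, Ordering::Relaxed);
        }
        Some(())
    }

    /// Little-endian i32 load, performed as four relaxed byte loads.
    /// Returns None if the access is out of bounds.
    fn load_i32(&self, addr: usize) -> Option<i32> {
        let src = self.bytes.get(addr..addr.checked_add(4)?)?;
        let mut buf = [0u8; 4];
        for (b, cell) in buf.iter_mut().zip(src) {
            *b = cell.load(Ordering::Relaxed);
        }
        Some(i32::from_le_bytes(buf))
    }
}

fn main() {
    let mem = LinearMemory::new(65536);
    mem.store_i32(8, -42).unwrap();
    assert_eq!(mem.load_i32(8), Some(-42));
    assert_eq!(mem.load_i32(65534), None); // crosses the end of memory
}
```

Relaxed ordering should suffice for plain (non-atomic) Wasm loads and stores, since those carry no inter-thread ordering guarantees of their own; the AtomicU8 wrapper exists to keep racy accesses well-defined on the Rust side.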

TODO or Help Wanted

It doesn't work yet 😆.

There must be a subtle (to me) bug in the rewrite, causing one of our internal tests (memory_init_test_4) and a handful of the memory_copy.wast and memory_init.wast tests to fail. I don't have the patience to debug it today.

Edit 1:

Running TESTSUITE_SAVE=1 ALLOW_TEST_PATTERN=memory_copy.wast cargo test --test wasm_spec_testsuite -- --nocapture | grep ❌ reveals the specific test statements that fail for memory.copy.

Edit 2:

All affected functions invoke memory.copy, so the issue likely lies there. Run TESTSUITE_SAVE=1 ALLOW_TEST_PATTERN='memory_(init|copy).wast' cargo test --test wasm_spec_testsuite -- --nocapture | grep ❌ to list them all (the quotes are needed, otherwise the shell rejects the parentheses).

Edit 3:

The problem occurred for memory.copy within the same memory when source and destination overlap and the source index is smaller than the destination index. In that case, a forward copy overwrites source bytes before they have been read, corrupting the result. The simple fix: for this specific case, copy in reverse order, from the last byte to the first.
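The reverse-order fix can be sketched as follows. This is a minimal illustration, assuming the memory is a plain &[AtomicU8] slice and that bounds have already been checked by the caller; memory_copy is a hypothetical name, not the project's actual function.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical overlap-aware copy of `len` bytes from `src` to `dst` within
/// one memory. Bounds are assumed to be checked by the caller.
fn memory_copy(mem: &[AtomicU8], dst: usize, src: usize, len: usize) {
    let copy_byte = |i: usize| {
        let byte = mem[src + i].load(Ordering::Relaxed);
        mem[dst + i].store(byte, Ordering::Relaxed);
    };
    if src < dst && dst < src + len {
        // Overlap with the source below the destination: a forward copy would
        // clobber source bytes that have not been read yet, so go backwards.
        (0..len).rev().for_each(copy_byte);
    } else {
        // No overlap, or the destination lies below the source: forward is safe.
        (0..len).for_each(copy_byte);
    }
}

fn main() {
    let mem: Vec<AtomicU8> = (1..=8).map(AtomicU8::new).collect();
    memory_copy(&mem, 2, 0, 4); // overlapping ranges, src < dst
    let bytes: Vec<u8> = mem.iter().map(|b| b.load(Ordering::Relaxed)).collect();
    assert_eq!(bytes, [1, 2, 1, 2, 3, 4, 7, 8]);
}
```

With a naive forward loop, the same call would produce [1, 2, 1, 2, 1, 2, 7, 8], because bytes 2 and 3 are overwritten before they are read as source bytes.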

Checks

  • Using Nix
    • Ran nix fmt
    • Ran nix flake check '.?submodules=1'
  • Using Rust tooling
    • Ran cargo fmt
    • Ran cargo test
    • Ran cargo check
    • Ran cargo build
    • Ran cargo doc

Benchmark Results

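This change does negatively affect performance. On the fibonacci_loop benchmark, which is dominated by memory loads and stores, runtime increases by roughly 11 % (up to 14 % for some input sizes). The fibonacci_recursive benchmark slows down by at most 5 %. In the table below, benchmark-current is this branch and benchmark-main is main; each cell gives the time relative to the faster of the two, the measured time ± deviation, and the throughput.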

group                          benchmark-current.baseline             benchmark-main.baseline
-----                          --------------------------             -----------------------
fibonacci_loop/our/1           1.07   178.5±14.07ns  5.3 MElem/sec    1.00    167.3±2.08ns  5.7 MElem/sec
fibonacci_loop/our/2           1.06   234.8±17.79ns  8.1 MElem/sec    1.00    221.1±2.60ns  8.6 MElem/sec
fibonacci_loop/our/4           1.06   359.5±16.87ns 10.6 MElem/sec    1.00    338.9±2.11ns 11.3 MElem/sec
fibonacci_loop/our/8           1.08   599.0±31.91ns 12.7 MElem/sec    1.00   554.6±69.96ns 13.8 MElem/sec
fibonacci_loop/our/16          1.13  1079.8±19.48ns 14.1 MElem/sec    1.00   959.6±16.83ns 15.9 MElem/sec
fibonacci_loop/our/32          1.12  1995.7±54.10ns 15.3 MElem/sec    1.00  1774.3±27.57ns 17.2 MElem/sec
fibonacci_loop/our/64          1.12      3.8±0.09µs 15.9 MElem/sec    1.00      3.4±0.06µs 17.8 MElem/sec
fibonacci_loop/our/128         1.10      7.5±0.19µs 16.2 MElem/sec    1.00      6.8±0.03µs 17.9 MElem/sec
fibonacci_loop/our/256         1.06     15.3±0.28µs 16.0 MElem/sec    1.00     14.4±1.12µs 17.0 MElem/sec
fibonacci_loop/our/512         1.12     29.7±0.63µs 16.5 MElem/sec    1.00     26.5±0.44µs 18.4 MElem/sec
fibonacci_loop/our/1024        1.01     60.3±1.48µs 16.2 MElem/sec    1.00     59.5±1.21µs 16.4 MElem/sec
fibonacci_loop/our/2048        1.00    119.1±2.63µs 16.4 MElem/sec    1.02    121.4±2.08µs 16.1 MElem/sec
fibonacci_loop/our/4096        1.10   238.0±13.36µs 16.4 MElem/sec    1.00   215.5±10.87µs 18.1 MElem/sec
fibonacci_loop/our/8192        1.14   480.9±39.01µs 16.2 MElem/sec    1.00    420.4±6.57µs 18.6 MElem/sec
fibonacci_loop/our/16384       1.14   950.9±70.02µs 16.4 MElem/sec    1.00   836.4±11.83µs 18.7 MElem/sec
fibonacci_loop/our/32768       1.13  1903.9±27.69µs 16.4 MElem/sec    1.00  1690.7±29.00µs 18.5 MElem/sec
fibonacci_loop/our/65536       1.12      3.8±0.34ms 16.3 MElem/sec    1.00      3.4±0.01ms 18.3 MElem/sec
fibonacci_loop/our/131072      1.10      7.6±0.43ms 16.5 MElem/sec    1.00      6.9±0.38ms 18.1 MElem/sec
fibonacci_loop/our/262144      1.12     15.2±0.79ms 16.5 MElem/sec    1.00     13.6±0.97ms 18.4 MElem/sec
fibonacci_loop/our/524288      1.14     30.7±1.17ms 16.3 MElem/sec    1.00     27.0±1.03ms 18.5 MElem/sec
fibonacci_loop/our/1048576     1.10     60.1±1.45ms 16.6 MElem/sec    1.00     54.6±1.12ms 18.3 MElem/sec
fibonacci_recursive/our/1      1.01    206.9±3.08ns  4.6 MElem/sec    1.00    203.9±2.42ns  4.7 MElem/sec
fibonacci_recursive/our/2      1.00    288.0±4.35ns  6.6 MElem/sec    1.09   313.2±18.09ns  6.1 MElem/sec
fibonacci_recursive/our/4      1.00    484.6±3.07ns  7.9 MElem/sec    1.00    483.2±5.84ns  7.9 MElem/sec
fibonacci_recursive/our/8      1.00   769.4±14.46ns  9.9 MElem/sec    1.01   779.2±16.32ns  9.8 MElem/sec
fibonacci_recursive/our/16     1.05  1425.9±24.97ns 10.7 MElem/sec    1.00  1352.5±14.86ns 11.3 MElem/sec
fibonacci_recursive/our/32     1.02      2.5±0.04µs 12.0 MElem/sec    1.00      2.5±0.12µs 12.3 MElem/sec
fibonacci_recursive/our/64     1.03      4.6±0.06µs 13.2 MElem/sec    1.00      4.5±0.05µs 13.6 MElem/sec
fibonacci_recursive/our/128    1.04      8.9±0.17µs 13.7 MElem/sec    1.00      8.6±0.08µs 14.3 MElem/sec
fibonacci_recursive/our/256    1.01     16.7±0.25µs 14.6 MElem/sec    1.00     16.5±0.25µs 14.8 MElem/sec
fibonacci_recursive/our/512    1.03     34.0±3.88µs 14.3 MElem/sec    1.00     33.0±1.29µs 14.8 MElem/sec

Github Issue

This approach presents a path towards solving #162.

@wucke13 (Collaborator, Author) commented Jan 2, 2026

From the documentation of Rust's std::ptr::copy: "Copying takes place as if the bytes were copied from src to a temporary array and then copied from the array to dst."

Found the error: for overlapping source and destination, the copy order must ensure that no byte of src is overwritten before it has been read.
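The "temporary array" semantics can also be implemented literally: read the whole source range into a buffer first, then write it to the destination. The sketch below assumes a plain &[AtomicU8] memory with bounds already checked, and memory_copy_via_buffer is a hypothetical name. It sidesteps reasoning about copy direction at the cost of an allocation proportional to len.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical copy following the "as if through a temporary array"
/// semantics: every source byte is read before any destination byte is
/// written, so overlap in either direction is handled. Bounds are assumed
/// to be checked by the caller.
fn memory_copy_via_buffer(mem: &[AtomicU8], dst: usize, src: usize, len: usize) {
    // Phase 1: snapshot the source range into a temporary buffer.
    let tmp: Vec<u8> = mem[src..src + len]
        .iter()
        .map(|b| b.load(Ordering::Relaxed))
        .collect();
    // Phase 2: write the snapshot to the destination range.
    for (cell, byte) in mem[dst..dst + len].iter().zip(tmp) {
        cell.store(byte, Ordering::Relaxed);
    }
}

fn main() {
    let mem: Vec<AtomicU8> = (1..=8).map(AtomicU8::new).collect();
    memory_copy_via_buffer(&mem, 2, 0, 4); // overlapping ranges, src < dst
    let bytes: Vec<u8> = mem.iter().map(|b| b.load(Ordering::Relaxed)).collect();
    assert_eq!(bytes, [1, 2, 1, 2, 3, 4, 7, 8]);
}
```

Because it follows the documented semantics word for word, a version like this can serve as a test oracle for an allocation-free, direction-based implementation.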

@florianhartung (Collaborator) commented

I think we could even take advantage of AtomicU8::get_mut_slice for host accesses, although as of now this requires nightly Rust :/
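Until get_mut_slice is stabilized, a similar effect is available on stable Rust one byte at a time: with exclusive (&mut) access, AtomicU8::get_mut returns a plain &mut u8 without any atomic instruction. A minimal sketch, assuming a hypothetical host_write helper over a &mut [AtomicU8] memory; at the time of writing, get_mut_slice sits behind the nightly feature atomic_from_mut.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical host-side write into linear memory while holding exclusive
/// (&mut) access. Exclusivity means no other thread can observe these bytes,
/// so the stable `AtomicU8::get_mut` can hand out plain `&mut u8`s.
/// (On nightly, `AtomicU8::get_mut_slice` would turn the whole range into
/// `&mut [u8]`, allowing a single `copy_from_slice` instead of this loop.)
fn host_write(mem: &mut [AtomicU8], offset: usize, data: &[u8]) -> Option<()> {
    let end = offset.checked_add(data.len())?;
    let dst = mem.get_mut(offset..end)?; // None if out of bounds
    for (cell, &byte) in dst.iter_mut().zip(data) {
        *cell.get_mut() = byte;
    }
    Some(())
}

fn main() {
    let mut mem: Vec<AtomicU8> = (0..8).map(|_| AtomicU8::new(0)).collect();
    host_write(&mut mem, 2, b"wasm").unwrap();
    assert_eq!(mem[2].load(Ordering::Relaxed), b'w');
    assert!(host_write(&mut mem, 6, b"wasm").is_none()); // out of bounds
}
```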

wucke13 force-pushed the dev/wucke13/refactor-lin-memory branch from 870f4c5 to d374b17 on January 5, 2026 15:32

codecov bot commented Jan 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines                Coverage Δ
src/execution/store/linear_memory.rs    97.52% <100.00%> (+0.39%) ⬆️

wucke13 force-pushed the dev/wucke13/refactor-lin-memory branch 2 times, most recently from 3e1c1b8 to 7d07198 on January 7, 2026 15:02
cemonem previously approved these changes Jan 8, 2026
wucke13 force-pushed the dev/wucke13/refactor-lin-memory branch 3 times, most recently from d652567 to 4c25ad1 on January 9, 2026 14:13
After an enlightening discussion with Asahi Lina[1], I was left
convinced that using atomic operations to implement non-atomic Wasm
instructions might be a good idea after all. This commit is the
manifestation of this realization.

[1] https://lobste.rs/s/cwdone/why_are_we_worried_about_memory_access#c_obd8av

Co-authored-by: Florian <[email protected]>
Signed-off-by: Wanja Zaeske <[email protected]>
wucke13 force-pushed the dev/wucke13/refactor-lin-memory branch from 4c25ad1 to 9b8e96e on January 9, 2026 14:21
wucke13 added this pull request to the merge queue Jan 9, 2026
Merged via the queue into main with commit 2af195b Jan 9, 2026
13 checks passed
wucke13 deleted the dev/wucke13/refactor-lin-memory branch January 9, 2026 14:51


4 participants