Skip to content

Commit 9260ce4

Browse files
authored
pulley: Reimplement wasm loads/stores & memory opcodes (#10154)
* pulley: Reimplement wasm loads/stores & memory opcodes This commit is a large refactoring to reimplement how WebAssembly loads/stores are translated to Pulley opcodes when using the interpreter. Additionally the functionality related to memory support has changed quite a bit with the interpreter as well. This is all based off comments on #10102 with the end goal of folding the two Pulley opcodes today of "do the bounds check" and "do the load" into one opcode. This is intended to reduce the number of opcodes and overall improve interpreter throughput by minimizing turns of the interpreter loop. The basic idea behind this PR is that a new basic suite of loads/stores are added to Pulley which trap if the address is zero. This provides a route to translate trapping loads/stores in CLIF to Pulley bytecode without actually causing segfaults at runtime. WebAssembly translation to CLIF is then updated to use the `select` trick for wasm loads/stores where either 0 is loaded from or the actual address is loaded from. Basic support for translation and such is added for this everywhere, and this ensures that all loads/stores for wasm will be translated successfully with Pulley. The next step was to extend the "g32" addressing mode preexisting in Pulley to support a bounds check as well. New pattern-matches were added to ISLE to search for a bounds check in the address of a trapping load/store. If found then the entire chain of operations necessary to compute the address are folded into a single "g32" opcode which ends up being a fallible load/store at runtime. To fit all this into Pulley this commit contains a number of refactorings to shuffle around existing opcodes related to memory and extend various pieces of functionality here and there: * Pulley now uses a `AddrFoo` types to represent addressing modes as a single immediate rather than splitting it up into pieces for each method. For example `AddrO32` represents "base + offset32". `AddrZ` represents the same thing but traps if the address is zero. The `AddrG32` mode represents a bounds-checked 32-bit linear memory access on behalf of wasm. * Pulley loads/stores were reduced to always using an `AddrFoo` immediate. This means that the old `offset8` addressing mode was removed without replacement here (to be added in the future if necessary). Additionally the suite of sign-extension modes supported were trimmed down to remove 8-to-64, 16-to-64, and 32-to-64 extensions folded as part of the opcode. These can of course always be re-added later but probably want to be added just for the `G32` addressing mode as opposed to all addressing modes. * The interpreter itself was refactored to have an `AddressingMode` trait to ensure that all memory accesses, regardless of addressing modes, are largely just copy/pastes of each other. In the future it might make sense to implement these methods with a macro, but for now it's copy/paste. * In ISLE the `XLoad` generic instruction removed its `ext` field to have extensions handled exclusively in ISLE instead of partly in `emit.rs`. * Float/vector loads/stores now have "g32" addressing (in addition to the "z" that's required for wasm) since it was easy to add them. * Translation of 1-byte accesses on Pulley from WebAssembly to CLIF no longer has a special case for using `a >= b` instead of `a > b - 1` to ensure that the same bounds-check instruction can be used for all sizes of loads/stores. * The bounds-check which folded a load-of-the-bound into the opcode is now present as a "g32bne" addressing mode. with its of suite of instructions to boo. Overall this PR is not a 1:1 replacement of all previous opcodes with exactly one opcode. For example loading 8 bits sign-extended to 64-bits is now two opcodes instead of one. Additionally some previous opcodes have expanded in size where for example the 8-bit offset mode was remove in favor of only having 32-bit offsets. The goal of this PR is to reboot how memory is handled in Pulley. All loads/stores now use a specific addressing mode and currently all operations supported across addressing modes are consistently supported. In the future it's expected that some features will be added to some addressing modes and not others as necessary, for example extending the "g32" addressing mode only instead of all addressing modes. For an evaluation of this PR: * Code size: `spidermonkey.cwasm` file is reduced from 19M to 16M. * Sightglass: `pulldown-cmark` is improved by 15% * Sightglass: `bz2` is improved by 20% * Sightglass: `spidermonkey` is improved by 22% * Coremark: score improved by 40% Overall this PR and new design looks to be a large win. This is all driven by the reduction in opcodes both for compiled code size and execution speed by minimizing turns of the interpreter loop. In the end I'm also pretty happy with how this turned out and I think the refactorings are well worth it. * Use new `is_pulley` helper more * Improve `addrz` helper, tighten up `memory-inbounds.wat` a bit * Improve codegen in a few `memory-inbounds.wat` cases * Fix test expectation
1 parent 0159ff5 commit 9260ce4

36 files changed

Lines changed: 2191 additions & 1652 deletions

File tree

cranelift/codegen/meta/src/pulley.rs

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -186,6 +186,7 @@ pub fn generate_rust(filename: &str, out_dir: &Path) -> Result<(), Error> {
186186
let mut pat = String::new();
187187
let mut uses = Vec::new();
188188
let mut defs = Vec::new();
189+
let mut addrs = Vec::new();
189190
for op in inst.operands() {
190191
match op {
191192
// `{Push,Pop}Frame{Save,Restore}` doesn't participate in
@@ -200,6 +201,10 @@ pub fn generate_rust(filename: &str, out_dir: &Path) -> Result<(), Error> {
200201
uses.push(name);
201202
pat.push_str(name);
202203
pat.push_str(",");
204+
} else if ty.starts_with("Addr") {
205+
addrs.push(name);
206+
pat.push_str(name);
207+
pat.push_str(",");
203208
}
204209
}
205210
Operand::Writable { name, ty } => {
@@ -230,12 +235,17 @@ pub fn generate_rust(filename: &str, out_dir: &Path) -> Result<(), Error> {
230235
.iter()
231236
.map(|u| format!("collector.reg_def({u});\n"))
232237
.collect::<String>();
238+
let addrs = addrs
239+
.iter()
240+
.map(|u| format!("{u}.collect_operands(collector);\n"))
241+
.collect::<String>();
233242

234243
rust.push_str(&format!(
235244
"
236245
RawInst::{name} {{ {pat} .. }} => {{
237246
{uses}
238247
{defs}
248+
{addrs}
239249
}}
240250
"
241251
));

cranelift/codegen/src/isa/pulley_shared/inst.isle

Lines changed: 210 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -73,9 +73,9 @@
7373
;;
7474
;; How much is written to the register is defined by `ExtKind`. The `flags`
7575
;; control behavior such as endianness.
76-
(XLoad (dst WritableXReg) (mem Amode) (ty Type) (flags MemFlags) (ext ExtKind))
76+
(XLoad (dst WritableXReg) (mem Amode) (ty Type) (flags MemFlags))
7777
(FLoad (dst WritableFReg) (mem Amode) (ty Type) (flags MemFlags))
78-
(VLoad (dst WritableVReg) (mem Amode) (ty Type) (flags MemFlags) (ext VExtKind))
78+
(VLoad (dst WritableVReg) (mem Amode) (ty Type) (flags MemFlags))
7979

8080
;; Stores.
8181
(XStore (mem Amode) (src XReg) (ty Type) (flags MemFlags))
@@ -158,6 +158,22 @@
158158

159159
;;;; Address Modes ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
160160

161+
(type ExtKind (enum None Sign32 Sign64 Zero32 Zero64))
162+
(type VExtKind (enum None S8x8 U8x8 S16x4 U16x4 S32x2 U32x2))
163+
164+
;; Helper to convert a `(Value Offset32)` to `(Value i32)` while peeling off
165+
;; constant addition within the first `Value` into the static offset, if
166+
;; possible.
167+
;;
168+
;; Note that ideally this wouldn't be necessary and we could rely on the egraph
169+
;; pass to do this but that's not implemented at this time.
170+
(type ValueOffset (enum (Both (value Value) (offset i32))))
171+
(decl pure amode_base (Value Offset32) ValueOffset)
172+
(rule (amode_base addr (offset32 offset)) (ValueOffset.Both addr offset))
173+
(rule 1 (amode_base (iadd addr (i32_from_iconst b)) (offset32 offset))
174+
(if-let new_offset (s32_add_fallible b offset))
175+
(ValueOffset.Both addr new_offset))
176+
161177
(type StackAMode extern (enum))
162178

163179
(type Amode
@@ -168,9 +184,55 @@
168184
)
169185
)
170186

171-
(type ExtKind (enum None Sign32 Sign64 Zero32 Zero64))
187+
(decl amode (Value Offset32) Amode)
188+
(rule (amode addr offset)
189+
(if-let (ValueOffset.Both a o) (amode_base addr offset))
190+
(Amode.RegOffset a o))
172191

173-
(type VExtKind (enum None S8x8 U8x8 S16x4 U16x4 S32x2 U32x2))
192+
;;; ISLE representation of the `AddrO32` ("*_o32") addressing mode in Pulley.
193+
(type AddrO32
194+
(enum
195+
(Base
196+
(addr XReg)
197+
(offset i32))))
198+
199+
;; Constructor for the `AddrO32` type used in `*_o32` loads/stores
200+
(decl addro32 (Value Offset32) AddrO32)
201+
(rule (addro32 addr offset)
202+
(if-let (ValueOffset.Both reg off32) (amode_base addr offset))
203+
(AddrO32.Base reg off32))
204+
205+
;;; ISLE representation of the `AddrZ` ("*_z") addressing mode in Pulley.
206+
(type AddrZ
207+
(enum
208+
(Base
209+
(addr XReg)
210+
(offset i32))))
211+
212+
;; Constructor for the `AddrZ` type used in `*_z` loads/stores
213+
(decl addrz (Value Offset32) AddrZ)
214+
(rule (addrz addr offset)
215+
(if-let (ValueOffset.Both reg off32) (amode_base addr offset))
216+
(AddrZ.Base reg off32))
217+
218+
;;; ISLE representation of the `AddrG32` ("*_g32") addressing mode in Pulley.
219+
(type AddrG32
220+
(enum
221+
(RegisterBound
222+
(host_heap_base XReg)
223+
(host_heap_bound XReg)
224+
(wasm_addr XReg)
225+
(offset u16))))
226+
227+
;;; ISLE representation of the `AddrG32Bne` ("*_g32bne") addressing mode in Pulley.
228+
(type AddrG32Bne
229+
(enum
230+
(BoundNe
231+
(host_heap_base XReg)
232+
(host_heap_bound_addr XReg)
233+
(host_heap_bound_offset u8)
234+
(wasm_addr XReg)
235+
(offset u8))))
174236

175237
;; Helper to determine the endianness of `MemFlags` taking the current target
176238
;; into account.
@@ -198,6 +260,7 @@
198260
(rule (sinkable_load value @ (value_type ty))
199261
(if-let inst @ (load flags addr (offset32 offset)) (is_sinkable_inst value))
200262
(if-let true (is_native_endianness (endianness flags)))
263+
(if-let true (memflags_nontrapping flags))
201264
(if-let offset8 (u8_try_from_i32 offset))
202265
(SinkableLoad.Load inst ty addr offset8))
203266

@@ -220,6 +283,143 @@
220283
(decl pure pointer_width () PointerWidth)
221284
(extern constructor pointer_width pointer_width)
222285

286+
(decl pure memflags_nontrapping (MemFlags) bool)
287+
(extern constructor memflags_nontrapping memflags_nontrapping)
288+
289+
(decl pure memflags_is_wasm (MemFlags) bool)
290+
(extern constructor memflags_is_wasm memflags_is_wasm)
291+
292+
;; Helper type to represent a "pending" `AddrG32` value.
293+
(type G32 (enum (All (heap_base Value) (heap_bound Value) (wasm_addr Value) (offset u16))))
294+
295+
;; Auto-conversion from `G32` to `AddrG32`.
296+
(decl gen_addrg32 (G32) AddrG32)
297+
(rule (gen_addrg32 (G32.All base bound wasm offset))
298+
(AddrG32.RegisterBound base bound wasm offset))
299+
(convert G32 AddrG32 gen_addrg32)
300+
301+
;; Helper type to represent a "pending" `AddrG32Bne` value.
302+
(type G32Bne (enum (All (heap_base Value) (heap_bound SinkableLoad) (wasm_addr Value) (offset u8))))
303+
304+
;; Auto-conversion from `G32Bne` to `AddrG32Bne`.
305+
(decl gen_addrg32bne (G32Bne) AddrG32Bne)
306+
(rule (gen_addrg32bne (G32Bne.All base heap_bound_load wasm offset))
307+
(gen_addrg32bne_for_real base heap_bound_load wasm offset))
308+
(convert G32Bne AddrG32Bne gen_addrg32bne)
309+
310+
;; Small workaround to pattern-match `SunkLoad` here.
311+
(decl gen_addrg32bne_for_real (XReg SunkLoad XReg u8) AddrG32Bne)
312+
(rule (gen_addrg32bne_for_real base (SunkLoad.Load _ bound_addr bound_offset) wasm_addr offset)
313+
(AddrG32Bne.BoundNe base bound_addr bound_offset wasm_addr offset))
314+
315+
;; Helper to extract `G32Bne` from `G32` if applicable.
316+
;;
317+
;; This is possible when the heap bound is itself a sinkable load which can get
318+
;; folded into `G32Bne`.
319+
(decl pure partial addrg32bne (G32) G32Bne)
320+
(rule (addrg32bne (G32.All heap_base heap_bound wasm_addr offset))
321+
(if-let heap_bound_load (sinkable_load heap_bound))
322+
(if-let offset8 (u8_try_from_u16 offset))
323+
(G32Bne.All heap_base heap_bound_load wasm_addr offset8))
324+
325+
;; Main entrypoint for matching a `G32` address which can be used in `*_g32`
326+
;; pulley instructions for loads/stores.
327+
;;
328+
;; Pulley loads/stores are emitted as fallible loads/stores where the only
329+
;; defined trapping address is null. That's modeled as a load-from-`select`
330+
;; where an out-of-bounds condition yields null.
331+
;;
332+
;; Here the top-layer of this rule matches the `(select ...)` node where 0 is
333+
;; used if the oob condition is true. Otherwise the raw address is used for the
334+
;; load/store.
335+
(decl pure partial wasm_g32 (Value Offset32 MemFlags Type) G32)
336+
(rule (wasm_g32 (select oob (u64_from_iconst 0) raw_addr) (offset32 0) flags ty)
337+
;; This must be a wasm load/store according to `MemFlags`, namely that it's
338+
;; got a "HEAP_OUT_OF_BOUNDS" trap code and it's little-endian.
339+
(if-let true (memflags_is_wasm flags))
340+
;; Peel off the static wasm offset from `raw_addr`, if one is present. If
341+
;; one isn't present then `heap_offset` will be zero. This handles the `N`
342+
;; in wasm instructions `i32.load offset=N`.
343+
(if-let (HostOffset.All host_addr heap_offset) (host_offset raw_addr))
344+
;; Next see if the `oob` and `raw_addr` combination match. This will attempt
345+
;; extract a full bounds check from these values. If everything succeeds the
346+
;; final step is then to extract an 8-bit offset of the load/store operation,
347+
;; if appplicable, assuming that the constants used in various places all line
348+
;; up just right.
349+
(if-let (OobSelect.All base bound wasm_addr access_size_plus_offset)
350+
(wasm_oob_select oob host_addr))
351+
(if-let offset16 (g32_offset heap_offset ty access_size_plus_offset))
352+
(G32.All base bound wasm_addr offset16))
353+
354+
;; Host helper to do the math to extract the offset here.
355+
;;
356+
;; Ensures that the first argument, the heap offset on the load, plus the size
357+
;; of the type equals the third argument, the bounds-checked static offset. If
358+
;; the offset then fits within `u16` it's returned.
359+
(decl pure partial g32_offset (i32 Type u64) u16)
360+
(extern constructor g32_offset g32_offset)
361+
362+
;; Helper used in `wasm_g32` above to extract `(iadd addr N)` where `N` is a
363+
;; constant. If there is no constant then the constant 0 is returned instead.
364+
;;
365+
;; Note that this also requires `addr` itself to be an `iadd` internally to
366+
;; represent the host address plus the guest offset. If `addr` isn't internally
367+
;; an `iadd` then the `N` here is instead probably the static address in the
368+
;; guest.
369+
(decl pure host_offset (Value) HostOffset)
370+
(type HostOffset (enum (All (a Value) (b i32))))
371+
(rule 0 (host_offset val) (HostOffset.All val 0))
372+
(rule 1 (host_offset (iadd val @ (iadd _ _) (i32_from_iconst n))) (HostOffset.All val n))
373+
(rule 2 (host_offset (iadd (i32_from_iconst n) val @ (iadd _ _))) (HostOffset.All val n))
374+
375+
;; Helper to test whether the first argument `oob` contains a condition for
376+
;; matching whether second argument `addr` is out-of-bounds. Searches for a
377+
;; variety of structures that optimizations or the frontend may produce.
378+
(decl pure partial wasm_oob_select (Value Value) OobSelect)
379+
(type OobSelect (enum (All (a Value) (b Value) (c Value) (d u64))))
380+
381+
;; 32-bit hosts: search either side of the `iadd` for the base address within
382+
;; `oob` to see if it's a matching condition.
383+
(rule 0 (wasm_oob_select oob (iadd base wasm_addr @ (value_type $I32)))
384+
(if-let (OobCond.All bound n) (wasm_oob_cond wasm_addr oob))
385+
(if-let (PointerWidth.PointerWidth32) (pointer_width))
386+
(OobSelect.All base bound wasm_addr n))
387+
(rule 1 (wasm_oob_select oob (iadd wasm_addr @ (value_type $I32) base))
388+
(if-let (OobCond.All bound n) (wasm_oob_cond wasm_addr oob))
389+
(if-let (PointerWidth.PointerWidth32) (pointer_width))
390+
(OobSelect.All base bound wasm_addr n))
391+
392+
;; 64-bit hosts: also search either side, but the wasm address must also be
393+
;; a `uextend` from a 32-bit value.
394+
(rule 0 (wasm_oob_select oob (iadd base wasm_addr_ext @ (value_type $I64)))
395+
(if-let (OobCond.All bound n) (wasm_oob_cond wasm_addr_ext oob))
396+
(if-let wasm_addr (wasm32_addr_for_host64 wasm_addr_ext))
397+
(if-let (PointerWidth.PointerWidth64) (pointer_width))
398+
(OobSelect.All base bound wasm_addr n))
399+
(rule 1 (wasm_oob_select oob (iadd wasm_addr_ext @ (value_type $I64) base))
400+
(if-let (OobCond.All bound n) (wasm_oob_cond wasm_addr_ext oob))
401+
(if-let wasm_addr (wasm32_addr_for_host64 wasm_addr_ext))
402+
(if-let (PointerWidth.PointerWidth64) (pointer_width))
403+
(OobSelect.All base bound wasm_addr n))
404+
405+
;; Helper to extract a 32-bit `Value` from a 64-bit input. The returned value,
406+
;; if one is returned, can losslessly represent the original value when zero
407+
;; extended from 32-bits. That means that this only matches `(uextend ...)` or
408+
;; an `iconst` which already fits in 32 bits.
409+
(decl pure partial wasm32_addr_for_host64 (Value) Value)
410+
(rule (wasm32_addr_for_host64 (uextend addr @ (value_type $I32))) addr)
411+
(rule (wasm32_addr_for_host64 val @ (u32_from_iconst _)) val)
412+
413+
;; Helper to search for the first argument, `wasm_addr`, within the oob
414+
;; condition second argument, `oob`. It should appear as an integer comparison
415+
;; of the address against a particular bound.
416+
(decl pure partial wasm_oob_cond (Value Value) OobCond)
417+
(type OobCond (enum (All (a Value) (b u64))))
418+
(rule (wasm_oob_cond wasm_addr (ugt wasm_addr (isub bound (u64_from_iconst n))))
419+
(OobCond.All bound n))
420+
(rule (wasm_oob_cond wasm_addr (ult (isub bound (u64_from_iconst n)) wasm_addr))
421+
(OobCond.All bound n))
422+
223423
;;;; Newtypes for Different Register Classes ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
224424

225425
(type XReg (primitive XReg))
@@ -467,10 +667,10 @@
467667
(rule (pulley_br_if cond taken not_taken)
468668
(SideEffectNoResult.Inst (MInst.BrIf cond taken not_taken)))
469669

470-
(decl pulley_xload (Amode Type MemFlags ExtKind) XReg)
471-
(rule (pulley_xload amode ty flags ext)
670+
(decl pulley_xload (Amode Type MemFlags) XReg)
671+
(rule (pulley_xload amode ty flags)
472672
(let ((dst WritableXReg (temp_writable_xreg))
473-
(_ Unit (emit (MInst.XLoad dst amode ty flags ext))))
673+
(_ Unit (emit (MInst.XLoad dst amode ty flags))))
474674
dst))
475675

476676
(decl pulley_xstore (Amode XReg Type MemFlags) SideEffectNoResult)
@@ -487,10 +687,10 @@
487687
(rule (pulley_fstore amode src ty flags)
488688
(SideEffectNoResult.Inst (MInst.FStore amode src ty flags)))
489689

490-
(decl pulley_vload (Amode Type MemFlags VExtKind) VReg)
491-
(rule (pulley_vload amode ty flags ext)
690+
(decl pulley_vload (Amode Type MemFlags) VReg)
691+
(rule (pulley_vload amode ty flags)
492692
(let ((dst WritableVReg (temp_writable_vreg))
493-
(_ Unit (emit (MInst.VLoad dst amode ty flags ext))))
693+
(_ Unit (emit (MInst.VLoad dst amode ty flags))))
494694
dst))
495695

496696
(decl pulley_vstore (Amode VReg Type MemFlags) SideEffectNoResult)

0 commit comments

Comments
 (0)