Follow-up to the fix that routes JIT calls to facet's option/borrow/slice vtable fns through Rust-side trampolines (vox_jit_option_is_some_raw, vox_jit_borrow_raw, vox_jit_slice_len_raw, vox_jit_slice_as_ptr_raw). That fix is the correctness baseline; this issue covers the optimization.
Why the trampolines exist
facet vtable fns take PtrConst, which is #[repr(C)] { ptr: TaggedPtr, metadata: *const () } — a 16-byte struct. The JIT was modeling them as a single ptr_ty AbiParam, which silently mis-matches the C ABI:
- SystemV x86_64: 16-byte struct passes in two registers (RDI+RSI). JIT filled only RDI; metadata register held garbage but was unused for thin pointers — worked by accident.
- Windows x64: structs larger than 8 bytes (strictly, any size other than 1, 2, 4, or 8 bytes) pass by hidden pointer (RCX = &PtrConst). JIT put the raw data pointer in RCX; callee dereferenced it expecting 16 bytes of PtrConst → access violation.
Same problem applies to vtable fns returning PtrConst (BorrowFn, SliceAsPtrFn): SystemV splits the return across RAX+RDX (JIT read only RAX), Windows uses a caller-supplied hidden out-pointer.
The trampoline approach lets rustc generate the correct ABI handling, at the cost of one extra indirect call per option/borrow/slice op in JIT'd encoders.
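To make the trampoline idea concrete, here is a minimal sketch of what such a shim could look like. It is modeled on vox_jit_option_is_some_raw, but the real signature is not shown in this issue, so the name option_is_some_raw, the stand-in PtrConst (a plain pointer replaces TaggedPtr), the extern "C" alias OptionIsSomeFn, and the helper is_some_u32 are all assumptions for illustration.

```rust
use std::ptr;

/// Hypothetical stand-in for facet's `PtrConst`: two pointer-sized fields,
/// so 16 bytes on a 64-bit target. The real type uses `TaggedPtr` for `ptr`.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct PtrConst {
    pub ptr: *const u8,
    pub metadata: *const (),
}

/// Assumed shape of an "is some" vtable fn; the real facet signature may differ.
pub type OptionIsSomeFn = unsafe extern "C" fn(PtrConst) -> bool;

/// Trampoline sketch: only scalar values cross the JIT boundary, so the JIT
/// can describe this function with plain pointer-sized parameters. rustc then
/// builds the `PtrConst` and applies the platform's struct-passing rules.
pub unsafe extern "C" fn option_is_some_raw(f: OptionIsSomeFn, data: *const u8) -> bool {
    // Thin pointer: metadata is null, matching the [data_ptr, null] convention.
    let thin = PtrConst { ptr: data, metadata: ptr::null() };
    unsafe { f(thin) }
}

/// Example vtable fn for `Option<u32>`, used only to exercise the trampoline.
pub unsafe extern "C" fn is_some_u32(p: PtrConst) -> bool {
    unsafe { (*(p.ptr as *const Option<u32>)).is_some() }
}
```

The JIT side only ever sees (fn pointer, data pointer) -> bool, which is why this shape is ABI-safe on both SystemV and Windows x64, and also why it costs an extra call per operation.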
What we'd want instead
Model PtrConst directly in the Cranelift signature so the JIT emits the call inline, no trampoline needed.
Cranelift does not have first-class struct types — args have to be manually split into their constituent registers, and platform-conditional encoding is required:
Args (fn(option: PtrConst))
- SystemV: params: vec![AbiParam::new(ptr_ty), AbiParam::new(ptr_ty)], pass [data_ptr, null].
- Windows x64: caller-allocate PtrConst on stack via create_sized_stack_slot, store [data_ptr, null] into it, pass &slot as a single ptr_ty arg.
Returns (-> PtrConst — BorrowFn, SliceAsPtrFn)
- SystemV: returns: vec![AbiParam::new(ptr_ty), AbiParam::new(ptr_ty)], take inst_results(call)[0] as the data pointer (ignore [1]).
- Windows x64: stack-allocate a PtrConst-sized slot, pass &slot as the hidden first arg, load data_ptr back from slot+0 after the call.
The split would key off the host ISA's call_conv (already accessible via ctx.b.func.signature.call_conv).
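The platform switch described above can be sketched as a pure decision function, kept separate from any Cranelift calls so that it is easy to unit-test. Everything here is hypothetical: HostCallConv, ArgPlan, RetPlan, and the two plan_* functions are illustrative names, not existing vox or Cranelift APIs. A real implementation would presumably match on Cranelift's CallConv (for example CallConv::SystemV and CallConv::WindowsFastcall) and emit the stack slot, stores, and loads from the chosen plan.

```rust
/// Hypothetical mirror of the two calling conventions discussed in this issue.
/// Other conventions are deliberately out of scope for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostCallConv {
    SystemV,
    WindowsFastcall,
}

/// How a by-value `PtrConst` argument would be lowered.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArgPlan {
    /// Two pointer-sized params, passed as [data_ptr, null].
    TwoParams,
    /// Store [data_ptr, null] into a stack slot of `slot_bytes`, pass `&slot`.
    StackSlotPointer { slot_bytes: u32 },
}

/// How a `PtrConst` return value would be lowered.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetPlan {
    /// Two pointer-sized returns; result number `data_index` is the data pointer.
    TwoReturns { data_index: usize },
    /// Pass `&slot` as the hidden first arg, then load the data pointer
    /// from `slot + data_offset` after the call.
    HiddenOutPointer { slot_bytes: u32, data_offset: i32 },
}

/// Pick the argument lowering for a given convention and pointer width in bytes.
pub fn plan_ptr_const_arg(cc: HostCallConv, ptr_bytes: u32) -> ArgPlan {
    match cc {
        HostCallConv::SystemV => ArgPlan::TwoParams,
        HostCallConv::WindowsFastcall => ArgPlan::StackSlotPointer {
            slot_bytes: 2 * ptr_bytes,
        },
    }
}

/// Pick the return lowering for a given convention and pointer width in bytes.
pub fn plan_ptr_const_ret(cc: HostCallConv, ptr_bytes: u32) -> RetPlan {
    match cc {
        HostCallConv::SystemV => RetPlan::TwoReturns { data_index: 0 },
        HostCallConv::WindowsFastcall => RetPlan::HiddenOutPointer {
            slot_bytes: 2 * ptr_bytes,
            data_offset: 0,
        },
    }
}
```

Keeping the decision in one place means each emit site only has to interpret a plan, rather than repeating the call_conv match.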
Sites
emit_encode_option — OptionIsSomeFn, OptionGetValueFn (arg only — returns are scalar / *const u8)
emit_encode_borrow_pointer — BorrowFn (arg + return)
emit_write_byte_slice — SliceLenFn (arg only), SliceAsPtrFn (arg + return)
Benchmark target
Eliminate one indirect call per option/borrow/slice op in JIT'd encoders. Measure with vox-bench against the gnarly_* workloads, which exercise options heavily.
Cranelift upstream
Worth checking whether cranelift-frontend has, or would accept, a helper that takes a struct layout and synthesizes the multi-register / hidden-pointer dance for the current CallConv — if so we'd want to use it rather than maintaining the platform switch ourselves.