Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: IR Reference example triggers debug assert on x86_64 #10118

Open
iwanders opened this issue Jan 27, 2025 · 9 comments · May be fixed by #10209
Open

Cranelift: IR Reference example triggers debug assert on x86_64 #10118

iwanders opened this issue Jan 27, 2025 · 9 comments · May be fixed by #10209
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator

Comments

@iwanders
Copy link
Contributor

.clif Test Case

The Cranelift IR average example code from the docs/ir.md. I've already added that to the filetests test directory in caeafe2.

Using clif-util;

../target/debug/clif-util  bugpoint  filetests/filetests/isa/x64/average.clif x86_64
After pass 0, remaining insts/blocks: 22/3 (will keep reducing)
After pass 1, remaining insts/blocks: 13/3 (will keep reducing)
After pass 2, remaining insts/blocks: 13/3 (stop reducing)
████████████████████████████████████████████████████████████ pass 2 phase merge blocks                   2/   2 Remove unused global values
Crash message: assertion `left == right` failed
  left: types::I32
 right: types::I64

function %average() -> f32 system_v {
    ss0 = explicit_slot 8

block1:
    v0 = iconst.i32 0
    v1 = iconst.i32 0
    v5 = iconst.i32 0
    v6 = iadd v0, v5  ; v0 = 0, v5 = 0
    v9 = f64const 0.0
    v100 = f32const +NaN
    brif v1, block2, block5  ; v1 = 0

block2:
    v7 = load.f32 v6
    v8 = fpromote.f64 v7
    v10 = fadd v8, v9  ; v9 = 0.0
    stack_store v10, ss0
    trap user1

block5:
    return v100  ; v100 = +NaN
}

5 blocks 22 insts -> 3 blocks 13 insts

Manually, I reduced it to the following:

function %average(i32, i32) -> f32 system_v {
    ss0 = explicit_slot 8        ; Stack slot for `sum`

block1(v0: i32, v1: i32):
    v20 = f64const 0x0.0         ; Create a f64 as 0.0, just to initialise ss0.
    stack_store v20, ss0         ; Store f64 into the ss0 stack slot, just to initialise ss0.
    v6 = iadd v0, v1             ; Adds the input together as integers, output is i32?
    v7 = load.f32 v6             ; Converts v7 to f32 from i32.
    v8 = fpromote.f64 v7         ; converts v7 to f64 into v8.
    v9 = stack_load.f64 ss0      ; Loads ss0 from the stack into v9, v9 is now f64 that's 0.0
    v10 = fadd.f64 v8, v9        ; Adds v8 and v9 together, both are f64's, so v10 is an f64.
    stack_store v10, ss0         ; Write the 8 bytes back to ss0.
    v100 = f32const +NaN         ; Just creating a dummy to return.
    return v100                  ; Return the dummy.
}

Instructions themselves seem fine to me, but today is the first day I'm looking at Cranelift at all, so I absolutely may be missing something obvious.

Steps to Reproduce

  • Cherrypick caeafe2
  • cd cranelift
  • cargo t (note without --release, test will pass with --release).

Note, the problem is with x86_64, this function compiles fine with aarch64.

Expected Results

I would expect the example function provided in the documentation to compile, it compiles on aarch64, which (to me at least) indicates the function is fine. I would expect this to also compile on x86_64.

Actual Results

A debug assert triggers on this line;

Backtrace
---- filetests stdout ----
thread 'worker #7' panicked at cranelift/codegen/src/isa/x64/lower.rs:233:9:
assertion `left == right` failed
  left: types::I32
 right: types::I64
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:373:5
   4: cranelift_codegen::isa::x64::lower::lower_to_amode
             at ./codegen/src/isa/x64/lower.rs:233:9
   5: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::sink_load
             at ./codegen/src/isa/x64/lower/isle.rs:313:20
   6: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::put_in_reg_mem
             at ./codegen/src/isa/x64/lower/isle.rs:148:23
   7: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::put_in_xmm_mem
             at ./codegen/src/isa/x64/lower/isle.rs:132:28
   8: cranelift_codegen::isa::x64::lower::isle::generated_code::constructor_lower
             at /home/ivor/Documents/Code/rust/wasmtime/wasmtime/target/debug/build/cranelift-codegen-8f3d42bba10fabf3/out/isle_x64.rs:21098:42
   9: cranelift_codegen::isa::x64::lower::isle::lower
             at ./codegen/src/isa/x64/lower/isle.rs:55:5
  10: cranelift_codegen::isa::x64::lower::<impl cranelift_codegen::machinst::lower::LowerBackend for cranelift_codegen::isa::x64::X64Backend>::lower
             at ./codegen/src/isa/x64/lower.rs:311:9
  11: cranelift_codegen::machinst::lower::Lower<I>::lower_clif_block
             at ./codegen/src/machinst/lower.rs:679:39
  12: cranelift_codegen::machinst::lower::Lower<I>::lower
             at ./codegen/src/machinst/lower.rs:1020:17
  13: cranelift_codegen::machinst::compile::compile
             at ./codegen/src/machinst/compile.rs:42:9
  14: cranelift_codegen::isa::x64::X64Backend::compile_vcode
             at ./codegen/src/isa/x64/mod.rs:62:9
  15: <cranelift_codegen::isa::x64::X64Backend as cranelift_codegen::isa::TargetIsa>::compile_function
             at ./codegen/src/isa/x64/mod.rs:74:40
  16: cranelift_codegen::context::Context::compile_stencil
             at ./codegen/src/context.rs:138:9
  17: cranelift_codegen::context::Context::compile
             at ./codegen/src/context.rs:204:23
  18: <cranelift_filetests::test_compile::TestCompile as cranelift_filetests::subtest::SubTest>::run
             at ./filetests/src/test_compile.rs:59:29
  19: cranelift_filetests::subtest::SubTest::run_target
             at ./filetests/src/subtest.rs:101:13
  20: cranelift_filetests::runone::run
             at ./filetests/src/runone.rs:97:9
  21: cranelift_filetests::concurrent::worker_thread::{{closure}}::{{closure}}
             at ./filetests/src/concurrent.rs:149:46
  22: std::panicking::try::do_call
             at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
  23: __rust_try
  24: std::panicking::try
             at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
  25: std::panic::catch_unwind
             at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
  26: cranelift_filetests::concurrent::worker_thread::{{closure}}
             at ./filetests/src/concurrent.rs:149:30
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

The relevant section from isle_x64.rs;

&Opcode::Fpromote => {
    let v1 = C::first_result(ctx, arg0);
    if let Some(v2) = v1 {
        let v3 = C::value_type(ctx, v2);
        if v3 == F64 {
            let v1809 = constructor_xmm_zero(ctx, F64X2);
            let v1804 = &C::put_in_xmm_mem(ctx, v520);             // <- Line isle_x64.rs:21098:42
            let v1819 = constructor_x64_cvtss2sd(ctx, v1809, v1804);
            let v1820 = constructor_output_xmm(ctx, v1819);
            let v1821 = Some(v1820);
            // Rule at src/isa/x64/lower.isle line 2661.
            return v1821;
        }
    }
}

Which seems to indicate the problem is originating from Fpromote.

Versions and Environment

Cranelift version or commit: Current released; 0.116.1, also tested on current main; 5dfccc0.

Operating system: Ubuntu, stable rustc 1.84.0 (9fc6b4312 2025-01-07)

Architecture: x86_64

Extra Info

If I use a release build, target x86_64 and write the output to an object file using cranelift_object::ObjectModule. The provided symbol does work and correctly calculates averages.

Disassembly of object and usage of it to calculate average
$ objdump -d average.o

average.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <average>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 83 ec 10          	sub    $0x10,%rsp
   8:	66 0f 57 e4          	xorpd  %xmm4,%xmm4
   c:	48 8d 04 24          	lea    (%rsp),%rax
  10:	f2 0f 11 20          	movsd  %xmm4,(%rax)
  14:	85 f6                	test   %esi,%esi
  16:	0f 85 12 00 00 00    	jne    2e <average+0x2e>
  1c:	b9 00 00 c0 7f       	mov    $0x7fc00000,%ecx
  21:	66 0f 6e c1          	movd   %ecx,%xmm0
  25:	48 83 c4 10          	add    $0x10,%rsp
  29:	48 89 ec             	mov    %rbp,%rsp
  2c:	5d                   	pop    %rbp
  2d:	c3                   	retq   
  2e:	31 c9                	xor    %ecx,%ecx
  30:	44 6b c9 04          	imul   $0x4,%ecx,%r9d
  34:	66 0f 57 ed          	xorpd  %xmm5,%xmm5
  38:	f3 42 0f 5a 2c 0f    	cvtss2sd (%rdi,%r9,1),%xmm5
  3e:	4c 8d 0c 24          	lea    (%rsp),%r9
  42:	f2 41 0f 58 29       	addsd  (%r9),%xmm5
  47:	4c 8d 0c 24          	lea    (%rsp),%r9
  4b:	f2 41 0f 11 29       	movsd  %xmm5,(%r9)
  50:	83 c1 01             	add    $0x1,%ecx
  53:	39 f1                	cmp    %esi,%ecx
  55:	0f 82 d5 ff ff ff    	jb     30 <average+0x30>
  5b:	48 8d 3c 24          	lea    (%rsp),%rdi
  5f:	f2 0f 10 3f          	movsd  (%rdi),%xmm7
  63:	66 0f 57 e4          	xorpd  %xmm4,%xmm4
  67:	8b fe                	mov    %esi,%edi
  69:	f2 48 0f 2a e7       	cvtsi2sd %rdi,%xmm4
  6e:	f2 0f 5e fc          	divsd  %xmm4,%xmm7
  72:	0f 57 c0             	xorps  %xmm0,%xmm0
  75:	f2 0f 5a c7          	cvtsd2ss %xmm7,%xmm0
  79:	48 83 c4 10          	add    $0x10,%rsp
  7d:	48 89 ec             	mov    %rbp,%rsp
  80:	5d                   	pop    %rbp
  81:	c3                   	retq  
$ cat main.cpp 
#include <array>
#include <iostream>
extern "C" {
  float average(const float *array, size_t count);
}
int main(int, char**) {
  const auto values = std::array<float, 4>{1, 5555, 3, 4};
  const auto avg = average(values.data(), values.size());
  std::cout << "avg: " << avg << std::endl;
  return 0;
}
$ g++ main.cpp  average.o && ./a.out 
avg: 1390.75
@iwanders iwanders added bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator labels Jan 27, 2025
@cfallin
Copy link
Member

cfallin commented Jan 27, 2025

In this snippet

block1(v0: i32, v1: i32):
   v20 = f64const 0x0.0         ; Create a f64 as 0.0, just to initialise ss0.
   stack_store v20, ss0         ; Store f64 into the ss0 stack slot, just to initialise ss0.
   v6 = iadd v0, v1             ; Adds the input together as integers, output is i32?
   v7 = load.f32 v6             ; Converts v7 to f32 from i32.

you are loading from memory at address v6, but v6 is an I32-typed value (iadd is polymorphic so the type is implicit from v0 and v1 args, which are both i32). Note that x86-64 is a 64-bit architecture, so we require addresses to be 64 bits wide. That this worked on aarch64 is a little surprising but I guess the lowering rules are more lenient in that backend (they probably auto-extend).

(The comments aren't load-bearing for the bug but I note too that you wrote "Converts ... to f32 from i32" which is inconsistent with a load -- usually I would reach for an int-to-float opcode to convert in the usual sense.)

Our assertion failure message could absolutely be better here -- perhaps we could add "address with unexpected" or something like that.

I'll note as well that the example IR you took from ir.md is written without a target in mind, and address width is one of the few ways in which CLIF is not totally target-independent -- it must match the target ISA. I suspect that example was written long ago when riscv32 was the original target of proto-Cranelift.

Hopefully this is a simple fix (use i64 types for addresses) -- let us know if that addresses your problem!

@iwanders
Copy link
Contributor Author

@cfallin , thank you for the detailed answer.

Note that x86-64 is a 64-bit architecture, so we require addresses to be 64 bits wide.
Hopefully this is a simple fix (use i64 types for addresses) -- let us know if that addresses your problem!

Ah, I didn't think of that, but that does make sense!

I can indeed get the example from the documentation to work by changing v0, v1, v3 and v4 to i64. Would you like me to file a PR to change the documentation to use i64s for those variables? I expect most people who try to read through the docs and build something to use the cranelift_native crate and be on 64 bit architectures?

Our assertion failure message could absolutely be better here -- perhaps we could add "address with unexpected" or something like that.

Yeah, I think just something simple as changing that debug assert to be;

        debug_assert_eq!(ctx.output_ty(add, 0), types::I64, "address width of 64 expected, got 32");

would go a long way towards steering people in the right direction? I can file a pr with this change if you'd like me to. The discrepancy between debug and release builds is also a bit surprising to me, but that may be less of an easy fix as it probably requires adding an explicit check.

@cfallin
Copy link
Member

cfallin commented Jan 27, 2025

Yes, if you're willing, a PR to make both of those changes would be very much appreciated! Happy to review. Thanks!

@bjorn3
Copy link
Contributor

bjorn3 commented Jan 27, 2025

The discrepancy between debug and release builds is also a bit surprising to me, but that may be less of an easy fix as it probably requires adding an explicit check.

Does the clif ir verifier catch this mistake? You need to explicitly enable it as it is kinda slow.

@iwanders
Copy link
Contributor Author

Does the clif ir verifier catch this mistake? You need to explicitly enable it as it is kinda slow.

I tried adding the same average.clif file with an test verifier stanza at the top in e132b3b, this does not cause a unit test failure, so I don't think it does. I could not find a way to specify a test verifier_fail or something to indicate the verifier should report a failure in the test case, so I'm not sure how to really express this as a test.

My exact setup is available in 70af5ce on (branch issue-10118-reproduction), it adds a crate cranelift/compile_average_ir in the wasmtime repo for easy reproduction (function attempt_two in main.rs). This prints the flags as:

enable_alias_analysis = true
enable_verifier = true
enable_pcc = false

Which I take to mean the verifier is enabled, the documentation here also states it is enabled by default? Reproduction should be a matter of going to the crate and doing cargo r, that gives the assert, while cargo r --release shows the object file bytes.

Context; I'm considering creating a toy compiler, just to learn more about compilers. On Sunday I was exploring Cranelift to see how approachable it is and whether I'd like to use it as the frontend and share datastructures (effectively I think I would attempt making a TargetIsa). So that's why I ended up reading the IR documentation and tried to compile that function from the documentation. Overall though, the fact that I got working assembly within an hour or two of reading, experimenting, and in like 100 lines, makes it very approachable 👍

Yes, if you're willing, a PR to make both of those changes would be very much appreciated! Happy to review. Thanks!

Cool, I'll put something together.

iwanders added a commit to iwanders/wasmtime that referenced this issue Jan 28, 2025
…e#10118)

This adds additional information as to what is wrong when this assert is encountered.
iwanders added a commit to iwanders/wasmtime that referenced this issue Jan 28, 2025
…lliance#10118)

Most people using this example will likely be on 64 bit systems, so it makes
sense to target a system with 64 bit addressses with the example.
github-merge-queue bot pushed a commit that referenced this issue Jan 28, 2025
* Cranelift/x64 backend: improve pointer width assert. (#10118)

This adds additional information as to what is wrong when this assert is encountered.

* Cranelift/docs: Change pointer width in ir docs to 64 bits (#10118)

Most people using this example will likely be on 64 bit systems, so it makes
sense to target a system with 64 bit addressses with the example.
iwanders added a commit to iwanders/wasmtime that referenced this issue Feb 2, 2025
…e#10118)

This adds additional information as to what is wrong when this assert is encountered.
iwanders added a commit to iwanders/wasmtime that referenced this issue Feb 2, 2025
…lliance#10118)

Most people using this example will likely be on 64 bit systems, so it makes
sense to target a system with 64 bit addressses with the example.
iwanders added a commit to iwanders/wasmtime that referenced this issue Feb 3, 2025
…dealliance#10118)

This adds a check to load that confirms the pointer width is as expected
according to the target.
@iwanders
Copy link
Contributor Author

iwanders commented Feb 3, 2025

I took another look at this, now that I have a bit more understanding of the cranelift data structures.

In efd56cb of the issue-10118-reproduction branch, I added some printing of the types that we get in the clif:

This indeed shows the problem with the load on I32.

inst: inst2
  instruction_data: Binary { opcode: Iadd, args: [v0, v1] }
  typevar_operand: Some(v0)
  opcode: Iadd
  args: [v0, v1] types: [types::I32, types::I32]
  results? true -> [v6]  (types: [types::I32]) 
inst: inst3
  instruction_data: Load { opcode: Load, arg: v6, flags: MemFlags { bits: 32384 }, offset: Offset32(0) }
  typevar_operand: Some(v6)
  opcode: Load
  args: [v6] types: [types::I32]
  results? true -> [v7]  (types: [types::F32]) 

So at first I looked at the isle files and tried to figure out if there needed to be a type constraint on the argument to prevent it from being used... but I think I was looking in the wrong place.

Next I looked at the verifier... so in instructions.rs the iAddr is defined, but this is just an int of 32 or 64 bits.

In d7e53ea I added some prints to the verifier, and this does indeed show that the int-set for the Value that's the argument has the 32 and 64 bit integers set;

typecheck: inst3   Load { opcode: Load, arg: v6, flags: MemFlags { bits: 32384 }, offset: Offset32(0) }
  ctrl: types::F32
   arg_type: types::I32
   type_set: ValueTypeSet { lanes: cranelift_bitset::scalar::ScalarBitSet<u16> { 0: true, 1: false, 2: false, 3: false, 4: false, 5: false, 6: false, 7: false, 8: false, 9: false, 10: false, 11: false, 12: false, 13: false, 14: false, 15: false }, ints: cranelift_bitset::scalar::ScalarBitSet<u8> { 0: false, 1: false, 2: false, 3: false, 4: false, 5: true, 6: true, 7: false }, floats: cranelift_bitset::scalar::ScalarBitSet<u8> { 0: false, 1: false, 2: false, 3: false, 4: false, 5: false, 6: false, 7: false }, dynamic_lanes: cranelift_bitset::scalar::ScalarBitSet<u16> { 0: false, 1: false, 2: false, 3: false, 4: false, 5: false, 6: false, 7: false, 8: false, 9: false, 10: false, 11: false, 12: false, 13: false, 14: false, 15: false } }
   arg_type: types::I32

So I think this means that even though we are on a 64 bit architecture, the load can take a 32 or 64 bit argument according to the verifier.

I'm not sure what the best approach is here, by the time we reach the verifier, we cannot distinguish between an i32, i64 and iAddr anymore, given my (limited) understanding the ideal would be to use a bit in the ValueTypeSet to denote something as an address integer, and then the verifier can confirm that that the value is of the appropriate type for the current isa.

In c75db15 I just added a verify_is_address method to the verifier, and call for load, that makes the verifier complain with this bad function;

Defining function
er: VerifierError { location: inst3, context: Some("v7 = load.f32 v6"), message: "invalid pointer width (got 32, expected 64) encountered v6" }
Error: "Compilation error: Verifier errors"

Unfortunately, that commit does not run the test suite cleanly, there are 5 tests that fail:

Test failures with pointer width checking
running 1 test
test filetests ... FAILED

failures:

---- filetests stdout ----
FAIL filetests/filetests/runtests/simd-extractlane.clif: function %extractlane_i8x16_through_stack(i8x16) -> i8 system_v {
    ss0 = explicit_slot 8

block0(v0: i8x16):
    v2 = stack_addr.i64 ss0
    v3 = extractlane v0, 1
    store v3, v2
    v4 = load.i8 v2
;   ^~~~~~~~~~~~~~~
; error: inst3 (v4 = load.i8 v2): invalid pointer width (got 64, expected 32) encountered v2

    return v4
}

; 1 verifier error detected (see above). Compilation aborted.


Stack backtrace:
   <snip>
FAIL filetests/filetests/runtests/fdemote.clif: function %fdemote_load(i64, f64) -> f32 system_v {
    ss0 = explicit_slot 16

block0(v1: i64, v2: f64):
    v3 = stack_addr.i64 ss0
    store v2, v3
    v4 = load.f64 v3
;   ^~~~~~~~~~~~~~~~
; error: inst2 (v4 = load.f64 v3): invalid pointer width (got 64, expected 32) encountered v3

    v5 = fdemote.f32 v4
    return v5
}

; 1 verifier error detected (see above). Compilation aborted.


Stack backtrace:
   <snip>

FAIL filetests/filetests/runtests/simd-insertlane.clif: function %insertlane_i8x16_through_stack(i8x16, i8) -> i8x16 system_v {
    ss0 = explicit_slot 8

block0(v0: i8x16, v1: i8):
    v2 = stack_addr.i64 ss0
    store v1, v2
    v3 = load.i8 v2
;   ^~~~~~~~~~~~~~~
; error: inst2 (v3 = load.i8 v2): invalid pointer width (got 64, expected 32) encountered v2

    v4 = insertlane v0, v3, 1
    return v4
}

; 1 verifier error detected (see above). Compilation aborted.


Stack backtrace:
   <snip>
FAIL filetests/filetests/runtests/fpromote.clif: function %fpromote_load(i64, f32) -> f64 system_v {
    ss0 = explicit_slot 16

block0(v1: i64, v2: f32):
    v3 = stack_addr.i64 ss0
    store v2, v3
    v4 = load.f32 v3
;   ^~~~~~~~~~~~~~~~
; error: inst2 (v4 = load.f32 v3): invalid pointer width (got 64, expected 32) encountered v3

    v5 = fpromote.f64 v4
    return v5
}

; 1 verifier error detected (see above). Compilation aborted.


Stack backtrace:
   <snip>
FAIL filetests/filetests/isa/x64/average.clif: function %average(i32, i32) -> f32 system_v {
    ss0 = explicit_slot 8

block1(v0: i32, v1: i32):
    v2 = f64const 0.0
    stack_store v2, ss0  ; v2 = 0.0
    brif v1, block2, block5

block2:
    v3 = iconst.i32 0
    jump block3(v3)  ; v3 = 0

block3(v4: i32):
    v5 = imul_imm v4, 4
    v6 = iadd.i32 v0, v5
    v7 = load.f32 v6
;   ^~~~~~~~~~~~~~~~
; error: inst7 (v7 = load.f32 v6): invalid pointer width (got 32, expected 64) encountered v6

    v8 = fpromote.f64 v7
    v9 = stack_load.f64 ss0
    v10 = fadd v8, v9
    stack_store v10, ss0
    v11 = iadd_imm v4, 1
    v12 = icmp ult v11, v1
    brif v12, block3(v11), block4

block4:
    v13 = stack_load.f64 ss0
    v14 = fcvt_from_uint.f64 v1
    v15 = fdiv v13, v14
    v16 = fdemote.f32 v15
    return v16

block5:
    v100 = f32const +NaN
    return v100  ; v100 = +NaN
}

; 1 verifier error detected (see above). Compilation aborted.


Stack backtrace:
   <snip>
1065 tests
Error: 5 failures

Stack backtrace:
   <snip>


failures:
    filetests

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.82s

Problem here is, that that simd-extractlane.clif test file has pulley32, which probably creates an isa that uses 32 bits pointer widths, but then line 49 presumably gets a 64 bit stack address, which is then passed to the load on line 52, where the verifier now raises that this is incorrect.

I can't really see a good solution for this that does not involve either introducing an address type, or having to split the clif files into 32 and 64 bit tests.

@cfallin
Copy link
Member

cfallin commented Feb 3, 2025

Problem here is, that that simd-extractlane.clif test file has pulley32, which probably creates an isa that uses 32 bits pointer widths, but then line 49 presumably gets a 64 bit stack address, which is then passed to the load on line 52, where the verifier now raises that this is incorrect.

I can't really see a good solution for this that does not involve either introducing an address type, or having to split the clif files into 32 and 64 bit tests.

It's fine in this case to remove pulley32 from the test, I think -- given that the CLIF needs to be platform-pointer-width-aware, any fixed handwritten CLIF will necessarily be tied to one pointer width. For tests where loads and stores are an important part of the tested lowerings, we can split out 32-bit versions. And we should try to write tests in ways that minimize uses of loads and stores except where that is an essential part of the tested lowering (in this case it is, since we have extract-lane-from-memory lowerings on x86-64 at least).

@iwanders
Copy link
Contributor Author

iwanders commented Feb 5, 2025

Okay, interesting. Part of me was even wondering why it would be illegal to do a load from an i32, given that it's just pointing to a memory address. The docs for it state An integer address type, I guess the address type also implies width must be equal to the pointer width?

@cfallin , given that you stated it would be okay to split out the clif files (where necessary) to 32 and 64 bits... should we consider doing that in combination with the (kind of hardcoded) improvement to the verifier I did in c75db15? I can probably find some time over the weekend to explore how that solution would end up looking. We may want to extend it to also check store to keep it symmetrical?

@cfallin
Copy link
Member

cfallin commented Feb 7, 2025

Yes, updating the verifier and then updating tests that it finds violate the constraint is the right path here, I think (and for stores as well as loads; and make sure all the SIMD variants and atomics are covered, too). Thanks!

The docs for it state An integer address type, I guess the address type also implies width must be equal to the pointer width?

Yes, the wording is somewhat ambiguous in that method documentation, but we've explicitly decided before (when adding the assert that you're hitting) that we define correct IR to use platform-width pointers only.

iwanders added a commit to iwanders/wasmtime that referenced this issue Feb 8, 2025
…dealliance#10118)

This adds a check to load that confirms the pointer width is as expected
according to the target.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants