Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asm: pretty-printing signed immediates #10200

Open
abrown opened this issue Feb 6, 2025 · 1 comment · May be fixed by #10216
Open

asm: pretty-printing signed immediates #10200

abrown opened this issue Feb 6, 2025 · 1 comment · May be fixed by #10216

Comments

@abrown
Copy link
Contributor

abrown commented Feb 6, 2025

This issue outlines two problems I encountered adding new assembler instructions:

  1. to match capstone's pretty-printing, we must distinguish between signed and unsigned immediates, both of which can be sign-extended (!)
  2. to avoid a semantic mismatch at the ISLE level, the assembler must clearly differentiate between signed and unsigned immediates with the same representation (@alexcrichton suggested using different types).

Taken together, these two problems make it difficult to find a solution that satisfies both requirements. Let me explain: capstone pretty-prints immediates differently per instruction. The x64 add and and groups both have instructions that sign-extend a 32-bit immediate into a 64-bit one before the operation. The add output prints like a signed integer, but the and prints like an unsigned integer:

let add = inst::addq_i_sxl::new(Imm32::new(0xd7f247b5));
println!("{add}");
> addq $-0x280db84b, %rax

let and = inst::andq_i_sxl::new(Imm32::new(0xd7f247b5));
println!("{and}");
> andq $0xffffffffd7f247b5, %rax

This is probably due to capstone understanding that add is arithmetic and and is logical — makes sense, right? One solution to properly match what capstone prints is to add a new simm* form to the DSL: for sign-extending instructions, add would get the simm* form and print the signed integer ($-0x...), and would get the current imm* form and print the unsigned integer ($0xffff...)... just extended to the right width. (There are other solutions here, like switching to XED which prints both forms as unsigned integers, but we may not be ready for that just yet).

But what about problem 2? @alexcrichton was concerned that if we don't differentiate the immediate type that the assembly instruction takes, we could try to pass in bit-equivalent values to these sign-extending instructions but then have unexpected effects when they are sign-extended; e.g., we pass in 254u8 to one of these instructions but it gets treated as -2i8 and sign-extended to -2i64. We added this comment to track this:

// TODO fix down-convert logic: if an assembly instruction can
// only fit 8 bits, we check if down-converting the 32 bits from
// CLIF we have here will fit. Some assembler instructions will
// sign-extend this immediate, however, and we don't have a way
// to distinguish this yet. For a CLIF value `-2i8`, this
// conversion will both pass on the appropriate bytes and the
// emitted instruction will sign-extend them as expected. But,
// for a CLIF value `254u8` (the same bit pattern), we could
// pass on the right bits but the sign-extension will break
// Cranelift's semantics. For the time being, we conservatively
// only allow down-converting to `i8` values, meaning some valid
// constants will be rejected.
let imm = i8::try_from(simm32).ok()?;
Some(AssemblerImm8::new(imm as u8))

Problem 1 and problem 2 interfere: if we choose to represent the add operand with simm* as suggested above, the instruction can accept a new Simm* type at the CLIF level that makes it clear that we accept a signed integer and that this will be sign-extended — all is well. But, the and operand would still be imm*, accepting an Imm* type, and still confusing the user at the CLIF level, as @alexcrichton was worried would happen. There are several solutions here, but none that I really like, so I'll just describe the problem for now.

@alexcrichton
Copy link
Member

Personally I feel like we should prioritize the representation of the types of the immediates to ensure it minimizes errors and is easy to use. Matching capstone exactly seems like something where we might want to instead engineer the test suite/fuzzing to remove that necessity.

One possible option with that is to rework tests to (a) generate an arbitrary Inst, (b) convert Inst to binary, (c) print the Inst and use a different assembler to convert to binary (maybe llvm-as? maybe just as?), and finally (d) assert the binary is the same. That means that our exact printed format won't necessarily be the same as any other tool, but it does mean that what we print is accepted by a tool.

abrown added a commit to abrown/wasmtime that referenced this issue Feb 10, 2025
As mentioned in bytecodealliance#10200, `capstone` has a peculiar way of pretty-printing
immediates, especially signed immediates. It is simpler (and perhaps
more clear) for us to just print immediates in one consistent format:
`0xffff...`, e.g. This change parses capstone's pretty-printed
immediates and converts them to our simpler format if the first attempt
to match this assembler's output with `capston` fails.
abrown added a commit to abrown/wasmtime that referenced this issue Feb 10, 2025
As pointed out in bytecodealliance#10200, it could be confusing for users for
`cranelift-assembler-x64` to pass unsigned integers to certain assembler
instructions and have them unexpectedly sign-extended. Well, it can't be
too surprising since these instructions have a `_sx*` suffix, but this
change implements @alexcrichton's additional suggestion to create
separate types for the immediates that may be sign-extended.

These new types (`Simm8`, `Simm16`, `Simm32`) are quite similar to their
vanilla counterparts (`Imm8`, `Imm16`, `Imm32`) but have additional
sign-extension logic when pretty-printed. This means the vanilla
versions can be simplified and the pre-existing `Simm32` is renamed to
the more appropriate `AmodeOffset`.
abrown added a commit to abrown/wasmtime that referenced this issue Feb 10, 2025
As mentioned in bytecodealliance#10200, `capstone` has a peculiar way of pretty-printing
immediates, especially signed immediates. It is simpler (and perhaps
more clear) for us to just print immediates in one consistent format:
`0xffff...`, e.g. This change parses capstone's pretty-printed
immediates and converts them to our simpler format if the first attempt
to match this assembler's output with `capston` fails.
abrown added a commit to abrown/wasmtime that referenced this issue Feb 10, 2025
As pointed out in bytecodealliance#10200, it could be confusing for users for
`cranelift-assembler-x64` to pass unsigned integers to certain assembler
instructions and have them unexpectedly sign-extended. Well, it can't be
too surprising since these instructions have a `_sx*` suffix, but this
change implements @alexcrichton's additional suggestion to create
separate types for the immediates that may be sign-extended.

These new types (`Simm8`, `Simm16`, `Simm32`) are quite similar to their
vanilla counterparts (`Imm8`, `Imm16`, `Imm32`) but have additional
sign-extension logic when pretty-printed. This means the vanilla
versions can be simplified and the pre-existing `Simm32` is renamed to
the more appropriate `AmodeOffset`.
abrown added a commit to abrown/wasmtime that referenced this issue Feb 10, 2025
As mentioned in bytecodealliance#10200, `capstone` has a peculiar way of pretty-printing
immediates, especially signed immediates. It is simpler (and perhaps
more clear) for us to just print immediates in one consistent format:
`0xffff...`, e.g. This change parses capstone's pretty-printed
immediates and converts them to our simpler format if the first attempt
to match this assembler's output with `capston` fails.
abrown added a commit to abrown/wasmtime that referenced this issue Feb 10, 2025
As pointed out in bytecodealliance#10200, it could be confusing for users for
`cranelift-assembler-x64` to pass unsigned integers to certain assembler
instructions and have them unexpectedly sign-extended. Well, it can't be
too surprising since these instructions have a `_sx*` suffix, but this
change implements @alexcrichton's additional suggestion to create
separate types for the immediates that may be sign-extended.

These new types (`Simm8`, `Simm16`, `Simm32`) are quite similar to their
vanilla counterparts (`Imm8`, `Imm16`, `Imm32`) but have additional
sign-extension logic when pretty-printed. This means the vanilla
versions can be simplified and the pre-existing `Simm32` is renamed to
the more appropriate `AmodeOffset`.
@abrown abrown linked a pull request Feb 10, 2025 that will close this issue
abrown added a commit to abrown/wasmtime that referenced this issue Feb 10, 2025
As mentioned in bytecodealliance#10200, `capstone` has a peculiar way of pretty-printing
immediates, especially signed immediates. It is simpler (and perhaps
more clear) for us to just print immediates in one consistent format:
`0xffff...`, e.g. This change parses capstone's pretty-printed
immediates and converts them to our simpler format if the first attempt
to match this assembler's output with `capston` fails.
abrown added a commit to abrown/wasmtime that referenced this issue Feb 10, 2025
As pointed out in bytecodealliance#10200, it could be confusing for users for
`cranelift-assembler-x64` to pass unsigned integers to certain assembler
instructions and have them unexpectedly sign-extended. Well, it can't be
too surprising since these instructions have a `_sx*` suffix, but this
change implements @alexcrichton's additional suggestion to create
separate types for the immediates that may be sign-extended.

These new types (`Simm8`, `Simm16`, `Simm32`) are quite similar to their
vanilla counterparts (`Imm8`, `Imm16`, `Imm32`) but have additional
sign-extension logic when pretty-printed. This means the vanilla
versions can be simplified and the pre-existing `Simm32` is renamed to
the more appropriate `AmodeOffset`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants