@psiegl
Contributor

@psiegl psiegl commented Oct 18, 2025

This branch is used in airbus/a653lib#11

  1. BUGFIX: Create function names of the form {namespace_prefix}_{op}__{struct_name}__{field_name} without the struct keyword and the space after it, neither of which can be part of a function name.
  2. BUGFIX: For set-functions, perform *(<type>*)(struct_base_addr + offset) = value instead of *(struct_base_addr + offset) = value. Because struct_base_addr is of type uint8_t*, the current setters write only 1 byte.
  3. PARTIAL BUGFIX: Do not bail out if an offset is unknown. --target=wasm32-wasi --sysroot=... pulls in time.h, with a struct containing a broken offset:
struct iovec (size: 1 bytes)
  field: iov_base (offset: 18446744073709551615 bits)
  field: iov_len (offset: 18446744073709551615 bits)
  4. FEATURE: Use static inline instead of just inline.
  5. FEATURE: For a struct in a struct, provide the address of the nested struct (example: PROCESS_STATUS_TYPE, which contains PROCESS_ATTRIBUTE_TYPE ATTRIBUTES as yet another struct). The nested struct can subsequently be used with its own generated get and set functions.
  6. FEATURE: I noticed that SAMPLING_PORT_ID_TYPE SAMPLING_PORT_ID is defined as a long, thus in theory 4 bytes. While the guest (Wasm32) compiles it to 4 bytes, the host (x86_64) compiles it to 8 bytes. As a consequence, calling extern void GET_SAMPLING_PORT_STATUS ( /*in */ SAMPLING_PORT_ID_TYPE SAMPLING_PORT_ID, /*out*/ SAMPLING_PORT_STATUS_TYPE *SAMPLING_PORT_STATUS, /*out*/ RETURN_CODE_TYPE *RETURN_CODE ) in Wasm32 passes an address to a 4-byte memory location (SAMPLING_PORT_STATUS_TYPE). However, when this address is naively passed to the host's a653lib, the library writes an 8-byte value to this memory location, thus overwriting adjacent content. Furthermore, a naive passthrough does not handle little-endian to big-endian conversion. Thus, all typedefs passed through the ARINC653 function calls must be handled by get/set as well.
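The difference behind the second bugfix can be sketched in isolation. This is only an illustration: the names store_1byte_bug, store_i32 and bytes_changed are hypothetical and are not generator output, and memcpy is used here instead of the pointer cast from the list above so that the sketch does not depend on the alignment of base + offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Buggy shape: base is uint8_t*, so the assignment converts value to
 * uint8_t and stores a single byte. */
static inline void store_1byte_bug(uint8_t *base, size_t offset, int32_t value)
{
    assert(base != NULL);
    *(base + offset) = (uint8_t)value;
}

/* Intended shape: store all sizeof(int32_t) bytes of the field.
 * memcpy avoids forming an int32_t* from a possibly unaligned address. */
static inline void store_i32(uint8_t *base, size_t offset, int32_t value)
{
    assert(base != NULL);
    memcpy(base + offset, &value, sizeof value);
}

/* Stores 0x11223344 at offset 0 of a zeroed 8-byte buffer and returns how
 * many bytes of the buffer became non-zero. */
static inline size_t bytes_changed(int use_buggy_store)
{
    uint8_t buf[8] = {0};
    size_t changed = 0;

    if (use_buggy_store)
        store_1byte_bug(buf, 0, 0x11223344);
    else
        store_i32(buf, 0, 0x11223344);

    for (size_t i = 0; i < sizeof buf; i++)
        changed += (buf[i] != 0);
    return changed;
}
```

With the buggy store only one byte of the field is written. The typed store writes all four, independent of host byte order.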

Note: Today's clang does not provide a proper stdint.h implementation when source code is compiled with --target=wasm32-wasi and without --sysroot=.... Consequently, most [u]int{8,16,32,64}_t fields fall back to int32_t in Wasm32. When --sysroot=... is used, this behaviour does not occur, i.e. the stdint.h types have the expected bit width. However, char, short, long, ... always have the intended bit width when source code is compiled to Wasm32 (again with --target=wasm32-wasi). This means: as long as the ARINC653 headers rely on char, short, long, ..., one can use inline in a Wasm runtime offering ARINC653.

Idea: We could use https://man7.org/linux/man-pages/man3/endian.3.html instead of #include <byteswap.h>. This simplifies the code significantly, as it swaps or does not swap automatically (the choice is made at compile time). It is especially relevant because a653lib needs such a mechanism as well.

@wucke13 wucke13 force-pushed the main branch 2 times, most recently from 39bd757 to b2e3d8b Compare October 21, 2025 11:27
@wucke13
Collaborator

wucke13 commented Oct 22, 2025

BUGFIX: Create function names {namespace_prefix}_{op}__{struct_name}__{field_name} struct without the empty space and struct, which can't be used as a function.

I'm interested in a minimal repro; I believe I fixed it by accident, but I'm unsure. The underlying issue is that I used a wrong libclang method to get the type name, I guess.

BUGFIX: For set-functions, perform a *(<type>*)(struct_base_addr + offset) = value instead of *(struct_base_addr + offset) = value. Because the struct_base_addr is of type uint8_t*, thus current setters are solely writing 1 Byte.

The generated code for this was (and still is) broken. See #11 for some notes on that. I'm working on fixing it, even removing the UB from potentially unaligned pointer creation :)

PARTIAL BUGFIX: Do not bail out if offset unknown. --target=wasm32-wasi --sysroot=... pulls in time.h, with a struct containing a broken offset:

Agreed! This is already implemented as of yesterday. We now degrade gracefully by ignoring the affected fields or entire structs, emitting an error message but continuing the code-gen for further structs.
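For illustration, the broken offset 18446744073709551615 quoted above is 2^64 - 1, which is what -1 looks like when printed as an unsigned 64-bit number. Presumably it stems from a negative error code returned by the layout query. A minimal sketch of the "skip and continue" behaviour follows, written in C for consistency with the other examples in this thread (the generator itself is written in Rust); struct field_info and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field record, not the generator's real data model. */
struct field_info {
    const char *name;
    int64_t offset_bits; /* negative means "layout query failed" */
};

/* A field is usable only if its offset is known, i.e. non-negative. */
static inline int field_offset_is_valid(const struct field_info *f)
{
    return f->offset_bits >= 0;
}

/* Counts the usable fields. Unusable fields are skipped instead of
 * aborting the whole run. */
static inline size_t count_usable_fields(const struct field_info *fields,
                                         size_t n)
{
    size_t usable = 0;

    assert(fields != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (field_offset_is_valid(&fields[i]))
            usable++;
        /* else: report the field and carry on with the next one */
    }
    return usable;
}
```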

FEATURE: Use static inline instead of solely inline.

Agreed! This is already implemented as of yesterday, via letting the user pick any function declaration prefix they like (as in: empty string, inline, static inline, ...)

FEATURE: For struct in struct, provide the address to the nested struct (Example struct: PROCESS_STATUS_TYPE which contains PROCESS_ATTRIBUTE_TYPE ATTRIBUTES as yet another struct). The nested struct can subsequently employed with anyway given get and set.

Agreed! This is already implemented as of yesterday.

FEATURE: I noticed, that SAMPLING_PORT_ID_TYPE SAMPLING_PORT_ID is defined as a long thus in theory 4 Byte. While the guest i.e. Wasm32 compiles it to 4 Bytes, the host i.e. x86_64 compiles it to 8 Bytes. As a consequence, when calling extern void GET_SAMPLING_PORT_STATUS ( /*in */ SAMPLING_PORT_ID_TYPE SAMPLING_PORT_ID, /*out*/ SAMPLING_PORT_STATUS_TYPE *SAMPLING_PORT_STATUS, /*out*/ RETURN_CODE_TYPE *RETURN_CODE ) in Wasm32, passes an address to a 4 Byte memory location (SAMPLING_PORT_STATUS_TYPE). However, when naively passing this address to the host's a653lib, it will write a 8 Byte value to this memory location. Thus overwriting content. Furthermore, a naive passthrough does not handle Little-Endian to Big-Endian handling. Thus, all typedefs passed through the ARINC653 function calls must be handled by get/set as well.

That is an interesting problem. Originally the scope of this generator was structs only, but we might be able to create function stubs as well? I have to look into this further.

Note: Todays clang does not have a proper stdint.h implementation, in case source code is compiled with --target=wasm32-wasi and without --sysroot=....

I think this is a problem with the distribution of the clang in question. I'd really like to not opt out of using stdint.h; it is the only standards-compliant interface which can even provide a byte type (as in 8-bit integer). Everything else is a target-specific hack.

What we could maybe do here, however: an opt-in macro to generate a BIST (built-in self-test) function that verifies that sizeof(intN_t) == N, of course for all types that we use.
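A rough sketch of what such a self-test could look like. The macro LENS_ENABLE_BIST and the function names are hypothetical. Since sizeof counts bytes, the sketch compares sizeof(intN_t) * CHAR_BIT against N.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opt-in switch. It would normally come from the build
 * (e.g. -DLENS_ENABLE_BIST=1); it is defined here only to keep the sketch
 * self-contained. */
#ifndef LENS_ENABLE_BIST
#define LENS_ENABLE_BIST 1
#endif

#if LENS_ENABLE_BIST
#define LENS_BIST_CHECK(type, bits)                          \
    do {                                                     \
        if (sizeof(type) * CHAR_BIT != (size_t)(bits))       \
            return (bits);                                   \
    } while (0)

/* Returns 0 if every checked fixed-width type has the advertised width,
 * otherwise the nominal bit width of the first offending type. */
static inline int lens_bist_stdint(void)
{
    LENS_BIST_CHECK(int8_t, 8);
    LENS_BIST_CHECK(uint8_t, 8);
    LENS_BIST_CHECK(int16_t, 16);
    LENS_BIST_CHECK(uint16_t, 16);
    LENS_BIST_CHECK(int32_t, 32);
    LENS_BIST_CHECK(uint32_t, 32);
    LENS_BIST_CHECK(int64_t, 64);
    LENS_BIST_CHECK(uint64_t, 64);
    return 0;
}

/* Convenience wrapper: abort early (e.g. at start-up) on a broken stdint.h. */
static inline void lens_bist_stdint_or_abort(void)
{
    assert(lens_bist_stdint() == 0);
}
#endif
```

On a toolchain with a working stdint.h the function returns 0. With a header where, say, int64_t falls back to a 32-bit type, it would return 64.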

Idea: We could use the https://man7.org/linux/man-pages/man3/endian.3.html instead of #include<byteswap.h>, which simplifies the code significantly as it swaps or not swaps automatically (during compilation it chooses the option). Especially, as the a653lib needs to use such mechanism as well.

This cannot work in the generic use case: the generated code cannot itself detect whether the foreign ABI is little or big endian. For Wasm this is no problem, we always know it's little endian. But for generic code, we can't know. Worst case is PowerPC, where endianness can be configured per page in the MMU.

As much as I'd like to get away with auto-detection, I am hesitant to do it. What we can do, however, is to have a macro that defines whether or not to byteswap in the generated code, instead of a generator CLI flag. Do you think that would be better re. usability?
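One possible shape of such a macro, as a sketch only. LENS_BYTESWAP, lens_swap32 and lens_foreign32 are hypothetical names, and the hand-written swap stands in for bswap_32 from byteswap.h to keep the example portable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration macro: 1 if the foreign ABI's byte order
 * differs from the host's, 0 otherwise. It is set by whoever compiles the
 * generated code, e.g. with -DLENS_BYTESWAP=1. */
#ifndef LENS_BYTESWAP
#define LENS_BYTESWAP 0
#endif

static_assert(LENS_BYTESWAP == 0 || LENS_BYTESWAP == 1,
              "LENS_BYTESWAP must be 0 or 1");

/* Portable 32-bit byte reversal. */
static inline uint32_t lens_swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Converts a 32-bit value between host and foreign byte order. The same
 * function serves both directions because the conversion is its own
 * inverse. */
static inline uint32_t lens_foreign32(uint32_t v)
{
#if LENS_BYTESWAP
    return lens_swap32(v);
#else
    return v;
#endif
}
```

Compared with a generator CLI flag, the same generated header could then be compiled for hosts of either byte order without re-running the generator.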

Thank you for the PR, sorry for the mess in the code. I tried to significantly clean up the code, which introduced a bunch of merge conflicts with this PR. I'm however in the process of fixing most if not all of the bugs and most of the features mentioned above. I'll report back once I've got the UB out of the pointer arithmetic.

@psiegl
Contributor Author

psiegl commented Oct 22, 2025

BUGFIX: The function namespace was hardcoded to cal, despite supplying -p cal32.

That is an interesting problem. Originally the scope of this generator was structs only, but we might be able to create function stubs as well? I have to look into this further.

So, let us discuss an example:

#if 0
extern void READ_SAMPLING_MESSAGE (
       /*in */ SAMPLING_PORT_ID_TYPE      SAMPLING_PORT_ID,
       /*in */ MESSAGE_ADDR_TYPE          MESSAGE_ADDR,
               /* The message address is passed IN, although */
               /* the respective message is passed OUT       */
       /*out*/ MESSAGE_SIZE_TYPE          *LENGTH,
       /*out*/ VALIDITY_TYPE              *VALIDITY,
       /*out*/ RETURN_CODE_TYPE           *RETURN_CODE );
#endif

wasm_trap_t* WASM32_READ_SAMPLING_MESSAGE(void* env,
  wasmtime_caller_t* caller, const wasmtime_val_t* args, size_t nargs,
  wasmtime_val_t* results, size_t nresults)
{
  wasmtime_context_t *context = wasmtime_caller_context(caller);
  wasmtime_memory_t memory;
  get_exported_memory(caller, &memory);
  uint8_t* wasm_baseaddr = wasmtime_memory_data(context, &memory);


  SAMPLING_PORT_ID_TYPE SAMPLING_PORT_ID;
  SAMPLING_PORT_ID = (SAMPLING_PORT_ID_TYPE)le32toh(args[0].of.i32);
  int32_t MESSAGE_ADDR; /* is a pointer / address into Wasm linear memory */
  MESSAGE_ADDR = le32toh(args[1].of.i32);
  MESSAGE_SIZE_TYPE LENGTH;
  VALIDITY_TYPE VALIDITY;
  RETURN_CODE_TYPE RETURN_CODE;

  READ_SAMPLING_MESSAGE(
    SAMPLING_PORT_ID,
    (MESSAGE_ADDR_TYPE)&wasm_baseaddr[MESSAGE_ADDR], // FIXME: only safe as long as char[]
    &LENGTH,
    &VALIDITY,
    &RETURN_CODE
  );

  camw32_set__MESSAGE_SIZE_TYPE(&wasm_baseaddr[le32toh(args[2].of.i32)], (int32_t)LENGTH);
  camw32_set__VALIDITY_TYPE(&wasm_baseaddr[le32toh(args[3].of.i32)], (int32_t)VALIDITY);
  camw32_set__RETURN_CODE_TYPE(&wasm_baseaddr[le32toh(args[4].of.i32)], (int32_t)RETURN_CODE);

  return NULL;
}

The code above is a host function in Wasmtime, implemented for a Wasm32 guest. My assumption is that the parameters passed into the function from the Wasm32 guest are not converted from little-endian to host byte order automatically by the runtime; this has to be done explicitly. Thus, I am currently using le32toh() for parameters that are 32 bits wide and le64toh() for 64-bit parameters. As I need to determine the right width explicitly anyway (see args[*].of.i** above), I am fine with choosing the matching le**toh().
Meanwhile, typedef A653_INTEGER MESSAGE_SIZE_TYPE; with typedef long int A653_INTEGER;, and in Wasm32 a long int is 4 bytes. I am therefore hesitant to pass a reference into Wasm's linear memory directly to a653lib's READ_SAMPLING_MESSAGE(), as on x86_64 the compiler can easily treat this value as 8 bytes wide. a653lib would then overwrite content in the linear memory. As a consequence, I use an indirection through the function-local variable MESSAGE_SIZE_TYPE LENGTH;, which can take an 8-byte store (even though the value likely fits in 4 bytes), and which I then want to write back to the Wasm32 linear memory as a 4-byte store. In the end, this is quite similar to the structs you already handle with c-abi-lens. Clang tells you the real size of this typedef, so c-abi-lens can easily create at least a setter for typedefs. On my side, by contrast, it is a bit of guessing whether the right writeback is 1 byte, 2 bytes, 4 bytes, ... Yes, I've got to know the type sizes over time ;-)
My hope is that camw32_set__MESSAGE_SIZE_TYPE() then handles the byteswap for the data content as well, so that the wrapper given above only has to handle the byteswap for the function parameters.
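To make that hope concrete, here is a sketch of what a typedef setter could do. This is an assumption about the shape, not the generator's actual output. sketch_set__MESSAGE_SIZE_TYPE and sketch_writeback_long are hypothetical names, and the 4-byte little-endian guest layout is taken from the Wasm32 discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generated typedef setter: stores value as a
 * 4-byte little-endian integer at dst. Writing byte by byte makes the
 * result independent of host byte order and of the alignment of dst. */
static inline void sketch_set__MESSAGE_SIZE_TYPE(uint8_t *dst, int32_t value)
{
    uint32_t u = (uint32_t)value;

    assert(dst != NULL);
    dst[0] = (uint8_t)(u & 0xFFu);
    dst[1] = (uint8_t)((u >> 8) & 0xFFu);
    dst[2] = (uint8_t)((u >> 16) & 0xFFu);
    dst[3] = (uint8_t)((u >> 24) & 0xFFu);
}

/* Host-side helper: narrows a host long (8 bytes on x86_64) to the guest's
 * 4-byte type before writing it back. Returns 0 on success and -1 if the
 * value does not fit, in which case nothing is written. */
static inline int sketch_writeback_long(uint8_t *dst, long host_value)
{
    if (host_value < INT32_MIN || host_value > INT32_MAX)
        return -1;
    sketch_set__MESSAGE_SIZE_TYPE(dst, (int32_t)host_value);
    return 0;
}
```

Only four bytes of guest memory are touched, so the neighbouring bytes stay intact even though the host-side variable is wider.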

My modification to create the typedef setters was roughly like this (however your code base currently looks completely different):
https://github.com/psiegl/arinc653-wasm/blob/psiegl-old/pkgs/c-abi-lens/src/main.rs#L87

    let typedefs = tu
        .get_entity()
        .get_children()
        .into_iter()
        .filter(|e| {
            if e.get_kind() != EntityKind::TypedefDecl {
                return false;
            }

            // Get the canonical type kind if available
            if let Some(ty) = e.get_type() {
                let canonical_kind = ty.get_canonical_type().get_kind();
                canonical_kind != TypeKind::Record
            } else {
                false
            }
        })
        .collect::<Vec<_>>();

This cannot work in the generic use case: the generated code cannot itself detect whether the foreign ABI is little or big endian. For Wasm this is no problem, we always know it's little endian. But for generic code, we can't know. Worst case is PowerPC, where endianness can be configured per page in the MMU.

As much as I'd like to get away with auto-detection, I am hesitant to do it. What we can do, however, is to have a macro that defines whether or not to byteswap in the generated code, instead of a generator CLI flag. Do you think that would be better re. usability?

As you see in the example, my understanding is that I need to handle LE/BE for the given function parameters anyway. However, even though I see your point about PPC configuring LE/BE per page, I wonder whether we can handle this at all. So far, your preferred approach was to inline the getters/setters anyway. Thus, there would not be much hope.

Note: I started reworking your code to generate the host-code wrappers (as in the example above) automatically. However, there are so many caveats to consider that I dropped this idea for the moment and just implemented the wrappers quickly by hand. For a production-grade approach, though, I would prefer an auto-generated wrapper ;-)

@wucke13
Collaborator

wucke13 commented Oct 24, 2025

BUGFIX: The function namespace was hardcoded to cal, despite supplying -p cal32.

~~Also fixed now in the main branch~~ Fix is in the works, will be merged today.

Meanwhile typedef A653_INTEGER MESSAGE_SIZE_TYPE; while typedef long int A653_INTEGER; and in Wasm32 an int is 4Byte, I am hesitant to pass this reference into Wasms linear memory directly into the a653libs READ_SAMPLING_MESSAGE() as on an x86_64 this value can easily be handled by the compiler with 8Bytes. Thus, a653lib would overwrite content in the linear memory.

I agree with the conclusion (of using temporary stack-allocated variables), but not with the reason:

Yes, the types need to be accurate. Yes, the same type (e.g. int) has a different bit width on different architectures. However, when using a (properly working!) implementation of stdint.h (which I strongly advise), ensuring that matching bit widths are used becomes trivial. If matching bit widths are used, I also don't see any problem with the arch (e.g. x86_64) reading more; this will not happen in an observable way (e.g. the compiler may read a machine word, so 8 bytes, only if the compiler knows this will not cause an issue).

HOWEVER: this is UB if the pointer is not aligned. Wasm has weak alignment requirements. So does x86_64. But the abstract machine that the C spec defines does not. So, we cannot just use the pointer without ensuring it is aligned. Introducing a branch for that is pointless IMHO, so the correct way can only be a bytewise copy from that pointer to a stack-allocated (and hence properly aligned) uint32_t.

I generally see a strong argument for Rust here, by the way. Type handling is much cleaner (explicit, architecture-independent bit widths for integers and floats!), and the rules around pointer handling and UB are actually more relaxed than in C. In C, just assigning an unaligned address to a pointer variable is instant UB, even if the pointer is never dereferenced. In Rust, this is allowed. Dereferencing such a pointer is of course still UB in Rust.
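A minimal sketch of that bytewise copy, assuming a 32-bit field. The function names are hypothetical, and byte-order conversion is left out on purpose, so the copied value is still in the guest's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies 4 bytes from an arbitrary, possibly unaligned address into a
 * stack-allocated (and hence properly aligned) uint32_t. No uint32_t* is
 * ever formed from src, so no misaligned pointer comes into existence. */
static inline uint32_t copy_u32_from_unaligned(const uint8_t *src)
{
    uint32_t tmp;

    assert(src != NULL);
    memcpy(&tmp, src, sizeof tmp);
    return tmp;
}

/* Counterpart for the write direction. */
static inline void copy_u32_to_unaligned(uint8_t *dst, uint32_t value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}
```

Compilers commonly turn such a fixed-size memcpy into a single load or store on targets that tolerate unaligned access, so the branch-free version does not have to be slower in practice.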

@wucke13 wucke13 closed this Oct 24, 2025
@wucke13
Collaborator

wucke13 commented Oct 24, 2025

Oops, didn't mean to close the PR. I did implement the bugfix in a slightly different way: I think the namespace prefix can be applied to the token stream in code_snippets, so that we don't have to concern the code_gen itself with this. The PR is #13.

I would prefer that we branch out feature requests and bug reports into issues; then it's a bit more organized. Thank you again for the feedback!

@wucke13 wucke13 reopened this Oct 24, 2025
@psiegl psiegl closed this Nov 4, 2025