Contributor

@willieyz willieyz commented Oct 17, 2025

  • Resolves #471 (Port simpasm script from mlkem-native)
  • Reference: mlkem-native#717 (Add script for assembly simplification)
  • Reference: https://github.com/pq-code-package/mlkem-native/blob/main/scripts/simpasm
  • The purpose of this PR is to port the simpasm script from mlkem-native to mldsa-native. Because the two repositories currently differ, the simpasm script and the related CI jobs had to be adjusted. All aarch64 and x86_64 assembly code has been simplified on its respective system.
  • We have also modified some assembly code in mldsa-native to make it compatible with the simpasm script. The details are as follows:
    • Since the CFI scripts have not been ported to mldsa-native yet, we have temporarily removed the --cfify argument in the autogen scripts and added a comment:
      # TODO: Temporarily remove the "--cfify"; add back when the CFI script is implemented.
      Once the CFI scripts are ported (see issue #425, "Call Frame Information (CFI) directives missing in assembly functions which breaks stack unwinding"), the argument will be restored.

    • The assembly file poly_caddq_asm.S contains a .balign 16 directive. We changed it to .balign 4 to match the standard setting in simpasm (see autogen_header), and simplified the file in a separate commit.

    • For the assembly files rej_uniform_eta2_asm.S and rej_uniform_eta4_asm.S, there are two consecutive labels:

      rej_uniform_eta2_loop8_end:
      rej_uniform_eta2_memory_copy:
      
      rej_uniform_eta4_loop8_end:
      rej_uniform_eta4_memory_copy:
      

      llvm-objdump discards one of two consecutive labels, which causes the autogen script to fail. We removed the end labels since they are not used anywhere, and simplified these two files in a separate commit.
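
The consecutive-label problem described above can be caught before running the simplification. The following is a minimal sketch, not part of the PR: the function name `find_consecutive_labels`, the label regular expression, and the assumption that only `//` comments occur are all illustrative choices.

```python
import re

# Matches a line that consists only of a label definition, e.g. "foo_loop_end:".
# Assumption: labels are plain identifiers, optionally containing '.' or '$'.
LABEL_RE = re.compile(r"^\s*([A-Za-z_.$][\w.$]*):\s*$")


def find_consecutive_labels(asm_source):
    """Return (first_label, second_label) pairs with no instruction in between."""
    pairs = []
    previous_label = None
    for line in asm_source.splitlines():
        # Drop '//' comments (assumption: this is the only comment style used).
        stripped = line.split("//", 1)[0].strip()
        if not stripped:
            continue  # blank or comment-only lines do not separate labels
        match = LABEL_RE.match(stripped)
        if match:
            if previous_label is not None:
                pairs.append((previous_label, match.group(1)))
            previous_label = match.group(1)
        else:
            previous_label = None  # an instruction or directive ends the run
    return pairs


example = """
rej_uniform_eta2_loop8_end:
rej_uniform_eta2_memory_copy:
        cmp x0, #0
"""
print(find_consecutive_labels(example))
```

A check like this could run ahead of the disassembly step so that a clash is reported against the source file rather than surfacing later as an autogen failure.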

@willieyz willieyz force-pushed the simpasm branch 10 times, most recently from 093e624 to aa273c1 Compare October 23, 2025 07:55
@willieyz willieyz marked this pull request as ready for review October 23, 2025 07:55
@willieyz willieyz requested a review from a team as a code owner October 23, 2025 07:55
Contributor

@mkannwischer mkannwischer left a comment

Thanks @willieyz! I think the changes look okay - I still need to check the autogen script in detail.

The commit structure, however, is currently a large mess.
Most importantly, please make sure that git can track the move from src/ to dev/. That should be done in a separate commit. Then make each of the necessary changes to the assembly in dev/ in its own commit. Finally, autogenerate the assembly in a single commit.

Comment on lines 226 to 229
          - arg: '--aarch64-clean'
            name: Clean
          - arg: ''
            name: Optimized
Contributor

What is the purpose of this? We don't have an optimized/clean separation so far?

Contributor Author

@willieyz willieyz Oct 23, 2025

Yes, you are right. I originally thought we would eventually have an optimized/clean separation, so I added this.
I have now removed it and added a TODO comment instead. Thanks for your review!

Contributor

Please change the second commit so that git can track this as a file move so that we keep history for these files. I thought I had mentioned this before.

Contributor Author

@willieyz willieyz Oct 23, 2025

Hello @mkannwischer, sorry about this mistake.
I have now reorganized the file structure so that the files are moved instead of copied. This was my fault.
Thank you for your review!

Contributor

I don't understand why the third commit is removing files from dev/aarch64_clean/src

Contributor Author

This was also fixed when I reorganized the structure from copy to move.
Thank you for your review!

@willieyz willieyz force-pushed the simpasm branch 3 times, most recently from 402db8a to dce1c9c Compare October 23, 2025 10:47
@willieyz willieyz requested a review from mkannwischer October 23, 2025 11:39
- This commit ports the `simpasm` script and related functions in
  `autogen` from `mlkem-native` to `mldsa-native`.

- Added the `simpasm` job in `base.yml` and changed the `nix-shell` from
  `ci` to `ci-cross` in `ci.yml` for ASM simplification.

- Since the CFI scripts have not yet been ported to `mldsa-native`, the
  `--cfify` argument has been removed temporarily. A TODO comment has
  been added, and the argument will be restored once the CFI scripts are ported.

Signed-off-by: willieyz <[email protected]>
- This commit adds the simpasm header and footer to all assembly code in dev/

- Change balign from 16 to 4 in poly_caddq_asm.S to match our
  standard setting.
- llvm-objdump discards one of two consecutive labels, which causes
  the autogen script to fail. We removed the end labels since they are not
  used anywhere, and simplified the two affected files
  (rej_uniform_eta2_asm.S and rej_uniform_eta4_asm.S) in a separate commit.

Signed-off-by: willieyz <[email protected]>
Contributor

@mkannwischer mkannwischer left a comment

Thanks @willieyz! The commit structure is now much better.
Here are a couple more comments, primarily to align things with mlkem-native; otherwise maintenance of this is going to be unmanageable over time.

      - uses: ./.github/actions/setup-shell
        with:
-         nix-shell: 'ci'
+         nix-shell: 'ci-cross' # Need cross-compiler for ASM simplification
Contributor

Did you forget to copy the nix-cache: 'true' here? Or was that intentionally omitted?

Comment on lines +1247 to +1268
def update_via_copy(infile_full, outfile_full, dry_run=False, transform=None):
    status_update("copy", f"{infile_full} -> {outfile_full}")

    with open(infile_full, "r") as f:
        content = f.read()

    if transform is not None:
        content = transform(content)

    update_file(outfile_full, content, dry_run=dry_run)


def update_via_remove(filename, dry_run=False):
    if dry_run is True:
        print(
            f"Autogenerated file {filename} needs removing. Have you called scripts/autogen?",
            file=sys.stderr,
        )
        exit(1)

    # Remove the file
    os.remove(filename)
Contributor

Can we stick to the same order of functions as in mlkem-native, to minimize the diff?
In mlkem-native update_via_copy and update_via_remove come much later in the file.
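
For readers unfamiliar with the helpers quoted above, the dry-run convention they rely on can be sketched as follows. `update_file` is not shown in this thread, so the version below is a hypothetical stand-in, and the boolean return values exist only to make the example observable. The real script may behave differently, for instance by exiting with an error.

```python
import os
import sys
import tempfile


def update_file(filename, content, dry_run=False):
    """Hypothetical stand-in: write content, or in dry-run mode only check it."""
    current = None
    if os.path.exists(filename):
        with open(filename, "r") as f:
            current = f.read()
    if current == content:
        return True  # already up to date, nothing to do
    if dry_run:
        print(f"{filename} is out of date", file=sys.stderr)
        return False  # report the mismatch but leave the file untouched
    with open(filename, "w") as f:
        f.write(content)
    return True


def update_via_copy(infile_full, outfile_full, dry_run=False, transform=None):
    """Copy infile to outfile, optionally passing the text through transform."""
    with open(infile_full, "r") as f:
        content = f.read()
    if transform is not None:
        content = transform(content)
    return update_file(outfile_full, content, dry_run=dry_run)


def to_balign4(text):
    # Illustrative transform only; it mirrors the alignment change in this PR.
    return text.replace(".balign 16", ".balign 4")


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.S")
    dst = os.path.join(tmp, "out.S")
    with open(src, "w") as f:
        f.write(".balign 16\n")
    # Dry run: detects that dst is missing or stale without creating it.
    dry_ok = update_via_copy(src, dst, dry_run=True, transform=to_balign4)
    dst_exists_after_dry_run = os.path.exists(dst)
    # Real run: writes the transformed content.
    real_ok = update_via_copy(src, dst, transform=to_balign4)
    with open(dst, "r") as f:
        written = f.read()
```

Keeping the dry-run decision inside a single helper means every `update_via_*` function gets the same CI check behaviour without repeating it.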

# Determine architecture from filename
arch = "aarch64" if "aarch64" in infile_full else "x86_64"

# TODO: Temporary remvoe the "--cfify", add back when CFI script added.
Contributor

Can you just leave it in there but comment it out? That will make it easier to re-add it in the correct place later.
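
One way to follow this suggestion is to keep the flag in its final position inside the argument list and comment out only that entry. The sketch below is an assumption-laden illustration: `build_simpasm_args` and the `-i`, `-o`, and `--arch` flags are hypothetical placeholders, not the actual simpasm interface. Only `--cfify` and the architecture detection come from the thread.

```python
def build_simpasm_args(infile_full, outfile_full):
    """Build an illustrative simpasm command line for one assembly file."""
    # From the quoted diff: the architecture is inferred from the file path.
    arch = "aarch64" if "aarch64" in infile_full else "x86_64"
    return [
        "scripts/simpasm",
        "-i", infile_full,   # hypothetical flag name
        "-o", outfile_full,  # hypothetical flag name
        "--arch", arch,      # hypothetical flag name
        # TODO: Re-enable once the CFI script has been ported.
        # "--cfify",
    ]


aarch64_args = build_simpasm_args("dev/aarch64_clean/src/ntt.S", "out/ntt.S")
x86_args = build_simpasm_args("dev/x86_64/src/ntt.S", "out/ntt.S")
print(aarch64_args)
```

With this layout, restoring the flag later is a one-line change at the position where mlkem-native already has it.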

  simpasm:
    strategy:
      fail-fast: false
      matrix: # TODO: add backend option after we have optimized/clean seperation
Contributor

Can you please keep it in the file but comment it out in that case?

Contributor

      matrix:
        backend:
          - arg: '--aarch64-clean'
            name: Clean
          # TODO: add backend option after we have optimized/clean separation
          # - arg: ''
          #   name: Optimized

Then you can also change the default for --aarch64-clean back to False.

Comment on lines +1741 to +1746
synchronize_backends(
    dry_run=args.dry_run,
    clean=args.aarch64_clean,
    force_cross=args.force_cross,
    no_simplify=args.no_simplify,
)
Contributor

Why are the arguments ordered differently here from mlkem-native? Please try to minimize the diff.

    no_simplify=args.no_simplify,
)

high_level_status("Synchronized backends")
Contributor

There is another diff here because of spacing issues.
