feat: sha 256 block compression function analog to chacha and fastntt #969

Merged: salinhkuhn merged 4 commits into main from feat-sha256 on Jul 3, 2026

Conversation

@salinhkuhn

Contributor

This PR adds C code for the SHA-256 block compression function (the core of SHA-256), together with test cases that verify our RISC-V lowering of it. It follows the same self-contained style as fastntt.c and chacha20.c: a single exported function that operates on caller-provided buffers, with no local arrays or globals.
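
For readers unfamiliar with the shape of such a function, here is a minimal sketch of a SHA-256 block compression function in the style described above. It is not the code from this PR: the name `sha256_compress`, its parameter order, and the choice to have the caller pass in both the round constants `k` and a 64-word scratch buffer `w` are assumptions made here to illustrate "no local arrays or globals".

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
static uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

/*
 * Hypothetical signature, not taken from the PR.
 *   state: 8 chaining words, updated in place
 *   block: 64 message bytes, read as big-endian 32-bit words
 *   k:     the 64 SHA-256 round constants, supplied by the caller
 *   w:     caller-provided 64-word scratch buffer for the message schedule
 */
void sha256_compress(uint32_t *state, const uint8_t *block,
                     const uint32_t *k, uint32_t *w) {
    /* Load the 16 message words of the block. */
    for (int i = 0; i < 16; i++) {
        w[i] = ((uint32_t)block[4 * i] << 24) |
               ((uint32_t)block[4 * i + 1] << 16) |
               ((uint32_t)block[4 * i + 2] << 8) |
               (uint32_t)block[4 * i + 3];
    }

    /* Expand them to the full 64-word message schedule. */
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = rotr32(w[i - 15], 7) ^ rotr32(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr32(w[i - 2], 17) ^ rotr32(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }

    /* Working variables start from the current chaining value. */
    uint32_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint32_t e = state[4], f = state[5], g = state[6], h = state[7];

    /* 64 rounds; all arithmetic is modulo 2^32 via uint32_t wraparound. */
    for (int i = 0; i < 64; i++) {
        uint32_t big_s1 = rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25);
        uint32_t ch = (e & f) ^ (~e & g);
        uint32_t t1 = h + big_s1 + ch + k[i] + w[i];
        uint32_t big_s0 = rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22);
        uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        uint32_t t2 = big_s0 + maj;
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }

    /* Add the working variables back into the chaining value. */
    state[0] += a; state[1] += b; state[2] += c; state[3] += d;
    state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}
```

Calling this once with the standard SHA-256 initial hash value and the padded empty message (a `0x80` byte followed by 63 zero bytes) should leave the digest of the empty string in `state`, which makes for a convenient smoke test.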

@github-actions (bot) left a comment

Contributor

VeIR Benchmarks

| Benchmark suite | Current: d05e9b5 | Previous: 55efe5f | Ratio |
| --- | --- | --- | --- |
| add-fold-worklist/create | 2225000 ns (± 108267) | 2245500 ns (± 104257) | 0.99 |
| add-fold-worklist/rewrite | 3986000 ns (± 31773) | 3914000 ns (± 69073) | 1.02 |
| add-fold-worklist-local/create | 2284000 ns (± 82309) | 2240000 ns (± 112286) | 1.02 |
| add-fold-worklist-local/rewrite | 3427000 ns (± 141679) | 3283000 ns (± 49208) | 1.04 |
| add-zero-worklist/create | 2184000 ns (± 61982) | 2217000 ns (± 103159) | 0.99 |
| add-zero-worklist/rewrite | 2519000 ns (± 36568) | 2475000 ns (± 17712) | 1.02 |
| add-zero-reuse-worklist/create | 1802000 ns (± 92148) | 1880000 ns (± 73557) | 0.96 |
| add-zero-reuse-worklist/rewrite | 2120000 ns (± 59952) | 2091000 ns (± 33522) | 1.01 |
| mul-two-worklist/create | 2278000 ns (± 105464) | 2232500 ns (± 150841) | 1.02 |
| mul-two-worklist/rewrite | 5580000 ns (± 70986) | 5524000 ns (± 405471) | 1.01 |
| add-fold-forwards/create | 2252000 ns (± 65574) | 2209000 ns (± 94723) | 1.02 |
| add-fold-forwards/rewrite | 3010000 ns (± 21606) | 2941000 ns (± 28900) | 1.02 |
| add-zero-forwards/create | 2206000 ns (± 39221) | 2227000 ns (± 107440) | 0.99 |
| add-zero-forwards/rewrite | 1974000 ns (± 28658) | 1919000 ns (± 12558) | 1.03 |
| add-zero-reuse-forwards/create | 1911000 ns (± 35710) | 1788000 ns (± 76549) | 1.07 |
| add-zero-reuse-forwards/rewrite | 1640000 ns (± 57306) | 1526000 ns (± 11336) | 1.07 |
| mul-two-forwards/create | 2225000 ns (± 102534) | 2220000 ns (± 33648) | 1.00 |
| mul-two-forwards/rewrite | 3627000 ns (± 20169) | 3598000 ns (± 90024) | 1.01 |
| add-zero-reuse-first/create | 1893500 ns (± 95540) | 1848000 ns (± 78479) | 1.02 |
| add-zero-reuse-first/rewrite | 8000 ns (± 1790) | 8000 ns (± 1303) | 1 |
| add-zero-lots-of-reuse-first/create | 1850000 ns (± 86048) | 1842500 ns (± 81969) | 1.00 |
| add-zero-lots-of-reuse-first/rewrite | 829000 ns (± 24884) | 787500 ns (± 58352) | 1.05 |

This comment was automatically generated by workflow using github-action-benchmark.

// CHECK-NEXT: %[[v894:[0-9]+]] = "riscv.li"() <{"value" = -1 : i32}> : () -> !riscv.reg
// CHECK-NEXT: %[[v895:[0-9]+]] = "builtin.unrealized_conversion_cast"(%[[v894]]) : (!riscv.reg) -> i32
// CHECK-NEXT: %[[v892:[0-9]+]] = "riscv.li"() <{"value" = 2 : i32}> : () -> !riscv.reg
// CHECK-NEXT: %[[v893:[0-9]+]] = "builtin.unrealized_conversion_cast"(%[[v892]]) : (!riscv.reg) -> i32

Collaborator

There is a huge number of unrealized_conversion_casts. Why have these not been eliminated?

@salinhkuhn salinhkuhn Jul 2, 2026

Contributor Author

Good point, we never added them for i32 and for ptr -> reg -> ptr. I will look into it now, but the reg -> i32 -> reg cast could be tricky.

Contributor Author

Added it in #970

Contributor Author

I updated it and removed all the casts via a rewrite wherever we had a cast in the direction of i32 to reg. The remaining casts are in exactly the opposite direction, and we currently can't remove those.

@tobiasgrosser

Collaborator

Very cool. I'll let @luisacicolini do the full review.

@regehr

regehr commented Jul 2, 2026

Collaborator

This LGTM, but I think it should land only after #970 has landed and the tests have been updated so that they no longer contain all the casts.

@salinhkuhn added this pull request to the merge queue on Jul 3, 2026
Merged via the queue into main with commit 13d7e8d on Jul 3, 2026
5 checks passed
@salinhkuhn deleted the feat-sha256 branch on July 3, 2026 07:43