feat: SHA-256 block compression function, analogous to chacha and fastntt #969
VeIR Benchmarks
| Benchmark suite | Current: d05e9b5 | Previous: 55efe5f | Ratio |
|---|---|---|---|
| add-fold-worklist/create | 2225000 ns (± 108267) | 2245500 ns (± 104257) | 0.99 |
| add-fold-worklist/rewrite | 3986000 ns (± 31773) | 3914000 ns (± 69073) | 1.02 |
| add-fold-worklist-local/create | 2284000 ns (± 82309) | 2240000 ns (± 112286) | 1.02 |
| add-fold-worklist-local/rewrite | 3427000 ns (± 141679) | 3283000 ns (± 49208) | 1.04 |
| add-zero-worklist/create | 2184000 ns (± 61982) | 2217000 ns (± 103159) | 0.99 |
| add-zero-worklist/rewrite | 2519000 ns (± 36568) | 2475000 ns (± 17712) | 1.02 |
| add-zero-reuse-worklist/create | 1802000 ns (± 92148) | 1880000 ns (± 73557) | 0.96 |
| add-zero-reuse-worklist/rewrite | 2120000 ns (± 59952) | 2091000 ns (± 33522) | 1.01 |
| mul-two-worklist/create | 2278000 ns (± 105464) | 2232500 ns (± 150841) | 1.02 |
| mul-two-worklist/rewrite | 5580000 ns (± 70986) | 5524000 ns (± 405471) | 1.01 |
| add-fold-forwards/create | 2252000 ns (± 65574) | 2209000 ns (± 94723) | 1.02 |
| add-fold-forwards/rewrite | 3010000 ns (± 21606) | 2941000 ns (± 28900) | 1.02 |
| add-zero-forwards/create | 2206000 ns (± 39221) | 2227000 ns (± 107440) | 0.99 |
| add-zero-forwards/rewrite | 1974000 ns (± 28658) | 1919000 ns (± 12558) | 1.03 |
| add-zero-reuse-forwards/create | 1911000 ns (± 35710) | 1788000 ns (± 76549) | 1.07 |
| add-zero-reuse-forwards/rewrite | 1640000 ns (± 57306) | 1526000 ns (± 11336) | 1.07 |
| mul-two-forwards/create | 2225000 ns (± 102534) | 2220000 ns (± 33648) | 1.00 |
| mul-two-forwards/rewrite | 3627000 ns (± 20169) | 3598000 ns (± 90024) | 1.01 |
| add-zero-reuse-first/create | 1893500 ns (± 95540) | 1848000 ns (± 78479) | 1.02 |
| add-zero-reuse-first/rewrite | 8000 ns (± 1790) | 8000 ns (± 1303) | 1 |
| add-zero-lots-of-reuse-first/create | 1850000 ns (± 86048) | 1842500 ns (± 81969) | 1.00 |
| add-zero-lots-of-reuse-first/rewrite | 829000 ns (± 24884) | 787500 ns (± 58352) | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
```mlir
// CHECK-NEXT: %[[v894:[0-9]+]] = "riscv.li"() <{"value" = -1 : i32}> : () -> !riscv.reg
// CHECK-NEXT: %[[v895:[0-9]+]] = "builtin.unrealized_conversion_cast"(%[[v894]]) : (!riscv.reg) -> i32
// CHECK-NEXT: %[[v892:[0-9]+]] = "riscv.li"() <{"value" = 2 : i32}> : () -> !riscv.reg
// CHECK-NEXT: %[[v893:[0-9]+]] = "builtin.unrealized_conversion_cast"(%[[v892]]) : (!riscv.reg) -> i32
```
There is a huge number of unrealized_conversion_casts. Why have these not been eliminated?
Good point: we never added the cast eliminations for `i32` and for `ptr` → `reg` → `ptr`. I'll look into it now, but the `reg` → `i32` → `reg` cast could be tricky.
I updated the PR and used a rewrite to remove every cast in the `i32` → `reg` direction. The remaining casts all go in the opposite direction (`reg` → `i32`), and we currently can't remove those.
Very cool. I'll let @luisacicolini do the full review.
This LGTM, but I think it should land only after #970 has landed and the tests have been updated so they no longer contain all these casts.
This PR adds C code for the SHA-256 block compression function (the core of SHA-256), together with test cases that verify our RISC-V lowering of it. It follows the same self-contained style as fastntt.c / chacha20.c, i.e. a single exported function operating on caller-provided buffers, with no local arrays or globals.
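As a rough illustration of the style described above, here is a minimal C sketch of the SHA-256 compression function (per FIPS 180-4) written without local arrays or globals. It is not the code from this PR: the name `sha256_compress`, its parameter order, the choice to have the caller pass both the 64-word message-schedule scratch buffer `w` and the round-constant table `k`, and the convention that `block` already holds 16 words decoded from big-endian bytes are all assumptions made for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
static inline uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

/*
 * Hypothetical sketch: SHA-256 block compression over caller-provided
 * buffers only (no local arrays, no globals).
 *   state: 8 chaining words H0..H7, updated in place
 *   block: 16 message words M0..M15, already decoded from big-endian
 *   w:     64-word scratch buffer for the message schedule
 *   k:     the 64 SHA-256 round constants K0..K63
 */
void sha256_compress(uint32_t *state, const uint32_t *block,
                     uint32_t *w, const uint32_t *k) {
    /* Message schedule: copy the 16 input words, then expand to 64. */
    for (int t = 0; t < 16; t++) {
        w[t] = block[t];
    }
    for (int t = 16; t < 64; t++) {
        uint32_t s0 = rotr32(w[t - 15], 7) ^ rotr32(w[t - 15], 18) ^ (w[t - 15] >> 3);
        uint32_t s1 = rotr32(w[t - 2], 17) ^ rotr32(w[t - 2], 19) ^ (w[t - 2] >> 10);
        w[t] = w[t - 16] + s0 + w[t - 7] + s1;
    }

    /* Working variables live in scalars rather than a local array. */
    uint32_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint32_t e = state[4], f = state[5], g = state[6], h = state[7];

    /* 64 compression rounds; uint32_t arithmetic wraps modulo 2^32. */
    for (int t = 0; t < 64; t++) {
        uint32_t big_s1 = rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25);
        uint32_t ch = (e & f) ^ (~e & g);
        uint32_t t1 = h + big_s1 + ch + k[t] + w[t];
        uint32_t big_s0 = rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22);
        uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        uint32_t t2 = big_s0 + maj;
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }

    /* Feed-forward: add the result back into the chaining state. */
    state[0] += a; state[1] += b; state[2] += c; state[3] += d;
    state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}
```

Compressing the single padded block for the message "abc" (word 0 = `0x61626380`, word 15 = `0x00000018`, all others zero) starting from the standard SHA-256 initial state should leave `state` holding the FIPS 180-4 example digest `ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad`. Pushing `w` and `k` out to the caller is what keeps the function itself free of local arrays and globals in this sketch.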