ci: cross-architecture streflop/sim determinism gate (x86 vs arm64) #3079
tomjn wants to merge 1 commit into
Multiplayer sync requires bit-identical floating-point results across architectures: an Apple-Silicon/ARM client running the streflop NEON path (via sse2neon) must agree with an x86 client on the SSE path, byte for byte. The NEON path is not currently exercised by CI, so codegen or sse2neon drift could introduce a cross-arch desync that only surfaces in a live multiplayer game.
This adds a workflow that builds the existing standalone streflop float test (tools/sync-test) on x86_64 (SSE) and arm64 (NEON), runs the same deterministic input set on each, and compares the results with compare_results.py, failing on any mismatch.
It complements streflop-float-test.yml, which is x86-only and compares against a checked-in NEON reference snapshot; this builds and runs both arches live, so it catches divergence in the actual NEON output rather than a frozen copy. The arm64 leg runs on the ubuntu-24.04-arm runner already used elsewhere in CI. Unlike streflop-float-test.yml it inits the streflop/sse2neon submodules, which are required for the build.
Scoped to manual dispatch (workflow_dispatch) for now: the submodule and FastMath paths it guards change rarely, so it is run on demand rather than gating every PR.
Adapted from the ExaDev/RecoilEngine macOS fork (32d3855), retargeted from a macos-latest arm64 leg to a Linux arm64 runner.
Isn't this a dupe of #2921?
Potentially; this was extracted from the ExaDev fork. @bruno-dasilva, if you want to extract anything of worth from this changeset, I think #2921 is in a better position, as it's already had feedback rounds and this is just a draft PR.
Hmm. My PR #2921 runs a full seeded instance of a BAR scenario, so it's full end-to-end sync testing, whereas this is streflop only. If I had to pick one it would be #2921, but perhaps there's an argument for both. If we're just picking 10k random inputs, surely the full game run in #2921 would have similar coverage? Or does this test some edge cases of streflop that are possibly not touched in #2921? I think this could be valuable iff it adds significant coverage of streflop testing.
What
Adds a manually-dispatchable CI workflow (streflop-sync-crossarch.yml) that asserts the streflop floating-point library produces bit-identical results on x86_64 (SSE) and arm64 (NEON, via sse2neon).
Why
Multiplayer sync is deterministic and depends on every client computing identical floating-point results. An ARM/Apple-Silicon client runs the streflop NEON path (through sse2neon); an x86 client runs the SSE path. If those two paths ever diverge (through compiler codegen changes, an sse2neon update, or a FastMath.h change), the result is a cross-architecture desync that only manifests in a live multiplayer game, not in any existing test. Cross-arch FP desyncs are not hypothetical; they have occurred (e.g. float-to-short angle cast UB on arm64). The NEON path is currently not exercised by CI at all.
Mechanism
Reuses the existing standalone harness in tools/sync-test/ (already in-tree):
- Builds streflop-float-test on ubuntu-latest (SSE) and ubuntu-24.04-arm (NEON).
- Runs the same deterministic input set (-n 10000, ~52K tests) on each leg.
- Compares the two result sets with compare_results.py, which exits non-zero on any mismatch.
No new test/harness code; only the workflow YAML.
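The compare step described above can be pictured with a minimal sketch. This is not the in-tree compare_results.py, whose input format is not shown here; the dictionary layout (test name mapped to a hex bit pattern) and the helper names `float_bits` and `compare` are assumptions made purely for illustration.

```python
import struct


def float_bits(value):
    """Round value to float32 and return its IEEE-754 bit pattern as 8 hex digits."""
    return struct.pack(">f", value).hex()


def compare(sse, neon):
    """Return the sorted test names whose bit patterns differ or are missing on one leg.

    Both arguments are hypothetical {test_name: hex_bit_pattern} mappings,
    one per architecture.
    """
    mismatches = []
    for name in sorted(sse.keys() | neon.keys()):
        if sse.get(name) != neon.get(name):
            mismatches.append(name)
    return mismatches


# Illustrative data only: two legs that agree, then one that drifts by one bit.
sse_results = {"one": float_bits(1.0), "third": float_bits(1.0 / 3.0)}
neon_results = dict(sse_results)
agreeing = compare(sse_results, neon_results)

neon_results["one"] = "3f800001"  # simulated one-bit drift on the NEON leg
drifted = compare(sse_results, neon_results)
```

A gate built this way would exit with `1 if drifted else 0`, so a single differing bit pattern fails the job; comparing the stored patterns as strings avoids any tolerance-based equality.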
Scope (and how it differs from the source)
Adapted from the ExaDev/RecoilEngine macOS fork (commit 32d3855). Deliberately retargeted for upstream:
- Linux arm64 runner (ubuntu-24.04-arm), not the fork's macos-latest. The runner is already used elsewhere in this repo's CI. A full macOS engine build currently fails pending unmerged portability fixes, but this standalone streflop test doesn't need them, and Linux arm64 keeps the gate clean and free.
- Manual dispatch only (workflow_dispatch); see CI cost below. The fork's PR/push path-filter triggers and macOS shipping pins are intentionally omitted.
- Complements streflop-float-test.yml. That one is x86-only and compares against a checked-in NEON reference snapshot; this builds and runs both arches live, catching drift in the actual NEON output rather than a frozen copy. It also inits the streflop/sse2neon submodules, which the build requires.
CI cost
Manual-dispatch only, so zero automatic cost: it never runs on a PR unless someone triggers it. Each run is two short builds of a single standalone test (on the order of minutes), not full-engine builds. Happy to wire it to path-filtered PR triggers (streflop / sse2neon / FastMath.h changes) if maintainers prefer it as an automatic gate; that's a one-line change to the on: block.
Testing
- actionlint: clean.
- [...] (compare_results.py exit 0).
- [...] (SSE_x86_64 vs NEON_arm64 references): bit-exact match; the workflow goes green on current master.
- [...] streflop-float-test.yml already builds on ubuntu, and will be exercised on CI.
Opened as draft so a first CI run can be watched before review.
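To make "bit-exact match" in the notes above concrete, here is a small illustrative sketch (not taken from the test harness) showing that two float32 values can pass a tolerance check while still differing in their bit pattern, which is the kind of difference a sync gate has to catch. The helper names are hypothetical.

```python
import math
import struct


def f32_bits(value):
    """Round value to float32 and return its raw 32-bit pattern as an int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def f32_from_bits(bits):
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


a = 1.0
# Next representable float32 above 1.0: one unit in the last place away.
b = f32_from_bits(f32_bits(a) + 1)

approx_equal = math.isclose(a, b, rel_tol=1e-6)  # tolerance check passes
bit_equal = f32_bits(a) == f32_bits(b)           # bit-exact check fails
```

Here `approx_equal` is true and `bit_equal` is false: a one-ULP drift is invisible to an approximate comparison, which is why the check has to operate on the raw bits.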
AI usage disclosure
This PR was prepared with AI assistance (Claude Code): codebase investigation, adapting the ExaDev workflow to upstream conventions, and local verification. All changes were reviewed and tested by a human (the build/run/compare steps above were run locally). Per AI_POLICY.md.