ci: cross-architecture streflop/sim determinism gate (x86 vs arm64) #3079

Draft
tomjn wants to merge 1 commit into beyond-all-reason:master from tomjn:ci/streflop-crossarch-determinism

tomjn commented Jun 29, 2026

Contributor

What

Adds a manually-dispatchable CI workflow (streflop-sync-crossarch.yml) that asserts the streflop floating-point library produces bit-identical results on x86_64 (SSE) and arm64 (NEON, via sse2neon).

Why

Multiplayer sync is deterministic and depends on every client computing identical floating-point results. An ARM/Apple-Silicon client runs the streflop NEON path (through sse2neon); an x86 client runs the SSE path. If those two paths ever diverge — through compiler codegen changes, an sse2neon update, or a FastMath.h change — the result is a cross-architecture desync that only manifests in a live multiplayer game, not in any existing test. Cross-arch FP desyncs are not hypothetical; they have occurred (e.g. float-to-short angle cast UB on arm64). The NEON path is currently not exercised by CI at all.
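
To illustrate why the gate compares bit patterns rather than printed values, here is a minimal Python sketch. It is not part of the PR or the harness; the helper names `f32_bits` and `f32_from_bits` are hypothetical. Two binary32 values that differ by a single unit in the last place print identically at six significant digits, yet their bit patterns differ, and a lockstep simulation treats that as a divergence.

```python
import struct


def f32_from_bits(bits: int) -> float:
    """Interpret a 32-bit unsigned integer as an IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


a = f32_from_bits(0x3F800000)  # exactly 1.0
b = f32_from_bits(0x3F800001)  # the next representable binary32 above 1.0

# Formatting at six significant digits hides the difference...
same_text = f"{a:.6g}" == f"{b:.6g}"
# ...but the underlying bits are not equal.
same_bits = f32_bits(a) == f32_bits(b)
```

Here `same_text` is true while `same_bits` is false, which is why a comparison of formatted output would not be a sufficient check.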

Mechanism

Reuses the existing standalone harness in tools/sync-test/ (already in-tree):

  1. Builds streflop-float-test on ubuntu-latest (SSE) and ubuntu-24.04-arm (NEON).
  2. Runs the same deterministic input set (-n 10000, ~52K tests) on each leg.
  3. A third job downloads both artifacts and runs compare_results.py, which exits non-zero on any mismatch.

No new test/harness code — only the workflow YAML.
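
The three steps above can be modelled in a few lines of Python. This is an illustrative sketch only, not the code in tools/sync-test/: the names `make_inputs`, `run_leg`, `compare`, and `exit_code` are hypothetical, `math.sqrt` stands in for the streflop operations under test, and the real result format and the internals of compare_results.py are not shown in this description.

```python
import math
import random
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def make_inputs(n: int, seed: int = 12345) -> list[float]:
    """Build a deterministic input set: the same seed yields the same list."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1000.0) for _ in range(n)]


def run_leg(inputs: list[float]) -> dict[str, int]:
    """Stand-in for one architecture's run: map test name to result bits."""
    return {f"sqrt[{i}]": f32_bits(math.sqrt(x)) for i, x in enumerate(inputs)}


def compare(left: dict[str, int], right: dict[str, int]) -> list[str]:
    """Return the names of tests whose bits differ or that exist on one side only."""
    names = sorted(left.keys() | right.keys())
    return [name for name in names if left.get(name) != right.get(name)]


def exit_code(mismatches: list[str]) -> int:
    """Non-zero on any mismatch, so a CI job running this would fail."""
    return 1 if mismatches else 0
```

In this model, each build leg would call `run_leg(make_inputs(10000))` and upload the result, and the third job would pass both results to `compare` and exit with `exit_code(...)`. A single flipped low bit in one result is enough to fail the run.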

Scope (and how it differs from the source)

Adapted from the ExaDev/RecoilEngine macOS fork (commit 32d3855). Deliberately retargeted for upstream:

  • arm64 leg runs on Linux (ubuntu-24.04-arm), not the fork's macos-latest. The runner is already used elsewhere in this repo's CI. A full macOS engine build currently fails without portability fixes that are not yet merged, but this standalone streflop test doesn't need them, and a Linux arm64 runner keeps the gate independent of those fixes and free to run.
  • Manual dispatch only (workflow_dispatch) — see CI cost below. The fork's PR/push path-filter triggers and macOS shipping pins are intentionally omitted.
  • Complements, does not replace, the existing streflop-float-test.yml. That one is x86-only and compares against a checked-in NEON reference snapshot; this builds and runs both arches live, catching drift in the actual NEON output rather than a frozen copy. Unlike that workflow, this one also inits the streflop/sse2neon submodules, which the build requires.

CI cost

Manual-dispatch only, so zero automatic cost — it never runs on a PR unless someone triggers it. Each run is two short builds of a single standalone test (~minutes), not full-engine builds. Happy to wire it to path-filtered PR triggers (streflop / sse2neon / FastMath.h changes) if maintainers prefer it as an automatic gate; that's a one-line change to the on: block.

Testing

  • actionlint: clean.
  • Built and ran the standalone test locally on Apple-Silicon (arm64/NEON): 52,080 tests, bit-exact vs the checked-in NEON reference (compare_results.py exit 0).
  • Verified the gate's actual assertion (SSE_x86_64 vs NEON_arm64 references): bit-exact match, so the workflow is expected to go green on current master.
  • Not locally verified: the x86/SSE build leg (local verification was done on an arm64 machine). It is the same known-good CMake project the existing streflop-float-test.yml already builds on ubuntu, and it will be exercised on CI.

Opened as draft so a first CI run can be watched before review.

AI usage disclosure

This PR was prepared with AI assistance (Claude Code): codebase investigation, adapting the ExaDev workflow to upstream conventions, and local verification. All changes were reviewed and tested by a human (the build/run/compare steps above were run locally). Per AI_POLICY.md.

Multiplayer sync requires bit-identical floating-point results across
architectures: an Apple-Silicon/ARM client running the streflop NEON
path (via sse2neon) must agree with an x86 client on the SSE path, byte
for byte. The NEON path is not currently exercised by CI, so codegen or
sse2neon drift could introduce a cross-arch desync that only surfaces in
a live multiplayer game.

This adds a workflow that builds the existing standalone streflop float
test (tools/sync-test) on x86_64 (SSE) and arm64 (NEON), runs the same
deterministic input set on each, and compares the results with
compare_results.py, failing on any mismatch.

It complements streflop-float-test.yml, which is x86-only and compares
against a checked-in NEON reference snapshot; this builds and runs both
arches live, so it catches divergence in the actual NEON output rather
than a frozen copy. The arm64 leg runs on the ubuntu-24.04-arm runner
already used elsewhere in CI. Unlike streflop-float-test.yml it inits
the streflop/sse2neon submodules, which are required for the build.

Scoped to manual dispatch (workflow_dispatch) for now: the submodule and
FastMath paths it guards change rarely, so it is run on demand rather
than gating every PR.

Adapted from the ExaDev/RecoilEngine macOS fork (32d3855), retargeted
from a macos-latest arm64 leg to a Linux arm64 runner.
@p2004a

p2004a commented Jul 1, 2026

Collaborator

Isn't this dupe of #2921?

@tomjn

tomjn commented Jul 1, 2026

Contributor Author

> Isn't this dupe of #2921?

Potentially. This was extracted from the ExaDev fork. @bruno-dasilva, if you want to extract anything of worth from this changeset, I think #2921 is in a better position, as it has already had feedback rounds and this is just a draft PR.

@bruno-dasilva

Collaborator

Hmm. My PR #2921 runs a full seeded instance of a BAR scenario, so it's full end-to-end sync testing, whereas this is streflop only. If I had to pick one it would be #2921, but perhaps there's an argument for both.

If we're just picking 10k random inputs, surely the full game run in #2921 would have similar coverage? Or does this test some edge cases of streflop that are possibly not touched in #2921?

I think this could be valuable iff it does add significant coverage of streflop testing.
