
feat: inherit upstream evalLock + compile/closure return-value fixes #7

Open
Gajesh2007 wants to merge 3 commits into main from feat/inherit-upstream-2026-06


Gajesh2007 commented Jun 24, 2026


Update — 2026-06-23: branch refreshed (no force) + capstone opened

This branch was updated by merging origin/main into it and pushing the result as a fast-forward (no force push): new head e20ea3dd (was 63722be). That pulls in the measurement-only EvalProbe instrumentation (fork PR #6 / d-inference#451, master pin 3c50ad69), so the combined tree is the basis for the d-inference pin bump.

  • Clean merge. MLXArray.swift auto-merged: EvalProbe brackets eval() (theirs); this branch's fix wraps description/tostring in evalLock (ours) — disjoint regions, both preserved.
  • Strict superset (no instrumentation reverted): git merge-base --is-ancestor 3c50ad69 e20ea3dd → exit 0.
  • Re-validated: mlx-swift + provider-swift build clean; 1064 tests / 74 suites green; live GPT-OSS-20B (compile() path) and Gemma-4-26B (batched + tool-calling) produce coherent output.
  • Capstone: the d-inference superproject pin bump (3c50ad69 → e20ea3dd) is now open as d-inference#459, "chore(libs): inherit upstream MLX — bump mlx-swift / mlx-swift-lm pins to combined inherited+instrumented heads".

The "intentionally on hold" section below is superseded by the above.


Summary

Selective inheritance of upstream correctness fixes from ml-explore/mlx-swift into the Layr-Labs fork. This is not a blanket git merge upstream/main — the only large upstream commit (e23ae6b, the Linux/CUDA SPM rewrite) is deliberately not taken (zero value on Apple Silicon/Metal, highest conflict/risk). Just 2 tiny, conflict-free correctness fixes land here, on top of ac67822.

  • Base: main (d3d12a1) · Head: feat/inherit-upstream-2026-06 (63722be)
  • Diff = exactly the 2 commits below (merge-base with main is our fork point ac67822).

Inherited changes

| Commit (this branch) | Upstream source | What & why |
| --- | --- | --- |
| 4ceb9b4 fix(thread-safety): hold evalLock while computing tostring | 058eda6 (upstream ml-explore#410) | MLXArray/Device/Stream .description call mlx_*_tostring, which internally calls eval and is not thread-safe. Wraps the 3 *_tostring calls in evalLock.withLock. Directly applicable to our concurrent continuous-batching, multi-threaded provider, where any thread stringifying an array for a log/error line races the evaluator. The unrelated .github CI-yml hunk from upstream was intentionally dropped. |
| 63722be fix: check mlx_detail_compile and mlx_closure_apply return values (#398) | 89cece7 (upstream ml-explore#398) | Both C calls return int (0=ok/1=fail) but the status was ignored. When an MLX error fires inside a withError scope, execution continued, innerCall returned an empty vector, and the compile overloads then hit a Swift Index out of range trap: an uncatchable crash that bypasses withError and takes down the whole long-running provider (every batched request with it). Now captures both statuses, early-returns on failure, and lets withError throw cleanly. Regression tests included. The model layer in mlx-swift-lm uses compile() (GPT-OSS, DeepseekV4, SSM, GatedDelta, Bitnet). |
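
As a rough illustration of the evalLock change in the first row, the sketch below serializes a stringification path that has an evaluation side effect. It does not use MLX: `SketchLock`, `Sketch.evalLock`, `FakeArray`, and `stressStringify` are hypothetical stand-ins invented for this example, and the real fix wraps the `mlx_*_tostring` calls rather than a counter.

```swift
import Dispatch
import Foundation

// Hypothetical stand-in for the fork's evalLock: one process-wide lock
// shared by every code path that can trigger evaluation.
final class SketchLock: @unchecked Sendable {
    private let lock = NSLock()

    func withLock<T>(_ body: () throws -> T) rethrows -> T {
        lock.lock()
        defer { lock.unlock() }
        return try body()
    }
}

enum Sketch {
    static let evalLock = SketchLock()
}

// Hypothetical stand-in for MLXArray: stringifying it mutates shared state,
// the way a tostring call that internally evaluates would.
final class FakeArray: @unchecked Sendable, CustomStringConvertible {
    private var evalCount = 0  // not safe to mutate concurrently on its own

    // Stand-in for the unsynchronized C tostring call.
    private func unsafeToString() -> String {
        evalCount += 1
        return "FakeArray(evals: \(evalCount))"
    }

    // The pattern from the fix: take the lock around the whole tostring call.
    var description: String {
        Sketch.evalLock.withLock { unsafeToString() }
    }

    var evaluations: Int {
        Sketch.evalLock.withLock { evalCount }
    }
}

// Stringify the same array from many threads and report how many
// "evaluations" were recorded.
func stressStringify(iterations: Int) -> Int {
    let array = FakeArray()
    DispatchQueue.concurrentPerform(iterations: iterations) { _ in
        _ = array.description
    }
    return array.evaluations
}

print(stressStringify(iterations: 1000))  // prints 1000
```

Because every increment happens under the lock, no update is lost; without the lock the same loop would be a data race on `evalCount`.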

Deliberately skipped / deferred

  • e23ae6b — Linux/CUDA SPM build → SKIP. Rewrites Package.swift, bumps swift-tools-version to 6.3;(experimentalCGen) (our provider-swift is on 6.1), adds a CudaBuild plugin + encuda target + swift-argument-parser dep + a 4.3k-line generated CUDA header. Zero functional value on Apple Silicon/Metal (our only target) and the single highest conflict item (collides with our Cmlx product + jaccl excludes and our Layr-Labs Cmlx-submodule pins). Kept out so our customizations and the toolchain floor are not silently changed.
  • 1cd3ed5 — CPU-only default device → SKIP. Only changes behavior on a host with no Metal and no CUDA; on Apple Silicon the core still resolves to .gpu.
  • bd196a9 — AdamW bias correction → SKIP. Training-only; we are inference-only and don't depend on MLXOptimizers.
  • dc43e62 — nuclear norm in linalg → DEFER. No current consumer (provider never calls linalg norm); clean to add if ever needed.

Our local customizations are untouched: Package.swift (Cmlx product + jaccl), Layr-Labs Cmlx-submodule pins, ParallelFileReader 128 MiB batch, and the Metal resource-COUNT exposure + test.

Validation

  • libs/mlx-swift swift build — green.
  • Integrated through the provider (path-depends on this fork): provider-swift build + 1064 tests / 74 suites passed.
  • Live on-GPU (Apple M4 Max, 128 GB):
    • GPT-OSS-20B (mlx-community/gpt-oss-20b-MXFP4-Q8) exercises the inherited compile() return-value path through GPTOSS.swift: PASS (TTFT 0.25 s, ~83 tok/s, coherent output; happy path not regressed by the new throw-on-error checks).
    • 2 concurrent Gemma-4-26B requests + B=2 batched tests exercise the evalLock/tostring path under concurrent eval — no race/crash.

Before / After

Behavior

flowchart LR
  subgraph Before["Before (fork @ ac67822)"]
    A1["concurrent batched server<br/>stringifies MLXArray off the eval thread"] --> A2["data race on the evaluator"]
    A3["compile() error inside withError"] --> A4["empty vector -> Index out of range<br/>UNCATCHABLE trap -> whole provider crashes"]
  end
  subgraph After["After (@ 63722be)"]
    B1["stringify under evalLock.withLock"] --> B2["thread-safe, no race"]
    B3["compile()/closure failure"] --> B4["status checked -> catchable throw<br/>request fails, process survives"]
  end

Code

flowchart LR
  subgraph Before["Before"]
    C1["MLXArray/Device/Stream .description"] --> C2["mlx_*_tostring (no lock)"]
    C3["Transforms+Compile innerCall"] --> C4["ignores mlx_detail_compile /<br/>mlx_closure_apply retvals -> [] -> crash"]
  end
  subgraph After["After"]
    D1["MLXArray/Device/Stream .description"] --> D2["evalLock.withLock { mlx_*_tostring }"]
    D3["Transforms+Compile innerCall"] --> D4["guard status == 0 else return [] ;<br/>overloads return placeholder -> withError throws"]
  end
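
The "After" path in the Code flowchart can be sketched in plain Swift as follows. This is a simplified model under stated assumptions, not the fork's code: `fakeClosureApply`, `ErrorBox`, `SketchError`, `compiledSingle`, and `withSketchError` are hypothetical names that only mimic the shape of `mlx_closure_apply`, the withError scope, and a single-array compile overload.

```swift
import Foundation

struct SketchError: Error, Equatable {
    let message: String
}

// Hypothetical stand-in for the box a withError scope uses to hold an error
// instead of aborting the process.
final class ErrorBox {
    var error: SketchError?
}

// Stand-in for a C call such as mlx_closure_apply: returns 0 on success and
// 1 on failure. On failure it records the error and leaves `out` untouched.
func fakeClosureApply(
    _ input: [Double], shouldFail: Bool, box: ErrorBox, out: inout [Double]
) -> Int32 {
    if shouldFail {
        box.error = SketchError(message: "compile failed")
        return 1
    }
    out = input.map { $0 * 2 }
    return 0
}

// Mirrors the fixed innerCall: capture the status and return early on failure.
func innerCall(_ input: [Double], shouldFail: Bool, box: ErrorBox) -> [Double] {
    var out: [Double] = []
    let status = fakeClosureApply(input, shouldFail: shouldFail, box: box, out: &out)
    guard status == 0 else { return [] }
    return out
}

// Mirrors a single-array overload: never index into a possibly empty result.
func compiledSingle(_ x: Double, shouldFail: Bool, box: ErrorBox) -> Double {
    let result = innerCall([x], shouldFail: shouldFail, box: box)
    guard let first = result.first else {
        return .nan  // placeholder; the scope below throws before it is used
    }
    return first
}

// Stand-in for withError: run the body, then throw if an error was recorded.
func withSketchError<T>(_ body: (ErrorBox) -> T) throws -> T {
    let box = ErrorBox()
    let value = body(box)
    if let error = box.error { throw error }
    return value
}

do {
    let ok = try withSketchError { compiledSingle(21, shouldFail: false, box: $0) }
    print(ok)  // prints 42.0
    _ = try withSketchError { compiledSingle(21, shouldFail: true, box: $0) }
} catch {
    print("caught: \(error)")  // the failure surfaces as a catchable error
}
```

The design point is that the placeholder value never reaches the caller: the scope inspects the recorded error after the body returns and throws, so a failed request is reported as an error while the process keeps running.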

Note: main carries a separate EvalProbe measurement line (3c50ad6, fork PR #6) layered on the same fork point ac67822; it is intentionally not part of this PR's diff. See the cross-repo note below for how this interacts with the d-inference superproject pin.

Related / cross-repo

Part of the "inherit upstream MLX improvements" effort, spanning three repos:

  • mlx-swift (this PR): Layr-Labs/mlx-swift#7
  • mlx-swift-lm: Layr-Labs/mlx-swift-lm#50
  • d-inference superproject (gitlink bump): not yet opened — see below.

d-inference superproject capstone — intentionally on hold

The capstone PR was meant to bump the d-inference submodule pins to the heads of these two branches. It is deliberately not opened yet because d-inference origin/master has already advanced both submodule pins past the bases these branches were cut from, via d-inference#451 ("Instrument the first-token wedge … measurement only").

Bumping the pins straight to these branch heads would silently revert that merged measurement instrumentation. Cleanest resolution: merge these two fork PRs into their main branches first (each main already contains the instrumentation), then bump the d-inference pins to the new main HEADs (which then carry both the instrumentation and this inheritance). The capstone's validation (provider build + 1064 tests + live Gemma-4-26B / GPT-OSS-20B) was run against these branch heads.

Gajesh2007 and others added 3 commits June 23, 2026 13:25
Wrap mlx_{array,device,stream}_tostring in evalLock.withLock so
stringifying an MLXArray/Device/Stream from a non-eval thread cannot
race the evaluator. tostring internally calls eval, which is not
thread-safe -- a real hazard for our continuous-batching multi-threaded
provider (log lines / error messages stringify arrays under load).

Hand-ported from ml-explore/mlx-swift 058eda6 (ml-explore#410); the unrelated
.github CI hunk (show-sdk-version removal) is intentionally dropped.

Co-authored-by: Cursor <cursoragent@cursor.com>
…-explore#398)

mlx_detail_compile and mlx_closure_apply both return int (0=success,
1=failure) but their return values were silently ignored. When an error
fires inside a withError scope the MLX error handler stores the error in
an ErrorBox instead of calling fatalError; execution then continues past
the failed call, innerCall returns an empty result vector, and the
single/two/three-array compile overloads crash with a Swift 'Index out
of range' trap — bypassing withError entirely.

Fix: capture both return values and early-return [] from innerCall on
failure. The placeholder return from the compile overloads is never
observed by the caller because withError throws before the value is
used.

Adds three regression tests covering the single-array, two-array, and
[MLXArray]->[MLXArray] compile overloads.

(cherry picked from commit 89cece7)
…-upstream-2026-06

Brings the measurement-only EvalProbe instrumentation (d-inference#451, master
pin 3c50ad6) onto the inheritance branch so the combined tree is a strict
superset of the d-inference master pin. Preserves both the inheritance fixes
(evalLock-held tostring, compile/closure return-value checks) and the EvalProbe
eval-path instrumentation.