feat: inherit upstream evalLock + compile/closure return-value fixes by Gajesh2007 · Pull Request #7 · Layr-Labs/mlx-swift

Gajesh2007 · 2026-06-24T00:40:37Z

Update — 2026-06-23: branch refreshed (no force) + capstone opened

This branch was refreshed by merging origin/main into it and pushing the result as a fast-forward (no force-push) → new head e20ea3dd (was 63722be). The merge pulls in the measurement-only EvalProbe instrumentation (fork PR #6 / d-inference#451, master pin 3c50ad69), so the combined tree is the basis for the d-inference pin bump.

Clean merge. MLXArray.swift auto-merged: EvalProbe brackets eval() (theirs); this branch's fix wraps description/tostring in evalLock (ours) — disjoint regions, both preserved.

Strict superset (no instrumentation reverted): git merge-base --is-ancestor 3c50ad69 e20ea3dd → exit 0.
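To make the strict-superset claim concrete, here is a minimal sketch of the reachability question that `git merge-base --is-ancestor` answers. The graph is a simplified model, not the real history: the three EvalProbe commits are collapsed into the single node 3c50ad69, and the tip of main that was merged is assumed to be represented by that same pinned commit. The helper name `isAncestor` is hypothetical.

```swift
// Simplified commit graph: commit -> parents.
// Assumption: the 3 EvalProbe commits are collapsed into "3c50ad69", and the
// merged main tip is modeled as that pinned commit.
let parents: [String: [String]] = [
    "ac67822": [],                        // fork point
    "3c50ad69": ["ac67822"],              // instrumentation line (collapsed)
    "4ceb9b4": ["ac67822"],               // evalLock/tostring fix
    "63722be": ["4ceb9b4"],               // compile/closure return-value fix
    "e20ea3dd": ["63722be", "3c50ad69"],  // merge of main into this branch
]

// Returns true when `ancestor` is reachable from `descendant` by following
// parent links. A commit counts as its own ancestor, as in git.
func isAncestor(_ ancestor: String, of descendant: String,
                in graph: [String: [String]]) -> Bool {
    var stack = [descendant]
    var seen = Set<String>()
    while let commit = stack.popLast() {
        if commit == ancestor { return true }
        if seen.insert(commit).inserted {
            stack.append(contentsOf: graph[commit] ?? [])
        }
    }
    return false
}

print(isAncestor("3c50ad69", of: "e20ea3dd", in: parents))  // new head
print(isAncestor("3c50ad69", of: "63722be", in: parents))   // old head
```

In this model the pinned commit is reachable from the new head but not from the old one, which is why pinning to 63722be directly would have dropped the instrumentation.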

Re-validated: mlx-swift + provider-swift build clean; 1064 tests / 74 suites green; live GPT-OSS-20B (compile() path) and Gemma-4-26B (batched + tool-calling) produce coherent output.

Capstone: the d-inference superproject pin bump (3c50ad69 → e20ea3dd) is now open as d-inference#459, "chore(libs): inherit upstream MLX — bump mlx-swift / mlx-swift-lm pins to combined inherited+instrumented heads".

The "intentionally on hold" section below is superseded by the above.

Summary

Selective inheritance of upstream correctness fixes from ml-explore/mlx-swift into the Layr-Labs fork. This is not a blanket git merge upstream/main — the only large upstream commit (e23ae6b, the Linux/CUDA SPM rewrite) is deliberately not taken (zero value on Apple Silicon/Metal, highest conflict/risk). Just 2 tiny, conflict-free correctness fixes land here, on top of ac67822.

Base: main (d3d12a1) · Head: feat/inherit-upstream-2026-06 (63722be)
Diff = exactly the 2 commits below (merge-base with main is our fork point ac67822).

Inherited changes

| Commit (this branch) | Upstream source | What & why |
| --- | --- | --- |
| `4ceb9b4` `fix(thread-safety): hold evalLock while computing tostring` | `058eda6` (upstream ml-explore#410) | `MLXArray`/`Device`/`Stream` `.description` call `mlx__tostring`, which internally calls `eval` and is not thread-safe. Wraps the 3 `_tostring` calls in `evalLock.withLock`. Directly applicable to our concurrent continuous-batching, multi-threaded provider, where any thread stringifying an array for a log/error line races the evaluator. The unrelated `.github` CI-yml hunk from upstream was intentionally dropped. |
| `63722be` `fix: check mlx_detail_compile and mlx_closure_apply return values (#398)` | `89cece7` (upstream ml-explore#398) | Both C calls return `int` (0=ok/1=fail) but the status was ignored. When an MLX error fires inside a `withError` scope, execution continued, `innerCall` returned an empty vector, and the `compile` overloads then hit a Swift `Index out of range` trap, an uncatchable crash that bypasses `withError` and takes down the whole long-running provider (every batched request with it). Now captures both statuses, early-returns on failure, and lets `withError` `throw` cleanly. Regression tests included. The model layer in `mlx-swift-lm` uses `compile()` (GPT-OSS, DeepseekV4, SSM, GatedDelta, Bitnet). |
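The locking pattern behind the first fix can be sketched as follows. This is an illustration under stated assumptions, not the mlx-swift source: `FakeEvaluator`, `Shared`, and `lockedDescription` are hypothetical names, and a plain `NSLock` stands in for the fork's `evalLock`. The point it shows is that a string-rendering call which touches shared evaluator state is serialized behind one process-wide lock.

```swift
import Foundation

// Hypothetical stand-in for a non-thread-safe evaluator. Rendering a string
// mutates shared state, loosely mirroring how tostring triggers eval.
final class FakeEvaluator: @unchecked Sendable {
    private var evalCount = 0

    func unsafeToString(_ values: [Float]) -> String {
        evalCount += 1  // unsynchronized read-modify-write
        return "array(\(values))"
    }

    var count: Int { evalCount }
}

// Hypothetical namespace; `evalLock` here is an NSLock playing the role of
// the lock named in the PR.
enum Shared {
    static let evalLock = NSLock()
    static let evaluator = FakeEvaluator()
}

// Every stringification goes through the lock, so no two threads are inside
// the evaluator at the same time.
func lockedDescription(_ values: [Float]) -> String {
    Shared.evalLock.lock()
    defer { Shared.evalLock.unlock() }
    return Shared.evaluator.unsafeToString(values)
}

// Stress it from many threads, as log or error lines would under load.
DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    _ = lockedDescription([1, 2, 3])
}
print(Shared.evaluator.count)
```

Without the lock, the increments in `unsafeToString` could interleave and the counter could end up short; with it, every call is accounted for.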

Deliberately skipped / deferred

e23ae6b — Linux/CUDA SPM build → SKIP. Rewrites Package.swift, bumps swift-tools-version to 6.3;(experimentalCGen) (our provider-swift is on 6.1), adds a CudaBuild plugin + encuda target + swift-argument-parser dep + a 4.3k-line generated CUDA header. Zero functional value on Apple Silicon/Metal (our only target) and the single highest conflict item (collides with our Cmlx product + jaccl excludes and our Layr-Labs Cmlx-submodule pins). Kept out so our customizations and the toolchain floor are not silently changed.
1cd3ed5 — CPU-only default device → SKIP. Only changes behavior on a host with no Metal and no CUDA; on Apple Silicon the core still resolves to .gpu.
bd196a9 — AdamW bias correction → SKIP. Training-only; we are inference-only and don't depend on MLXOptimizers.
dc43e62 — nuclear norm in linalg → DEFER. No current consumer (provider never calls linalg norm); clean to add if ever needed.

Our local customizations are untouched: Package.swift (Cmlx product + jaccl), Layr-Labs Cmlx-submodule pins, ParallelFileReader 128 MiB batch, and the Metal resource-COUNT exposure + test.

Validation

libs/mlx-swift swift build — green.
Integrated through the provider (path-depends on this fork): provider-swift build + 1064 tests / 74 suites passed.
Live on-GPU (Apple M4 Max, 128 GB):
- GPT-OSS-20B (mlx-community/gpt-oss-20b-MXFP4-Q8) exercises the inherited compile() return-value path through GPTOSS.swift — PASS (TTFT 0.25 s, ~83 tok/s, coherent output; happy path not regressed by the new throw-on-error checks).
- 2 concurrent Gemma-4-26B requests + B=2 batched tests exercise the evalLock/tostring path under concurrent eval — no race/crash.

Before / After

Behavior

flowchart LR
  subgraph Before["Before (fork @ ac67822)"]
    A1["concurrent batched server<br/>stringifies MLXArray off the eval thread"] --> A2["data race on the evaluator"]
    A3["compile() error inside withError"] --> A4["empty vector -> Index out of range<br/>UNCATCHABLE trap -> whole provider crashes"]
  end
  subgraph After["After (@ 63722be)"]
    B1["stringify under evalLock.withLock"] --> B2["thread-safe, no race"]
    B3["compile()/closure failure"] --> B4["status checked -> catchable throw<br/>request fails, process survives"]
  end

Code

flowchart LR
  subgraph Before["Before"]
    C1["MLXArray/Device/Stream .description"] --> C2["mlx_*_tostring (no lock)"]
    C3["Transforms+Compile innerCall"] --> C4["ignores mlx_detail_compile /<br/>mlx_closure_apply retvals -> [] -> crash"]
  end
  subgraph After["After"]
    D1["MLXArray/Device/Stream .description"] --> D2["evalLock.withLock { mlx_*_tostring }"]
    D3["Transforms+Compile innerCall"] --> D4["guard status == 0 else return [] ;<br/>overloads return placeholder -> withError throws"]
  end

Note: main carries a separate EvalProbe measurement line (3c50ad6, fork PR #6) layered on the same fork point ac67822; it is intentionally not part of this PR's diff. See the cross-repo note below for how this interacts with the d-inference superproject pin.

Related / cross-repo

Part of the "inherit upstream MLX improvements" effort, spanning three repos:

mlx-swift (this PR): Layr-Labs/mlx-swift#7
mlx-swift-lm: Layr-Labs/mlx-swift-lm#50
d-inference superproject (gitlink bump): not yet opened — see below.

d-inference superproject capstone — intentionally on hold

The capstone PR was meant to bump the d-inference submodule pins to the heads of these two branches. It is deliberately not opened yet because d-inference origin/master has already advanced both submodule pins past the bases these branches were cut from, via d-inference#451 ("Instrument the first-token wedge … measurement only"):

libs/mlx-swift: master pins 3c50ad69 = our base ac67822 + 3 EvalProbe instrumentation commits (fork PR #6, "EvalProbe: measurement-only instrumentation of the blocking eval path"). This branch is ac67822 + 2 inheritance commits, so the two lines diverged at ac67822.
libs/mlx-swift-lm: master pins 2b4b0d8d = base 8a9bc7c + 1 EngineCore idle-clear marker (fork Add build from source doc to CONTRIBUTING.md ml-explore/mlx-swift#48).

Bumping the pins straight to these branch heads would silently revert that merged measurement instrumentation. Cleanest resolution: merge these two fork PRs into their main branches first (each main already contains the instrumentation), then bump the d-inference pins to the new main HEADs (which then carry both the instrumentation and this inheritance). The capstone's validation (provider build + 1064 tests + live Gemma-4-26B / GPT-OSS-20B) was run against these branch heads.

Wrap mlx_{array,device,stream}_tostring in evalLock.withLock so stringifying an MLXArray/Device/Stream from a non-eval thread cannot race the evaluator. tostring internally calls eval, which is not thread-safe -- a real hazard for our continuous-batching multi-threaded provider (log lines / error messages stringify arrays under load).

Hand-ported from ml-explore/mlx-swift 058eda6 (ml-explore#410); the unrelated .github CI hunk (show-sdk-version removal) is intentionally dropped.

Co-authored-by: Cursor <cursoragent@cursor.com>

…-explore#398)

mlx_detail_compile and mlx_closure_apply both return int (0=success, 1=failure) but their return values were silently ignored. When an error fires inside a withError scope, the MLX error handler stores the error in an ErrorBox instead of calling fatalError; execution then continues past the failed call, innerCall returns an empty result vector, and the single/two/three-array compile overloads crash with a Swift 'Index out of range' trap, bypassing withError entirely.

Fix: capture both return values and early-return [] from innerCall on failure. The placeholder return from the compile overloads is never observed by the caller because withError throws before the value is used.

Adds three regression tests covering the single-array, two-array, and [MLXArray]->[MLXArray] compile overloads.

(cherry picked from commit 89cece7)
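The control flow described in this commit message can be sketched in isolation. Everything below is a hypothetical model, not the mlx-swift or mlx-c API: `fakeClosureApply` stands in for a C call that returns 0 or 1, `ErrorBox` and `withErrorSketch` imitate the deferred-error scope, and `compiledSingle` imitates a single-value compile overload. It shows why checking the status and returning a placeholder turns a trap into a catchable throw.

```swift
// Hypothetical error type and box; the handler stores the error rather than aborting.
struct SketchError: Error, Equatable { let message: String }
final class ErrorBox { var error: SketchError? }

// Fake C-style call: 0 on success, 1 on failure, results written via inout.
func fakeClosureApply(_ input: [Double], fail: Bool, box: ErrorBox,
                      out: inout [Double]) -> Int32 {
    if fail {
        box.error = SketchError(message: "closure failed")
        return 1
    }
    out = input.map { $0 * 2 }
    return 0
}

// The fix: capture the status and early-return [] instead of carrying on.
func innerCall(_ input: [Double], fail: Bool, box: ErrorBox) -> [Double] {
    var out: [Double] = []
    let status = fakeClosureApply(input, fail: fail, box: box, out: &out)
    guard status == 0 else { return [] }
    return out
}

// Single-value overload: hands back a placeholder rather than indexing an
// empty array, which would trap with "Index out of range".
func compiledSingle(_ x: Double, fail: Bool, box: ErrorBox) -> Double {
    guard let first = innerCall([x], fail: fail, box: box).first else {
        return .nan  // placeholder; the scope below throws before it is used
    }
    return first
}

// Error scope: runs the body, then throws if the box recorded an error.
func withErrorSketch<R>(_ body: (ErrorBox) -> R) throws -> R {
    let box = ErrorBox()
    let value = body(box)
    if let error = box.error { throw error }
    return value
}

let good = try? withErrorSketch { box in compiledSingle(21, fail: false, box: box) }
let bad = try? withErrorSketch { box in compiledSingle(21, fail: true, box: box) }
print(String(describing: good), String(describing: bad))
```

On the failing path the caller sees a thrown error (here flattened to `nil` by `try?`), and the `.nan` placeholder never escapes the scope.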

…-upstream-2026-06 Brings the measurement-only EvalProbe instrumentation (d-inference#451, master pin 3c50ad6) onto the inheritance branch so the combined tree is a strict superset of the d-inference master pin. Preserves both the inheritance fixes (evalLock-held tostring, compile/closure return-value checks) and the EvalProbe eval-path instrumentation.

Gajesh2007 and others added 3 commits June 23, 2026 13:25

Gajesh2007 mentioned this pull request Jun 24, 2026

chore(libs): inherit upstream MLX — bump mlx-swift / mlx-swift-lm pins to combined inherited+instrumented heads Layr-Labs/d-inference#459

Open

Gajesh2007 wants to merge 3 commits into main from feat/inherit-upstream-2026-06
