feat: inherit upstream evalLock + compile/closure return-value fixes#7
Open
Gajesh2007 wants to merge 3 commits into
Open
feat: inherit upstream evalLock + compile/closure return-value fixes#7Gajesh2007 wants to merge 3 commits into
Gajesh2007 wants to merge 3 commits into
Conversation
Wrap mlx_{array,device,stream}_tostring in evalLock.withLock so
stringifying an MLXArray/Device/Stream from a non-eval thread cannot
race the evaluator. tostring internally calls eval, which is not
thread-safe -- a real hazard for our continuous-batching multi-threaded
provider (log lines / error messages stringify arrays under load).
Hand-ported from ml-explore/mlx-swift 058eda6 (ml-explore#410); the unrelated
.github CI hunk (show-sdk-version removal) is intentionally dropped.
Co-authored-by: Cursor <cursoragent@cursor.com>
…-explore#398) mlx_detail_compile and mlx_closure_apply both return int (0=success, 1=failure) but their return values were silently ignored. When an error fires inside a withError scope the MLX error handler stores the error in an ErrorBox instead of calling fatalError; execution then continues past the failed call, innerCall returns an empty result vector, and the single/two/three-array compile overloads crash with a Swift 'Index out of range' trap — bypassing withError entirely. Fix: capture both return values and early-return [] from innerCall on failure. The placeholder return from the compile overloads is never observed by the caller because withError throws before the value is used. Adds three regression tests covering the single-array, two-array, and [MLXArray]->[MLXArray] compile overloads. (cherry picked from commit 89cece7)
…-upstream-2026-06 Brings the measurement-only EvalProbe instrumentation (d-inference#451, master pin 3c50ad6) onto the inheritance branch so the combined tree is a strict superset of the d-inference master pin. Preserves both the inheritance fixes (evalLock-held tostring, compile/closure return-value checks) and the EvalProbe eval-path instrumentation.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Selective inheritance of upstream correctness fixes from
ml-explore/mlx-swiftinto the Layr-Labs fork. This is not a blanketgit merge upstream/main— the only large upstream commit (e23ae6b, the Linux/CUDA SPM rewrite) is deliberately not taken (zero value on Apple Silicon/Metal, highest conflict/risk). Just 2 tiny, conflict-free correctness fixes land here, on top ofac67822.main(d3d12a1) · Head:feat/inherit-upstream-2026-06(63722be)mainis our fork pointac67822).Inherited changes
4ceb9b4fix(thread-safety): hold evalLock while computing tostring058eda6(upstream ml-explore#410)MLXArray/Device/Stream.descriptioncallmlx_*_tostring, which internally callsevaland is not thread-safe. Wraps the 3*_tostringcalls inevalLock.withLock. Directly applicable to our concurrent continuous-batching, multi-threaded provider, where any thread stringifying an array for a log/error line races the evaluator. The unrelated.githubCI-yml hunk from upstream was intentionally dropped.63722befix: check mlx_detail_compile and mlx_closure_apply return values (#398)89cece7(upstream ml-explore#398)int(0=ok/1=fail) but the status was ignored. When an MLX error fires inside awithErrorscope, execution continued,innerCallreturned an empty vector, and thecompileoverloads then hit a SwiftIndex out of rangetrap — an uncatchable crash that bypasseswithErrorand takes down the whole long-running provider (every batched request with it). Now captures both statuses, early-returns on failure, and letswithErrorthrowcleanly. Regression tests included. The model layer inmlx-swift-lmusescompile()(GPT-OSS, DeepseekV4, SSM, GatedDelta, Bitnet).Deliberately skipped / deferred
e23ae6b— Linux/CUDA SPM build → SKIP. RewritesPackage.swift, bumps swift-tools-version to6.3;(experimentalCGen)(ourprovider-swiftis on 6.1), adds aCudaBuildplugin +encudatarget +swift-argument-parserdep + a 4.3k-line generated CUDA header. Zero functional value on Apple Silicon/Metal (our only target) and the single highest conflict item (collides with ourCmlxproduct + jaccl excludes and our Layr-Labs Cmlx-submodule pins). Kept out so our customizations and the toolchain floor are not silently changed.1cd3ed5— CPU-only default device → SKIP. Only changes behavior on a host with no Metal and no CUDA; on Apple Silicon the core still resolves to.gpu.bd196a9— AdamW bias correction → SKIP. Training-only; we are inference-only and don't depend onMLXOptimizers.dc43e62— nuclear norm in linalg → DEFER. No current consumer (provider never calls linalgnorm); clean to add if ever needed.Our local customizations are untouched:
Package.swift(Cmlxproduct + jaccl), Layr-Labs Cmlx-submodule pins,ParallelFileReader128 MiB batch, and the Metal resource-COUNT exposure + test.Validation
libs/mlx-swiftswift build— green.provider-swiftbuild + 1064 tests / 74 suites passed.mlx-community/gpt-oss-20b-MXFP4-Q8) exercises the inheritedcompile()return-value path throughGPTOSS.swift— PASS (TTFT 0.25 s, ~83 tok/s, coherent output; happy path not regressed by the new throw-on-error checks).evalLock/tostringpath under concurrent eval — no race/crash.Before / After
Behavior
flowchart LR subgraph Before["Before (fork @ ac67822)"] A1["concurrent batched server<br/>stringifies MLXArray off the eval thread"] --> A2["data race on the evaluator"] A3["compile() error inside withError"] --> A4["empty vector -> Index out of range<br/>UNCATCHABLE trap -> whole provider crashes"] end subgraph After["After (@ 63722be)"] B1["stringify under evalLock.withLock"] --> B2["thread-safe, no race"] B3["compile()/closure failure"] --> B4["status checked -> catchable throw<br/>request fails, process survives"] endCode
flowchart LR subgraph Before["Before"] C1["MLXArray/Device/Stream .description"] --> C2["mlx_*_tostring (no lock)"] C3["Transforms+Compile innerCall"] --> C4["ignores mlx_detail_compile /<br/>mlx_closure_apply retvals -> [] -> crash"] end subgraph After["After"] D1["MLXArray/Device/Stream .description"] --> D2["evalLock.withLock { mlx_*_tostring }"] D3["Transforms+Compile innerCall"] --> D4["guard status == 0 else return [] ;<br/>overloads return placeholder -> withError throws"] endRelated / cross-repo
Part of the "inherit upstream MLX improvements" effort, spanning three repos:
Layr-Labs/mlx-swift#7Layr-Labs/mlx-swift-lm#50d-inference superproject capstone — intentionally on hold
The capstone PR was meant to bump the d-inference submodule pins to the heads of these two branches. It is deliberately not opened yet because
d-inferenceorigin/masterhas already advanced both submodule pins past the bases these branches were cut from, via d-inference#451 ("Instrument the first-token wedge … measurement only"):libs/mlx-swift:masterpins3c50ad69= our baseac67822+ 3 EvalProbe instrumentation commits (fork PR EvalProbe: measurement-only instrumentation of the blocking eval path #6). This branch isac67822+ 2 inheritance commits → the two lines diverged atac67822.libs/mlx-swift-lm:masterpins2b4b0d8d= base8a9bc7c+ 1 EngineCore idle-clear marker (fork Add build from source doc to CONTRIBUTING.md ml-explore/mlx-swift#48).Bumping the pins straight to these branch heads would silently revert that merged measurement instrumentation. Cleanest resolution: merge these two fork PRs into their
mainbranches first (eachmainalready contains the instrumentation), then bump the d-inference pins to the newmainHEADs (which then carry both the instrumentation and this inheritance). The capstone's validation (provider build + 1064 tests + live Gemma-4-26B / GPT-OSS-20B) was run against these branch heads.