chore(libs): inherit upstream MLX — bump mlx-swift / mlx-swift-lm pins to combined inherited+instrumented heads#459

Open
Gajesh2007 wants to merge 1 commit into master from feat/inherit-upstream-mlx-2026-06

@Gajesh2007 Gajesh2007 commented Jun 24, 2026


Summary

Capstone of the "inherit upstream MLX" effort. Bumps the two d-inference submodule pins to the combined heads of the fork inheritance PRs, which now also carry the measurement-only instrumentation that landed on master via d-inference#451:

| submodule | old pin (master) | new pin (this PR) | fork PR |
|---|---|---|---|
| libs/mlx-swift | 3c50ad69 | e20ea3dd | Layr-Labs/mlx-swift#7 |
| libs/mlx-swift-lm | 2b4b0d8d | 48313a08 | Layr-Labs/mlx-swift-lm#50 |

Why this isn't a naive pin bump

master already advanced both pins past the bases the inheritance branches were cut from (via #451's EvalProbe / EngineCore instrumentation), so the two histories diverged:

  • libs/mlx-swift: master = ac67822 + 3 EvalProbe commits (3c50ad69); branch = ac67822 + 2 inheritance commits.
  • libs/mlx-swift-lm: master = 8a9bc7c + 1 EngineCore idle-clear marker (2b4b0d8d); branch = 8a9bc7c + 29 inheritance commits.
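
The divergence listed above can be measured directly. This is a minimal sketch, assuming both tips are present in the submodule clone; the helper name `divergence` and the `OLD_BRANCH_HEAD` placeholder are illustrative and not part of this repository.

```shell
#!/bin/sh
# divergence REPO LEFT RIGHT
# Prints "<only-in-LEFT> <only-in-RIGHT>": how many commits are reachable
# from one tip but not from the other. Two non-zero counts mean the
# histories have diverged.
divergence() {
    # rev-list prints the two counts separated by a tab; normalise to a space.
    git -C "$1" rev-list --left-right --count "$2...$3" | tr '\t' ' '
}

# Example (OLD_BRANCH_HEAD is a placeholder for the pre-merge branch tip):
# divergence libs/mlx-swift 3c50ad69 "$OLD_BRANCH_HEAD"
```

Going by the counts in the list above, the master pin against the pre-merge branch head should report 3 and 2 for libs/mlx-swift, and 1 and 29 for libs/mlx-swift-lm.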

A straight bump to the old branch heads would have reverted the instrumentation. Instead, each inheritance branch merged its fork's main (which carries the instrumentation), producing a head that is a strict superset of the master pin:

git -C libs/mlx-swift     merge-base --is-ancestor 3c50ad69 e20ea3dd   # exit 0
git -C libs/mlx-swift-lm  merge-base --is-ancestor 2b4b0d8d 48313a08   # exit 0

Both checks pass → no instrumentation is reverted. This is a clean forward move that layers the inherited upstream fixes on top of everything already on master.
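
The two checks above can be wrapped so that a future pin bump fails loudly when it would revert history. This is a sketch only; `is_forward_move` and `check_pins` are hypothetical helper names, not scripts that exist in d-inference.

```shell
#!/bin/sh
# is_forward_move REPO OLD NEW
# Succeeds (exit 0) only when OLD is an ancestor of NEW (or the same commit),
# i.e. moving the pin from OLD to NEW drops nothing OLD already contained.
is_forward_move() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# check_pins reads "REPO OLD NEW" triples on stdin and reports each one.
# Returns non-zero if any pin move is not a forward move.
check_pins() {
    status=0
    while read -r repo old new; do
        if is_forward_move "$repo" "$old" "$new"; then
            echo "ok    $repo $old -> $new"
        else
            echo "FAIL  $repo $old -> $new"
            status=1
        fi
    done
    return "$status"
}

# Example, run from the superproject root:
# check_pins <<'EOF'
# libs/mlx-swift 3c50ad69 e20ea3dd
# libs/mlx-swift-lm 2b4b0d8d 48313a08
# EOF
```

A missing commit makes `git merge-base` exit non-zero as well, so an unfetched submodule is reported as a failure rather than silently passing.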

The merges were conflict-free:

  • mlx-swift: MLXArray.swift auto-merged. EvalProbe brackets eval() (theirs), while the inheritance fix wraps description/tostring in evalLock (ours); the two changes touch disjoint regions.
  • mlx-swift-lm: only EngineCore.swift changed (theirs), and none of the 29 inheritance commits touch it. All crown jewels are preserved (continuous batching, DAR-325 KV fix, KV-quant, MTP, batched Gemma4, fast-follow fp32 gated-delta dedupe).

Re-validation (against the combined tree)

  • swift build clean for mlx-swift, mlx-swift-lm, and provider-swift.
  • provider-swift swift test: 1064 tests / 74 suites passed, 0 failures (9 live-MLX tests are env-gated and self-skip).
  • Live inference (M4 Max, weights cached):
    • GPT-OSS 20B (mlx-community/gpt-oss-20b-MXFP4-Q8, compile() path): coherent output ("Average speed = 60 mi ÷ 1.5 h = **40 mph**"), reasoning_tokens=78 / completion=114.
    • Gemma 4 26B 8bit (mlx-community/gemma-4-26b-a4b-it-8bit): batched B=2 vs single-stream parity, arithmetic 7*8 = 56, and a clean multi-turn tool call run_terminal(command: cat hello.txt).
    • Gemma 4 26B qat-4bit VLM (mlx-community/gemma-4-26B-A4B-it-qat-4bit): mixed-length batched decode with no degenerate repetition — coherent.
    • No crash / NaN on any path.

Before / After

flowchart TB
  subgraph Before["BEFORE - master @ 80ce2574"]
    direction TB
    M0["d-inference master"] -->|gitlink| S0a["libs/mlx-swift @ 3c50ad69<br/>(ac67822 + EvalProbe x3, #451)"]
    M0 -->|gitlink| S0b["libs/mlx-swift-lm @ 2b4b0d8d<br/>(8a9bc7c + EngineCore marker, #451)"]
    S0a --> B0["provider-swift builds/serves<br/>instrumentation ONLY<br/>(no inherited upstream fixes)"]
    S0b --> B0
  end

  subgraph After["AFTER - this PR @ 772ff499"]
    direction TB
    M1["d-inference master + 1 commit"] -->|gitlink| S1a["libs/mlx-swift @ e20ea3dd<br/>= merge(branch #7 + main)<br/>EvalProbe AND evalLock/compile fixes"]
    M1 -->|gitlink| S1b["libs/mlx-swift-lm @ 48313a08<br/>= merge(branch #50 + main)<br/>EngineCore marker AND 29 inherited fixes"]
    S1a --> B1["provider-swift builds/serves<br/>instrumentation AND inherited fixes<br/>1064/74 green - live GPT-OSS + Gemma4 coherent"]
    S1b --> B1
  end

  Before -.->|"2 gitlink moves only<br/>strict superset - nothing reverted"| After

What this PR changes

Exactly two gitlink updates (160000 mode) — no source changes:

 libs/mlx-swift    | 2 +-
 libs/mlx-swift-lm | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
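
For reviewers unfamiliar with gitlinks: a submodule pin is a tree entry with mode 160000 and type `commit`, which `git ls-tree` prints as `160000 commit <sha>` followed by a tab and the path. The sketch below reads the pinned SHA for a path; the helper names `gitlink_sha` and `pinned_sha` are illustrative assumptions.

```shell
#!/bin/sh
# gitlink_sha: reads one `git ls-tree` line on stdin
# ("<mode> <type> <sha><TAB><path>") and prints the SHA if the entry is a
# gitlink (mode 160000, type commit); fails otherwise.
gitlink_sha() {
    read -r mode type sha _path || return 1
    [ "$mode" = "160000" ] && [ "$type" = "commit" ] || return 1
    echo "$sha"
}

# pinned_sha REV PATH: the submodule commit recorded at PATH in REV.
pinned_sha() {
    git ls-tree "$1" -- "$2" | gitlink_sha
}

# Example, run from the superproject root:
# pinned_sha HEAD libs/mlx-swift
# pinned_sha HEAD libs/mlx-swift-lm
```

On this PR's branch the two calls should print the full SHAs abbreviated above as e20ea3dd and 48313a08.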

Merge ordering

This superproject pin depends on the two fork PRs. Land mlx-swift#7 and mlx-swift-lm#50 first (in a way that keeps e20ea3dd / 48313a08 reachable on each fork's main). If either fork PR squash-merges to a new SHA, re-point the corresponding gitlink here before merging this PR.
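
The re-point step can be sketched as follows. This assumes the submodule has already been fetched and that, after a squash-merge, the desired target is whatever the given ref (for example `origin/main`) resolves to; that choice is an assumption, and `repoint_if_unreachable` is a hypothetical helper, not an existing script.

```shell
#!/bin/sh
# repoint_if_unreachable SUPER PATH PIN REF
# If PIN is reachable from REF inside the submodule at SUPER/PATH, do nothing.
# Otherwise stage PATH in the superproject as a gitlink to the commit REF
# resolves to. Nothing is committed; review the staged change before committing.
repoint_if_unreachable() {
    sub="$1/$2"
    if git -C "$sub" merge-base --is-ancestor "$3" "$4"; then
        echo "keep $2 @ $3"
    else
        # --cacheinfo needs a full object name, so resolve REF first.
        new=$(git -C "$sub" rev-parse --verify "$4^{commit}") || return 1
        git -C "$1" update-index --add --cacheinfo "160000,$new,$2"
        echo "repoint $2 -> $new"
    fi
}

# Example, after `git -C libs/mlx-swift fetch origin`:
# repoint_if_unreachable . libs/mlx-swift e20ea3dd origin/main
```

Staging through `git update-index` leaves the submodule working tree untouched, so the re-validation above would still need to be rerun against the new SHA.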


Three-PR set: this PR + mlx-swift#7 + mlx-swift-lm#50.



…nted heads

Adopts the combined heads of the two fork inheritance PRs, which now also
carry the d-inference#451 measurement-only instrumentation (each branch
merged its fork's `main`):

  libs/mlx-swift     3c50ad69 -> e20ea3dd  (Layr-Labs/mlx-swift#7)
  libs/mlx-swift-lm  2b4b0d8d -> 48313a08  (Layr-Labs/mlx-swift-lm#50)

Both moves are a clean forward: the old master pin is a strict ancestor of
the new head (`git merge-base --is-ancestor` passes), so no EvalProbe /
EngineCore instrumentation is reverted. Re-validated: provider-swift builds
clean, 1064 tests / 74 suites green, live GPT-OSS-20B + Gemma-4-26B
(batched, VLM, tool-calling) produce coherent output.

Co-authored-by: Cursor <cursoragent@cursor.com>

vercel Bot commented Jun 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| d-inference | Ready | Preview | Jun 24, 2026 12:54am |
| d-inference-console-ui-dev | Ready | Preview | Jun 24, 2026 12:54am |
| d-inference-landing | Ready | Preview | Jun 24, 2026 12:54am |

@ethenotethan ethenotethan left a comment

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

@github-actions

No threat-model-covered files were changed; however, the updated submodules touch the innermost trust boundary (TB-007) and warrant a brief inspection note.


Trust boundaries touched

  • TB-007 (Provider Inference Engine) — libs/mlx-swift and libs/mlx-swift-lm are the Metal/MLX compute layer that BatchScheduler and LocalMLXModelFoundation call directly. Neither submodule path appears in any affected_files glob in the current threat model.

Threat relevance

Neither file is listed in the threat model, so no T-xxx finding changes state. That said:

| Threat | Relevance |
|---|---|
| T-028 (Residual inference data in GPU Metal buffers) | Any change to buffer allocation, KV-cache layout, or weight tensor lifecycle in mlx-swift/mlx-swift-lm directly affects whether prompt residue persists in GPU memory between tenants. The threat model notes this is already open with no Metal-level memset in place. |
| T-027 / T-007 (Weight hash / model output tampering) | Changes to mlx-swift-lm model-loading or tokeniser code affect what actually runs at inference time, downstream of WeightHasher's startup-time check. |
| T-041 (Cross-tenant prefix-cache TTFT oracle) | KV-cache shape or reuse changes in mlx-swift-lm could alter the timing signal that makes the TTFT oracle exploitable. |

New attack surface not covered by an existing threat

The submodule diff itself is not included here, so the following flags are conditional on what the bump actually changes:

  1. New Metal kernels or buffer-reuse strategies — if mlx-swift introduces new allocation pools or explicit buffer reuse across forward passes, this widens the open finding for T-028 (GPU residue) and should be noted in the threat model under TB-007 affected_files.
  2. Tokeniser or sampling code changes in mlx-swift-lm — speculative-decode helpers, draft models, or vocabulary expansion can change the memory layout of in-flight tensors. If they introduce shared state across concurrent batch slots, cross-request data leakage risk increases beyond what the current BatchScheduler actor isolation assumes.
  3. Native C/C++/Metal extensions — any new FFI surface in these libraries bypasses Swift ARC and the secureZero coverage that SecurityHardening.swift provides. The threat model currently has no coverage for native extensions in the MLX stack.

Recommendation

Add libs/mlx-swift and libs/mlx-swift-lm (or the resolved submodule paths under provider-swift/) to the affected_files lists for T-007, T-027, T-028, and T-041 in the threat model. The submodule bump should include a brief changelog note or pointer to the upstream diff so reviewers can verify whether buffer-lifecycle or tokeniser behaviour changed.
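
One way to produce the changelog pointer suggested above is to list the commits a pin move brings in, optionally narrowed to paths of interest. This is a sketch under the assumption that both SHAs are present in the local submodule clone; `bump_changelog` is a hypothetical helper name.

```shell
#!/bin/sh
# bump_changelog REPO OLD NEW [PATHSPEC...]
# Lists the non-merge commits reachable from NEW but not from OLD, one per
# line, optionally limited to the given paths inside REPO.
bump_changelog() {
    repo=$1 old=$2 new=$3
    shift 3
    git -C "$repo" log --oneline --no-merges "$old..$new" -- "$@"
}

# Example, run from the superproject root:
# bump_changelog libs/mlx-swift    3c50ad69 e20ea3dd
# bump_changelog libs/mlx-swift-lm 2b4b0d8d 48313a08
```

Passing pathspecs for buffer-lifecycle or tokeniser sources would narrow the list to the changes the threats above are concerned with; which paths those are is left to the reviewer.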


🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR
