Add MLXFoundationModels: an MLX-backed FoundationModels LanguageModel by ctymoszek · Pull Request #334 · ml-explore/mlx-swift-lm

ctymoszek · 2026-06-09T01:59:38Z

MLXLanguageModel conforms to FoundationModels.LanguageModel, so locally-run MLX models can be used through the FoundationModels framework. It supports chat, tool calling, and guided generation (JSON-schema-constrained sampling via a vendored copy of xgrammar).

The adapter is gated behind two default-on SwiftPM traits (FoundationModelsIntegration, GuidedGenerationSupport) and is @available(iOS 27.0, macOS 27.0, visionOS 27.0). Platform floors are unchanged from upstream (.macOS(.v14), .iOS(.v17), .tvOS(.v17), .visionOS(.v1)), so existing consumers and anyone who disables the traits are unaffected.
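Guided generation works by limiting each sampling step to the tokens the grammar currently allows. Below is a minimal sketch of that idea in plain Swift. It assumes greedy decoding and a `Set<Int>` of allowed token IDs per step. `applyMask(to:allowed:)` and `greedySample(_:)` are hypothetical helpers; the adapter gets its mask from the vendored xgrammar and works on MLX arrays.

```swift
/// Replaces the logit of every token the grammar disallows with -infinity,
/// so it can never win under argmax (or receive probability under softmax).
func applyMask(to logits: [Float], allowed: Set<Int>) -> [Float] {
    logits.enumerated().map { (index, value) in
        allowed.contains(index) ? value : -Float.infinity
    }
}

/// Greedy sampling: index of the largest finite logit, or nil if none is allowed.
func greedySample(_ logits: [Float]) -> Int? {
    guard let best = logits.indices.max(by: { logits[$0] < logits[$1] }),
        logits[best] > -Float.infinity
    else { return nil }
    return best
}

let logits: [Float] = [2.0, 5.0, 1.0, 4.0]
// Token 1 scores highest, but suppose the grammar only allows tokens 0 and 3.
let constrained = greedySample(applyMask(to: logits, allowed: [0, 3]))
```

The sketch sets disallowed logits to negative infinity instead of dropping them. This keeps token IDs aligned with their positions, so the same masked row can feed either argmax or a softmax-based sampler.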


davidkoski · 2026-06-09T15:44:38Z

It would be great to group these into one or more directories for organization purposes.

davidkoski · 2026-06-09T15:48:48Z

Not really for this file, but anchoring it here in the GuidedGeneration directory. I think it would be nice if we had GuidedGeneration and FoundationModels as two separate libraries. FM could depend on GuidedGeneration.

I think most of the GuidedGeneration code doesn't need to be in MLXLMCommon -- e.g. the Embedders do not need it. That isn't to say it can't go there, but it may make more sense in GuidedGeneration if that is its own library.

A split like this would allow people to opt in to GuidedGeneration without using any of the FM code. For example, on Linux the FM code won't be available at all, but the GuidedGeneration piece would still be interesting.

davidkoski · 2026-06-09T15:54:11Z

+                        mask.needsApply
+                        ? UnsafeRawPointer(buffer.baseAddress!).assumingMemoryBound(to: UInt32.self)
+                        : nil
+                    return applyMaskAndSample(


This reads from logits (via item()) before the asyncEvals below are executed. This may cause a stall where we wait for the current graph to finish before scheduling any new work. Consider the normal eval loop:

// save current value -- this will be returned
let previousY = y

// compute the next state and async eval the next token
let token = step(previous: previousY)
y = .init(tokens: token)
asyncEval(token)

tokenCount += 1

return previousY.tokens.item(Int.self)

tokens is still an MLXArray (lazy) at the point where asyncEval is called.
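The ordering described above is a one-step look-ahead: keep a handle to the previous (still lazy) token, schedule the next step, and only then force the previous value. Below is a minimal model of that ordering in plain Swift. `LazyToken` and `DeviceModel` are hypothetical stand-ins for an unevaluated `MLXArray` and the device queue. Their `asyncEval` and `item` methods only log events; this is not the MLX API.

```swift
/// Hypothetical stand-in for a lazily evaluated token (models an MLXArray).
struct LazyToken { let id: Int }

/// Hypothetical model of the device queue; records the order of operations.
struct DeviceModel {
    var events: [String] = []
    /// Models asyncEval: queues work for a token without waiting on it.
    mutating func asyncEval(_ t: LazyToken) { events.append("schedule \(t.id)") }
    /// Models item(): blocks until the token is computed, then returns it.
    mutating func item(_ t: LazyToken) -> Int {
        events.append("read \(t.id)")
        return t.id
    }
}

/// One-step look-ahead iterator in the style of the normal eval loop.
struct Pipeline {
    var device = DeviceModel()
    var y: LazyToken

    init(start: LazyToken) {
        y = start
        device.asyncEval(start)  // prime with the first token
    }

    /// Models step(previous:): builds the (lazy) graph for the next token.
    func step(previous: LazyToken) -> LazyToken { LazyToken(id: previous.id + 1) }

    mutating func next() -> Int {
        let previousY = y                // save current value; returned below
        y = step(previous: previousY)    // build the next step lazily
        device.asyncEval(y)              // queue it before blocking on anything
        return device.item(previousY)    // only now wait on the previous token
    }
}

var pipeline = Pipeline(start: LazyToken(id: 0))
let first = pipeline.next()
let second = pipeline.next()
```

The blocking read in `next()` targets the token scheduled in the *previous* call. As a result, the next step is always queued while the host waits. The snippet under review reads `logits` first and schedules afterward, which is the reverse order.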

davidkoski · 2026-06-09T15:56:43Z

+                    // Wait for GPU to finish (may already be done)
+                    eval(logits)
+                }
+            }


The normal eval loop has this:

// Apply dynamic cache quantization after each step
maybeQuantizeKVCache(
    cache: &cache,
    kvBits: kvBits,
    kvGroupSize: kvGroupSize,
    quantizedKVStart: quantizedKVStart
)

Do we need it here inside the generate loop? Not every model uses it, but I think it is important for memory use in models that do.
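Below is a minimal sketch of where the per-step check would sit in a custom generate loop. It assumes threshold semantics: a layer is quantized once it holds more than `quantizedKVStart` tokens, and only when `kvBits` is set. `MockKVCache` and `maybeQuantize` are hypothetical stand-ins, not the MLXLMCommon types or functions.

```swift
/// Hypothetical stand-in for one layer's KV cache (not the MLXLMCommon type).
struct MockKVCache {
    var offset = 0                  // tokens stored so far
    var quantizedBits: Int? = nil   // nil until this layer is quantized
}

/// Models the per-step check: once a layer holds more than `quantizedKVStart`
/// tokens, quantize it to `kvBits`; do nothing when `kvBits` is nil.
func maybeQuantize(cache: inout [MockKVCache], kvBits: Int?, quantizedKVStart: Int) {
    guard let bits = kvBits else { return }
    for i in cache.indices {
        if cache[i].quantizedBits == nil && cache[i].offset > quantizedKVStart {
            cache[i].quantizedBits = bits
        }
    }
}

var layers = [MockKVCache(), MockKVCache()]
for _ in 0..<5 {
    // One decoding step appends a token to every layer...
    for i in layers.indices { layers[i].offset += 1 }
    // ...then, as in the normal eval loop, check quantization after the step.
    maybeQuantize(cache: &layers, kvBits: 4, quantizedKVStart: 3)
}
```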

davidkoski · 2026-06-09T15:57:49Z

+                let maskArray = bitmaskToMLXArray(
+                    maskPtr, maskBitCount: vocabSize, totalCount: logitDim)


I wonder if this can be cached in the eval loop and passed in?
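One way to act on this is to allocate the dense mask once per generation, outside the loop, and refill it in place at each step. Below is a minimal sketch in plain Swift. It assumes the bitmask packs one bit per token ID into `UInt32` words (bit `i % 32` of word `i / 32`). It also assumes the logit row may be padded past the vocabulary, so `totalCount` (logitDim) can exceed `maskBitCount` (vocabSize). `MaskBuffer` is hypothetical. In the adapter, the refilled buffer would still have to be passed to MLX.

```swift
/// Hypothetical reusable dense mask. Allocated once per generation and
/// refilled in place each step, so the loop does no per-token allocation.
struct MaskBuffer {
    private(set) var values: [Bool]

    init(totalCount: Int) {
        values = Array(repeating: false, count: totalCount)
    }

    /// Unpacks `bits` (bit i % 32 of word i / 32 marks token i as allowed).
    /// Positions at or beyond `maskBitCount` (padding past the vocabulary)
    /// are always disallowed.
    mutating func refill(from bits: [UInt32], maskBitCount: Int) {
        precondition(bits.count * 32 >= maskBitCount, "bitmask too short")
        for i in values.indices {
            if i < maskBitCount {
                let word = bits[i / 32]
                values[i] = (word >> UInt32(i % 32)) & 1 == 1
            } else {
                values[i] = false
            }
        }
    }
}

// Allocate once per generation (e.g. logitDim = 6 for a 4-token vocabulary).
var mask = MaskBuffer(totalCount: 6)
// Each step: refill from that step's bitmask (tokens 0 and 2 allowed).
mask.refill(from: [0b0101], maskBitCount: 4)
```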

davidkoski · 2026-06-09T20:37:45Z

+#if FoundationModelsIntegration
+    #if canImport(FoundationModels, _version: 2)
+
+        import Foundation


This indentation isn't my favorite. What do you think about doing this in .swift-format?

diff --git a/.swift-format b/.swift-format
index 8892e9f..b9a726a 100644
--- a/.swift-format
+++ b/.swift-format
@@ -4,4 +4,5 @@
         "spaces": 4
     },
     "spacesAroundRangeFormationOperators": true,
+    "indentConditionalCompilationBlocks": false,
 }

I'm in favor; that would look nicer. But maybe that change should go in a separate PR, since it also affects one other file (WiredMemoryPolicies.swift).

davidkoski · 2026-06-09T20:56:30Z

+                                    let (whitespaceBias, whitespaceTokenIDs) =
+                                        WhitespaceTokenBias.compute(
+                                            tokenizer: context.tokenizer
+                                        )


This is computed per response -- should it be held in the ModelCache?

I think that applies to WhitespaceTokenBias and its friends as well.
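Below is a minimal sketch of keeping tokenizer-derived data alongside the model, so it is computed once per model instead of once per response. `TokenizerDerivedCache` is hypothetical, and the `[1, 2]` token IDs are placeholders. In the adapter, the cached value would be the result of `WhitespaceTokenBias.compute(tokenizer:)`, and the table would live inside the existing `ModelCache` actor rather than in a standalone struct.

```swift
/// Hypothetical per-model memo table for tokenizer-derived data
/// (e.g. whitespace token IDs), keyed by model identifier.
struct TokenizerDerivedCache<Value> {
    private var storage: [String: Value] = [:]

    /// Returns the cached value for `modelID`, computing it on first use only.
    mutating func value(for modelID: String, compute: () -> Value) -> Value {
        if let cached = storage[modelID] { return cached }
        let fresh = compute()
        storage[modelID] = fresh
        return fresh
    }

    /// Drops the entry; call this whenever the model itself is evicted.
    mutating func evict(_ modelID: String) { storage[modelID] = nil }
}

var computeCount = 0
var derived = TokenizerDerivedCache<[Int]>()
for _ in 0..<3 {  // three responses from the same model
    // Placeholder token IDs stand in for the real whitespace computation.
    _ = derived.value(for: "model-a") { computeCount += 1; return [1, 2] }
}
```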

davidkoski · 2026-06-09T21:00:08Z

+        /// Prevents race conditions when multiple concurrent requests try to load the model.
+        /// Supports caching multiple models by their identifiers.
+        private actor ModelCache {
+            private var containers: [String: ModelContainer] = [:]


A few of these can grow without bound, e.g. in a long-running model server. evictAll() / evictAllModels() can remove them, but that removes everything. I wonder if this needs finer-grained eviction support?
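One finer-grained option is a bounded least-recently-used (LRU) policy combined with per-key eviction. Below is a minimal sketch in plain Swift. `LRUCache`, its capacity, and the linear-scan recency list are illustrative assumptions. A linear scan is adequate for the few models a server keeps resident but is not tuned for large caches.

```swift
/// Hypothetical bounded cache with least-recently-used eviction and per-key removal.
struct LRUCache<Key: Hashable, Value> {
    let capacity: Int
    private var storage: [Key: Value] = [:]
    private var recency: [Key] = []  // least recently used first

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    private mutating func touch(_ key: Key) {
        recency.removeAll { $0 == key }
        recency.append(key)
    }

    mutating func get(_ key: Key) -> Value? {
        guard let value = storage[key] else { return nil }
        touch(key)
        return value
    }

    /// Inserts or replaces `value`; returns the key evicted to make room, if any.
    @discardableResult
    mutating func set(_ key: Key, _ value: Value) -> Key? {
        storage[key] = value
        touch(key)
        guard storage.count > capacity else { return nil }
        let victim = recency.removeFirst()
        storage[victim] = nil
        return victim
    }

    /// Removes a single entry, leaving the rest of the cache intact.
    mutating func evict(_ key: Key) {
        storage[key] = nil
        recency.removeAll { $0 == key }
    }

    /// Keys from least to most recently used.
    var keys: [Key] { recency }
}

var models = LRUCache<String, String>(capacity: 2)
models.set("a", "weights-a")
models.set("b", "weights-b")
_ = models.get("a")                        // "a" is now most recently used
let evicted = models.set("c", "weights-c") // over capacity: evicts "b"
models.evict("a")                          // explicit per-model eviction
```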

davidkoski · 2026-06-09T21:00:49Z

+            /// Evicts all cached models, tokenizers, and constraint templates.
+            /// Frees GPU memory held by model weights. Subsequent requests will
+            /// reload models from disk cache.
+            static func evictAllModels() async {


Should this be public?

I guess the same question applies to each of these static functions: should they be public?

davidkoski · 2026-06-09T21:03:37Z

+///
+/// Thread safety: marked `@unchecked Sendable` because all access is serialized
+/// through `ModelContainer.perform`.
+public struct CompositeLogitProcessor: LogitProcessor, @unchecked Sendable {


Should this be @unchecked Sendable? LogitProcessor is not Sendable and the processors can be stateful -- I don't think this is correct.
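The concern can be made concrete. A composite of stateful processors is itself stateful, so marking it `@unchecked Sendable` moves the data-race check from the compiler onto every caller. Below is a minimal sketch of a composite that stays non-`Sendable` and is owned by a single generation loop. `TokenProcessor`, `SeenTokenPenalty`, and `CompositeProcessor` are hypothetical and do not reproduce the real `LogitProcessor` requirements. If the composite must cross isolation domains, creating a fresh instance per request is one alternative to `@unchecked`.

```swift
/// Hypothetical stand-in for a logit processor; note it is *not* Sendable.
protocol TokenProcessor {
    mutating func process(_ logits: [Float]) -> [Float]
    mutating func didSample(_ token: Int)
}

/// Stateful example: subtracts a fixed penalty from tokens already sampled.
struct SeenTokenPenalty: TokenProcessor {
    let penalty: Float
    var seen: Set<Int> = []

    func process(_ logits: [Float]) -> [Float] {
        logits.enumerated().map { (index, value) in
            seen.contains(index) ? value - penalty : value
        }
    }

    mutating func didSample(_ token: Int) { seen.insert(token) }
}

/// Runs each processor in order. Because its members are stateful and not
/// Sendable, the composite is deliberately not Sendable either.
struct CompositeProcessor: TokenProcessor {
    var processors: [any TokenProcessor]

    mutating func process(_ logits: [Float]) -> [Float] {
        var current = logits
        for i in processors.indices {
            current = processors[i].process(current)
        }
        return current
    }

    mutating func didSample(_ token: Int) {
        for i in processors.indices { processors[i].didSample(token) }
    }
}

var composite = CompositeProcessor(processors: [
    SeenTokenPenalty(penalty: 1.0), SeenTokenPenalty(penalty: 0.5),
])
composite.didSample(1)
let out = composite.process([2.0, 3.0, 4.0])
```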

…ersion: 2)

The test target referenced adapter-only symbols (MLXLanguageModel, TranscriptConverter, SchemaConverter, DevelopmentCustomizer, LoadedModelContext, FinalAnswerTool, ...) behind only the FoundationModelsIntegration trait -- or, in a few files, behind no gate at all -- while the source defines those symbols behind both the trait and canImport(FoundationModels, _version: 2). On SDKs predating the v2 FoundationModels (e.g. the macOS 26 SDK) the adapter is correctly compiled out, but the tests still referenced it, failing the build. Mirror the source's gate in the tests so the test surface tracks the adapter surface exactly. Also corrects two files gated on the wrong axis: StopTokenRegressionTests and ToolCallingSchemaTests were under GuidedGenerationSupport, but their DevelopmentCustomizer / SchemaConverter dependencies require the FM + SDK gate.

The two probe bodies that touch the v2-only adapter surface (LanguageModelCapabilities, any LanguageModel, MLXLanguageModel) were behind a bare `canImport(FoundationModels)`, which is true on the 26 SDK (FM 1.x) where those symbols don't yet exist — it would only compile because the target is built against the 27 SDK. Gate them on `canImport(FoundationModels, _version: 2)` so the condition matches the v2 boundary the symbols actually require. The top-of-file `import FoundationModels` stays bare on purpose: it gates on FM *presence* (26+), which is the correct boundary for the absent-tier probe to still compile where FoundationModels does not exist at all.

AvailabilityTests, MLXLanguageModelTests, and TranscriptConverterTests had no top-level #if before; wrapping their bodies in the `FoundationModelsIntegration && canImport(FoundationModels, _version: 2)` gate left the contents at column 0. swift-format's indentConditionalCompilationBlocks (default true) requires content inside a #if to be indented, so the CI lint step reflows them. Indent (and line-wrap the few lines that then exceed 100 cols) to match the #if-indentation style used throughout the source.
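The rule these commits converge on is that every file touching adapter symbols uses the same compound condition as the source, with the body indented per `indentConditionalCompilationBlocks`. Below is a minimal sketch of that pattern. `adapterCompiledIn` is a hypothetical flag for illustration only. When the file is built without the `FoundationModelsIntegration` condition, or on an SDK without FoundationModels v2, the `#else` branch is compiled.

```swift
// Same compound gate as the adapter sources: the SwiftPM trait AND the v2 SDK.
#if FoundationModelsIntegration && canImport(FoundationModels, _version: 2)
    import FoundationModels

    // Adapter-only code (and the tests that reference it) lives here.
    let adapterCompiledIn = true
#else
    // Trait disabled, or an older SDK (e.g. macOS 26, FoundationModels 1.x):
    // the adapter and its tests are compiled out together.
    let adapterCompiledIn = false
#endif

print(adapterCompiledIn ? "adapter available" : "adapter compiled out")
```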

MLXFoundationModels is gated on the FoundationModels v2 SDK (canImport(FoundationModels, _version: 2)); its DocC catalog references adapter symbols that don't exist on the SDK the CI runner builds against, so `generate-documentation --warnings-as-errors` fails. Disable the step (if: false) to unblock CI. Re-enable before merge once doc generation builds against the FoundationModels SDK, or verify-docs.sh skips the FM target when v2 is absent.

davidkoski self-requested a review June 9, 2026 02:06

davidkoski reviewed Jun 9, 2026


Cori Park added 4 commits June 11, 2026 16:15

ctymoszek wants to merge 5 commits into ml-explore:main from ctymoszek:mlx-foundationmodels
