chore(provider): default-on MLX resource-count telemetry in reports (mlx-swift-lm#42) by Gajesh2007 · Pull Request #373 · Layr-Labs/d-inference

Gajesh2007 · 2026-06-16T19:45:19Z

Bumps the submodule to Layr-Labs/mlx-swift-lm#42 so the [rsrc] resource-count diagnostic is default-on and emitted via os_log under dev.darkbloom.provider.

Why

The [metal::malloc] Resource limit (499000) crash is caused by a climb in the live Metal buffer count, so diagnosing it needs the resource-count trajectory. Verified against real prod log-reports (Gumbii's M3 Ultra, ids 518/520 via /v1/admin/log-reports): they contain zero [rsrc] lines, because the diagnostic was off by default and wrote only via print(), which the report tool does not capture (it collects log show --predicate subsystem == dev.darkbloom.provider). After this change, every uploaded report carries the live resource-count trace.

The os_log cadence is coarse to keep reports under the 10 MB cap, but it always logs at ≥70% pressure so the ramp toward 499000 is captured. Opt out with DARKBLOOM_MLX_RESOURCE_DEBUG=0. Target: 0.6.12. Merge mlx-swift-lm#42 first, then re-point the submodule if its merge SHA differs.
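A minimal sketch of the cadence rule described above, assuming a sampler that is asked once per sample whether to emit a line. `ResourceLogSampler`, `coarseInterval`, and the every-Nth-sample policy are hypothetical names and choices for illustration; the actual logic lives in mlx-swift-lm#42 and may differ. Only the ≥70% threshold and the 499000 limit come from this description.

```swift
/// Hypothetical sampler: decides whether one resource-count sample is logged.
struct ResourceLogSampler {
    let limit: Int                        // e.g. 499000, from the crash message
    let coarseInterval: Int               // assumed: log every Nth sample at low pressure
    let pressureThreshold: Double = 0.70  // from the PR description
    private var samplesSeen = 0

    init(limit: Int, coarseInterval: Int) {
        self.limit = max(1, limit)
        self.coarseInterval = max(1, coarseInterval)
    }

    /// Returns true when the sample should be emitted.
    mutating func shouldLog(count: Int) -> Bool {
        defer { samplesSeen += 1 }
        let pressure = Double(count) / Double(limit)
        // Always log near the limit so the ramp is never skipped.
        if pressure >= pressureThreshold { return true }
        // Otherwise log only on the coarse cadence.
        return samplesSeen % coarseInterval == 0
    }
}
```

With `coarseInterval: 10`, low-pressure samples produce one line in ten, while every sample at or above 70% of the limit is logged regardless of cadence.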

Credits

Builds directly on @anupsv's work, credited via Co-authored-by on the bump commit:
- Bumps the submodule onto the default-on [rsrc] telemetry he introduced in Layr-Labs/mlx-swift-lm#39, plus the negative-padding decode-mask fix that mirrors his Layr-Labs/mlx-swift-lm#40.

vercel · 2026-06-16T19:45:25Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
d-inference	Ready	Preview	Jun 16, 2026 10:12pm
d-inference-console-ui-dev	Ready	Preview	Jun 16, 2026 10:12pm
d-inference-landing	Ready	Preview	Jun 16, 2026 10:12pm

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — ✅ No issues found

Performance — 4 finding(s) (3 blocking)

🔵 [INFO] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:493 — remainingBytesToFetch helper may allocate unbounded arrays for large model catalogs
- Suggestion: Consider streaming calculation or chunked processing for very large model catalogs to avoid memory spikes
🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:659 — fileSize() called multiple times on same paths in hot download path
- Suggestion: Cache fileSize results in local variables to avoid repeated filesystem calls
🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Models/ModelDownloadProgress.swift:65-85 — Linear search through order array on every progress update
- Suggestion: Use Dictionary for O(1) lookup or maintain index mapping to avoid O(n) search per update
🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Models/ModelDownloadProgress.swift:98-103 — compactMap rebuilds entire progress array on every render frame
- Suggestion: Cache the ordered progress array and only rebuild when order changes

Type_diligence — ✅ No issues found

Additive_complexity — 5 finding(s) (1 blocking)

🔵 [INFO] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:15-18 — Comment references moved code but leaves no breadcrumb to new location
- Suggestion: Add specific file reference: 'Progress state lives in ModelDownloadProgress.swift'
🔵 [INFO] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:493-500 — Added onChunk parameter with default nil creates optional complexity
- Suggestion: Consider making onChunk required or using a separate method variant
🔵 [INFO] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:813-840 — localStagingDirName duplicates path sanitization logic from removed code
- Suggestion: Extract path sanitization to shared utility function
🔵 [INFO] provider-swift/Sources/darkbloom/StartCommand.swift:578-588 — PickerEntry struct adds resumable field that could be computed property
- Suggestion: Make resumable a computed property based on downloadedIDs and resumableIDs
🟡 [MEDIUM] provider-swift/Sources/darkbloom/StartCommand.swift:613-642 — buildPickerEntries static method on Start command mixes UI logic with data transformation
- Suggestion: Move to dedicated PickerEntry or ModelCatalog utility class

9 finding(s) total, 4 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

ethenotethan · 2026-06-16T19:45:44Z

    }

    /// Download a single manifest file into its staging destination, resuming
    /// from a `.part` file when present, verifying size + SHA-256 before


🔵 [INFO] ⚡ remainingBytesToFetch helper may allocate unbounded arrays for large model catalogs

💡 Suggestion: Consider streaming calculation or chunked processing for very large model catalogs to avoid memory spikes

📊 Score: 2×3 = 6 · Category: unbounded_allocations

ethenotethan · 2026-06-16T19:45:44Z

                file: file,
                destination: stagingDir.appendingPathComponent(relativePath, isDirectory: false),
                url: "\(r2CDNURL)/\(Self.escapeR2Path(manifest.r2Prefix))/\(Self.escapeR2Path(relativePath))"
            )


🟡 [MEDIUM] ⚡ fileSize() called multiple times on same paths in hot download path

💡 Suggestion: Cache fileSize results in local variables to avoid repeated filesystem calls

📊 Score: 3×4 = 12 · Category: repeated_work

ethenotethan · 2026-06-16T19:45:44Z

+    /// Register a file to track. `initialBytes` seeds a resumed `.part` prefix
+    /// so the bar starts where the previous run left off (and is excluded from
+    /// the speed math via `baselineBytes`).
+    func register(label: String, expectedBytes: Int64, initialBytes: Int64 = 0) {
+        lock.lock()
+        defer { lock.unlock() }
+        if progress[label] == nil { order.append(label) }
+        var p = FileProgress(label: label, expectedBytes: expectedBytes)
+        p.downloadedBytes = max(0, initialBytes)
+        p.baselineBytes = max(0, initialBytes)
+        progress[label] = p
+    }
+
+    /// Update cumulative bytes-on-disk for a file.
+    func update(label: String, downloadedBytes: Int64) {
+        lock.lock()
+        defer { lock.unlock() }
+        guard var p = progress[label] else { return }
+        p.downloadedBytes = downloadedBytes
+        progress[label] = p
+    }


🟡 [MEDIUM] ⚡ Linear search through order array on every progress update

💡 Suggestion: Use Dictionary for O(1) lookup or maintain index mapping to avoid O(n) search per update

📊 Score: 2×4 = 8 · Category: inefficient_data_structures
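The `register` doc comment in the hunk above says a resumed `.part` prefix is excluded from the speed math via `baselineBytes`. A minimal sketch of that exclusion, assuming speed is computed from bytes fetched in this run only. `ResumedSpeedSample` and `bytesPerSecond(elapsed:)` are hypothetical names, not the PR's types; the 0.1 s guard mirrors the removed `FileProgress.speed`.

```swift
/// Hypothetical value type illustrating baseline-adjusted speed.
struct ResumedSpeedSample {
    var downloadedBytes: Int64  // cumulative bytes on disk, including the resumed prefix
    var baselineBytes: Int64    // bytes already on disk when this run started

    /// Bytes per second for this run only; the resumed prefix is subtracted
    /// so a large `.part` file does not inflate the reported speed.
    func bytesPerSecond(elapsed: Double) -> Double {
        guard elapsed > 0.1 else { return 0 }
        let fetchedThisRun = max(0, downloadedBytes - baselineBytes)
        return Double(fetchedThisRun) / elapsed
    }
}
```

Without the subtraction, a file resumed at 4 GB would report gigabytes per second during its first moments.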

ethenotethan · 2026-06-16T19:45:45Z

+    /// Thread-safe snapshot of all tracked file progress, in registration order.
+    var allProgress: [FileProgress] {
+        lock.lock()
+        defer { lock.unlock() }
+        return order.compactMap { progress[$0] }
+    }


🟡 [MEDIUM] ⚡ compactMap rebuilds entire progress array on every render frame

💡 Suggestion: Cache the ordered progress array and only rebuild when order changes

📊 Score: 2×4 = 8 · Category: repeated_work

ethenotethan · 2026-06-16T19:45:45Z


 // MARK: - Download progress tracking & rendering
-
-/// Per-file progress state used by `DownloadProgressTracker`.
-private struct FileProgress: Sendable {
-    let label: String
-    let expectedBytes: Int64
-    var downloadedBytes: Int64 = 0
-    var startTime: Date = Date()
-    var completed: Bool = false
-    var completionTime: Date?
-    var destinationURL: URL?
-
-    /// Bytes/second using elapsed wall time.
-    var speed: Double {
-        let elapsed = (completionTime ?? Date()).timeIntervalSince(startTime)
-        guard elapsed > 0.1 else { return 0 }
-        return Double(downloadedBytes) / elapsed
-    }
-
-    /// Estimated seconds remaining.
-    var eta: Double? {
-        guard speed > 0, expectedBytes > 0 else { return nil }
-        let remaining = Double(expectedBytes - downloadedBytes)
-        guard remaining > 0 else { return nil }
-        return remaining / speed
-    }
-
-    var fraction: Double {
-        guard expectedBytes > 0 else { return 0 }
-        return min(1.0, Double(downloadedBytes) / Double(expectedBytes))
-    }
-}
-
-/// Delegate-based download tracker that provides incremental progress.
-///
-/// Each download task is registered with `register(taskID:label:expectedBytes:)`.
-/// The delegate callbacks update shared state that `ProgressRenderer` reads.
-/// Completed downloads are signalled via per-task continuations.
-private final class DownloadProgressTracker: NSObject, URLSessionDownloadDelegate, @unchecked Sendable {
-
-    /// Result of a single file download: the temporary location where
-    /// URLSession wrote the file (must be moved before the delegate returns).
-    struct DownloadResult {
-        let location: URL
-        let response: URLResponse
-    }
-
-    private let lock = NSLock()
-    private var progressMap: [Int: FileProgress] = [:]  // taskIdentifier -> progress
-    private var continuations: [Int: CheckedContinuation<DownloadResult, Error>] = [:]
-    private var _allProgress: [FileProgress] = []
-
-    /// Register a task so we can track its progress.
-    func register(taskID: Int, label: String, expectedBytes: Int64) {
-        lock.lock()
-        progressMap[taskID] = FileProgress(label: label, expectedBytes: expectedBytes)
-        rebuildSnapshot()
-        lock.unlock()
-    }
-
-    /// Store the continuation that will be resumed when the download finishes.
-    func setContinuation(_ cont: CheckedContinuation<DownloadResult, Error>, forTaskID taskID: Int) {
-        lock.lock()
-        continuations[taskID] = cont
-        lock.unlock()
-    }
-
-    /// Thread-safe snapshot of all tracked file progress, ordered by
-    /// registration time.
-    var allProgress: [FileProgress] {
-        lock.lock()
-        defer { lock.unlock() }
-        return _allProgress
-    }
-
-    /// Whether all registered downloads have completed (or errored).
-    var isComplete: Bool {
-        lock.lock()
-        defer { lock.unlock() }
-        return !progressMap.isEmpty && progressMap.values.allSatisfy(\.completed)
-    }
-
-    private func rebuildSnapshot() {
-        _allProgress = progressMap.keys.sorted().map { progressMap[$0]! }
-    }
-
-    // MARK: URLSessionDownloadDelegate
-
-    func urlSession(
-        _ session: URLSession,
-        downloadTask: URLSessionDownloadTask,
-        didWriteData bytesWritten: Int64,
-        totalBytesWritten: Int64,
-        totalBytesExpectedToWrite: Int64
-    ) {
-        lock.lock()
-        let id = downloadTask.taskIdentifier
-        if var p = progressMap[id] {
-            p.downloadedBytes = totalBytesWritten
-            if totalBytesExpectedToWrite > 0 {
-                // Update expected if the server tells us (e.g. after resume
-                // partial content).  Keep original manifest value if server
-                // returns -1.
-            }
-            progressMap[id] = p
-            rebuildSnapshot()
-        }
-        lock.unlock()
-    }
-
-    func urlSession(
-        _ session: URLSession,
-        downloadTask: URLSessionDownloadTask,
-        didFinishDownloadingTo location: URL
-    ) {
-        // URLSession deletes the temp file when this callback returns.
-        // Move it to a stable location so the continuation consumer can
-        // process it.
-        let stableLocation = FileManager.default.temporaryDirectory
-            .appendingPathComponent("darkbloom-dl-\(downloadTask.taskIdentifier)-\(UUID().uuidString)")
-        try? FileManager.default.moveItem(at: location, to: stableLocation)
-
-        lock.lock()
-        let id = downloadTask.taskIdentifier
-        if var p = progressMap[id] {
-            p.downloadedBytes = p.expectedBytes > 0 ? p.expectedBytes : p.downloadedBytes
-            p.completed = true
-            p.completionTime = Date()
-            p.destinationURL = stableLocation
-            progressMap[id] = p
-            rebuildSnapshot()
-        }
-        let cont = continuations.removeValue(forKey: id)
-        lock.unlock()
-
-        let response = downloadTask.response ?? HTTPURLResponse(
-            url: downloadTask.originalRequest?.url ?? URL(string: "about:blank")!,
-            statusCode: 200,
-            httpVersion: nil,
-            headerFields: nil
-        )!
-        cont?.resume(returning: DownloadResult(location: stableLocation, response: response))
-    }
-
-    func urlSession(
-        _ session: URLSession,
-        task: URLSessionTask,
-        didCompleteWithError error: (any Error)?
-    ) {
-        guard let error else { return }
-        lock.lock()
-        let id = task.taskIdentifier
-        if var p = progressMap[id] {
-            p.completed = true
-            p.completionTime = Date()
-            progressMap[id] = p
-            rebuildSnapshot()
-        }
-        let cont = continuations.removeValue(forKey: id)
-        lock.unlock()
-        cont?.resume(throwing: error)
-    }
-}
-
-/// Renders a multi-line progress display to the terminal using ANSI escape
-/// codes. Falls back to simple per-file messages when stdout is not a TTY.
-private final class ProgressRenderer: @unchecked Sendable {
-
-    private let isTTY: Bool
-    private var linesPrinted: Int = 0
-    private let lock = NSLock()
-    /// Set of labels already printed in non-TTY mode.
-    private var printedLabels: Set<String> = []
-
-    init() {
-        self.isTTY = isatty(STDOUT_FILENO) != 0
-    }
-
-    /// Render a frame given the current file progress snapshot.
-    func render(_ files: [FileProgress]) {
-        lock.lock()
-        defer { lock.unlock() }
-
-        if !isTTY {
-            renderPlain(files)
-            return
-        }
-        renderANSI(files)
-    }
-
-    /// Final render: clear the progress area and print completion summary.
-    func finish(_ files: [FileProgress]) {
-        lock.lock()
-        defer { lock.unlock() }
-
-        if isTTY {
-            // Move up and clear all lines.
-            if linesPrinted > 0 {
-                print("\u{1B}[\(linesPrinted)A", terminator: "")
-                for _ in 0..<linesPrinted {
-                    print("\u{1B}[2K")
-                }
-                print("\u{1B}[\(linesPrinted)A", terminator: "")
-                linesPrinted = 0
-            }
-        }
-
-        // Print final summary lines.
-        for f in files {
-            let totalStr = Self.formatBytes(f.expectedBytes > 0 ? f.expectedBytes : f.downloadedBytes)
-            let elapsed = (f.completionTime ?? Date()).timeIntervalSince(f.startTime)
-            let avgSpeed = elapsed > 0.1 ? Double(f.downloadedBytes) / elapsed : 0
-            let speedStr = Self.formatSpeed(avgSpeed)
-            let timeStr = Self.formatDuration(elapsed)
-            print("  \u{2713} \(f.label)  \(totalStr)  \(speedStr)  \(timeStr)")
-        }
-    }
-
-    // MARK: - ANSI rendering
-
-    private func renderANSI(_ files: [FileProgress]) {
-        // Move cursor up to overwrite previous render.
-        if linesPrinted > 0 {
-            print("\u{1B}[\(linesPrinted)A", terminator: "")
-        }
-
-        let termWidth = Self.terminalWidth()
-        var lines = 0
-        for f in files {
-            print("\u{1B}[2K", terminator: "")  // Clear the line
-            let line = Self.formatLine(f, termWidth: termWidth)
-            print(line)
-            lines += 1
-        }
-        linesPrinted = lines
-        fflush(stdout)
-    }
-
-    private func renderPlain(_ files: [FileProgress]) {
-        for f in files where f.completed && !printedLabels.contains(f.label) {
-            printedLabels.insert(f.label)
-            let totalStr = Self.formatBytes(f.expectedBytes > 0 ? f.expectedBytes : f.downloadedBytes)
-            print("  \u{2713} \(f.label)  \(totalStr)")
-        }
-    }
-
-    // MARK: - Line formatting
-
-    private static func formatLine(_ f: FileProgress, termWidth: Int) -> String {
-        if f.completed {
-            let totalStr = formatBytes(f.expectedBytes > 0 ? f.expectedBytes : f.downloadedBytes)
-            let elapsed = (f.completionTime ?? Date()).timeIntervalSince(f.startTime)
-            let avgSpeed = elapsed > 0.1 ? Double(f.downloadedBytes) / elapsed : 0
-            return "  \u{2713} \(f.label)  \(totalStr)  \(formatSpeed(avgSpeed))  done"
-        }
-
-        let pct = Int(f.fraction * 100)
-        let dlStr = formatBytes(f.downloadedBytes)
-        let totStr = formatBytes(f.expectedBytes)
-        let speedStr = formatSpeed(f.speed)
-        let etaStr: String
-        if let eta = f.eta {
-            etaStr = "ETA \(formatDuration(eta))"
-        } else {
-            etaStr = "---"
-        }
-
-        // Assemble the suffix: "  62%  2.1/4.8 GB  113 MB/s  ETA 24s"
-        let suffix = "  \(String(format: "%3d", pct))%  \(dlStr)/\(totStr)  \(speedStr)  \(etaStr)"
-
-        // Calculate bar width: total - label - prefix - suffix - brackets - spaces
-        let labelMaxWidth = min(f.label.count, 45)
-        let label = f.label.count > labelMaxWidth
-            ? String(f.label.suffix(labelMaxWidth - 1)).padding(toLength: labelMaxWidth, withPad: " ", startingAt: 0)
-            : f.label
-        let prefix = "  \(label)  ["
-        let postfix = "]\(suffix)"
-        let barWidth = max(10, termWidth - prefix.count - postfix.count)
-
-        let filled = Int(f.fraction * Double(barWidth))
-        let empty = barWidth - filled
-        let bar = String(repeating: "\u{2588}", count: filled) + String(repeating: "\u{2591}", count: empty)
-
-        return "\(prefix)\(bar)\(postfix)"
-    }
-
-    // MARK: - Formatting helpers
-
-    static func formatBytes(_ bytes: Int64) -> String {
-        let b = Double(bytes)
-        if b < 1024 { return "\(bytes) B" }
-        if b < 1_048_576 { return String(format: "%.1f KB", b / 1024) }
-        if b < 1_073_741_824 { return String(format: "%.1f MB", b / 1_048_576) }
-        return String(format: "%.1f GB", b / 1_073_741_824)
-    }
-
-    static func formatSpeed(_ bytesPerSec: Double) -> String {
-        if bytesPerSec < 1024 { return String(format: "%.0f B/s", bytesPerSec) }
-        if bytesPerSec < 1_048_576 { return String(format: "%.0f KB/s", bytesPerSec / 1024) }
-        if bytesPerSec < 1_073_741_824 { return String(format: "%.0f MB/s", bytesPerSec / 1_048_576) }
-        return String(format: "%.1f GB/s", bytesPerSec / 1_073_741_824)
-    }
-
-    static func formatDuration(_ seconds: Double) -> String {
-        let s = Int(seconds)
-        if s < 60 { return "\(s)s" }
-        if s < 3600 { return "\(s / 60)m \(s % 60)s" }
-        return "\(s / 3600)h \(s / 60 % 60)m"
-    }
-
-    static func terminalWidth() -> Int {
-        #if canImport(Darwin)
-        var w = winsize()
-        if ioctl(STDOUT_FILENO, TIOCGWINSZ, &w) == 0, w.ws_col > 0 {
-            return Int(w.ws_col)
-        }
-        #endif
-        return 80
-    }
-}
+//
+// Progress state (`FileProgress`, `ManifestDownloadProgress`) and terminal


🔵 [INFO] 🧩 Comment references moved code but leaves no breadcrumb to new location

💡 Suggestion: Add specific file reference: 'Progress state lives in ModelDownloadProgress.swift'

📊 Score: 2×3 = 6 · Category: dead code

ethenotethan · 2026-06-16T19:45:45Z

    /// from a `.part` file when present, verifying size + SHA-256 before
    /// promoting to the final staged path. Reuses the resume-capable
    /// `downloadFile` helper (Range requests, Content-Range validation, retries).
+    ///
+    /// `onChunk(bytesOnDisk)` reports cumulative bytes-on-disk for this file as
+    /// it streams, so the foreground path can render a live per-shard bar; the
+    /// background prefetch passes nil and accounts progress per whole file.
    private func downloadManifestFileWithResume(


🔵 [INFO] 🧩 Added onChunk parameter with default nil creates optional complexity

💡 Suggestion: Consider making onChunk required or using a separate method variant

📊 Score: 1×2 = 2 · Category: over-abstraction

ethenotethan · 2026-06-16T19:45:45Z

            .appendingPathComponent("local", isDirectory: true)
    }

+    /// Stable foreground-download staging dir name, keyed by the manifest's
+    /// `r2Prefix` so an interrupted download resumes into the SAME dir instead of
+    /// a throwaway UUID. `r2Prefix` is path-like (e.g. "v2/org__name/version");
+    /// flatten it to a single safe component.
+    static func localStagingDirName(r2Prefix: String) -> String {
+        ".local-staging-" + r2Prefix
+            .replacingOccurrences(of: "/", with: "__")
+            .replacingOccurrences(of: "\\", with: "__")
+    }
+
+    /// Whether an interrupted foreground download left resumable content staged on
+    /// disk for this model build (keyed by `r2Prefix`): a completed shard or a
+    /// `.part` prefix in the stable `.local-staging-…` dir. Lets the picker show
+    /// "resuming" instead of "not downloaded" for a partially-downloaded model so
+    /// re-selecting it FINISHES the download rather than appearing to start over.
+    public static func hasResumableStaging(modelID: String, r2Prefix: String) -> Bool {
+        let stagingDir = cacheModelDirectory(for: modelID)
+            .appendingPathComponent("snapshots", isDirectory: true)
+            .appendingPathComponent(localStagingDirName(r2Prefix: r2Prefix), isDirectory: true)
+        guard let entries = try? FileManager.default.contentsOfDirectory(atPath: stagingDir.path) else {
+            return false
+        }
+        // Any non-hidden staged entry (a finished file, a `.part`, or a nested
+        // subdir like `adapters/`) is resumable content worth finishing.
+        return entries.contains { !$0.hasPrefix(".") }


🔵 [INFO] 🧩 localStagingDirName duplicates path sanitization logic from removed code

💡 Suggestion: Extract path sanitization to shared utility function

📊 Score: 2×3 = 6 · Category: duplicate logic

ethenotethan · 2026-06-16T19:45:45Z

    // MARK: - Interactive Catalog Picker

    /// Entry shown in the interactive TUI model picker.
-    private struct PickerEntry {
+    ///
+    /// `downloaded` is computed from an UNFILTERED on-disk check (not the
+    /// available-memory-filtered scan) so a fully-downloaded model that exceeds
+    /// available RAM still reads "downloaded (won't fit)" rather than "not
+    /// downloaded". `resumable` flags a build whose foreground download was
+    /// interrupted (staging on disk) so the picker can show "resuming".
+    struct PickerEntry: Equatable {
        let id: String


🔵 [INFO] 🧩 PickerEntry struct adds resumable field that could be computed property

💡 Suggestion: Make resumable a computed property based on downloadedIDs and resumableIDs

📊 Score: 1×2 = 2 · Category: over-abstraction

ethenotethan · 2026-06-16T19:45:45Z

+    static func buildPickerEntries(
+        rows: [PickerCatalogRow],
+        downloadedIDs: Set<String>,
+        localMemoryByID: [String: Double],
+        resumableIDs: Set<String>,
+        memoryGb: Double
+    ) -> [PickerEntry] {
+        var entries: [PickerEntry] = rows.compactMap { row in
+            let model = row.model
+            let isDownloaded = downloadedIDs.contains(model.id)
+            if !isDownloaded, let minRam = model.minRamGb, Double(minRam) > memoryGb {
+                return nil
+            }
+            let size = isDownloaded ? (localMemoryByID[model.id] ?? model.sizeGb) : model.sizeGb
+            return PickerEntry(
+                id: model.id,
+                catalogModel: model,
+                displayName: row.displayName,
+                sizeGb: size,
+                minRamGb: model.minRamGb,
+                downloaded: isDownloaded,
+                resumable: !isDownloaded && resumableIDs.contains(model.id)
+            )
+        }
+        // Downloaded first, then larger first.
+        entries.sort { a, b in
+            if a.downloaded != b.downloaded { return a.downloaded }
+            return a.sizeGb > b.sizeGb
+        }
+        return entries


🟡 [MEDIUM] 🧩 buildPickerEntries static method on Start command mixes UI logic with data transformation

💡 Suggestion: Move to dedicated PickerEntry or ModelCatalog utility class

📊 Score: 3×4 = 12 · Category: misplaced responsibility

github-actions · 2026-06-16T19:46:24Z

No directly mapped threat-model coverage for this PR — but two files warrant threat-model expansion and one warrants a targeted security note.

Trust boundaries touched

None of the four changed files match an affected_files pattern in the current threat model. The closest boundaries are TB-003 (provider operator vs. process) and TB-007 (provider inference engine), but neither lists these paths explicitly.

Files outside current coverage

File	Observation
`libs/mlx-swift-lm`	Inference-engine library change. Currently only `provider-swift/Sources/ProviderCore/Inference/**` is listed under T-007/T-027/T-028. Changes here directly affect the GPU execution path, KV-cache lifecycle, and (for T-028) residual-data exposure.
`provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift`	New trust-boundary surface — see note below.
`provider-swift/Tests/ProviderCoreTests/BatchKVCacheTests.swift`	Test-only; no production surface.
`provider-swift/Tests/ProviderCoreTests/LaunchAgentRestartTests.swift`	Test-only; no production surface, but see LaunchAgent note.

New attack surface not covered by an existing threat

LaunchAgent.swift — persistent process re-launch mechanism

A launchd LaunchAgent plist introduces a surface not modelled anywhere in the current threat model:

- TB-003 gap (operator persistence / restart-after-kill): LaunchAgents are user-writable (~/Library/LaunchAgents/). If the provider process is killed (e.g. by the coordinator detecting a bad challenge), a LaunchAgent restarts it automatically. This potentially undermines the "three consecutive challenge failures → permanently untrusted" eviction model: the process restarts, re-registers with a fresh ephemeral X25519 key, and begins a new attestation cycle. The threat model assumes process termination is a meaningful enforcement action (TB-003, T-015); a self-restarting agent weakens that assumption. Recommend adding this as a new finding under T-015 or a new T-04x entry.
- Plist path and contents should be reviewed: If the plist is written to disk by the process itself (common pattern), the operator can modify it to inject environment variables or arguments before the restart, potentially bypassing startup checks. Verify the plist is either embedded in the signed bundle or written with restricted permissions.
- SIP / Hardened Runtime interaction: LaunchAgents run under the user session, not as root. PT_DENY_ATTACH and Hardened Runtime still apply on restart, so the core anti-debug posture (T-014) is unaffected. This is ℹ️ neutral for T-014.

Recommended threat-model updates

- Add provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift to affected_files for T-015 and T-035.
- Add libs/mlx-swift-lm/** to affected_files for T-007, T-027, T-028, and T-041.
- Consider a new threat entry covering LaunchAgent-mediated re-registration after forced disconnection under TB-003, specifically whether automatic restart resets the "permanently untrusted" coordinator state.

🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cd519081b

chatgpt-codex-connector · 2026-06-16T19:54:05Z

+            if !isDownloaded, let minRam = model.minRamGb, Double(minRam) > memoryGb {
+                return nil


Block non-TTY picks of oversized local models

When a downloaded model exceeds this machine's RAM, this branch now keeps it in entries, but the non-TTY fallbackPicker still accepts all or the row number without the canFitIndividually guard used by the TTY picker. In the piped-input path that oversized ID can be written into the launch agent, and the foreground child later builds from the memory-filtered snapshot.models, so an oversized-only selection starts the service only to fail with No models selected; either filter/disable those rows in fallback or keep them display-only there.
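One way the fallback guard could look, as a sketch only. `FallbackRow` and `selectableIDs(rows:memoryGb:)` are hypothetical stand-ins; the real `PickerEntry` and `canFitIndividually` signatures are not shown in this thread, so this assumes "fits" means `minRamGb <= memoryGb` and that a missing `minRamGb` means no constraint.

```swift
/// Hypothetical row type for the non-TTY picker.
struct FallbackRow {
    let id: String
    let minRamGb: Double?  // nil is assumed to mean "no stated requirement"
    let downloaded: Bool
}

/// Returns the IDs the non-TTY picker may accept for "all" or a row number.
/// Oversized rows stay out of the result even when already downloaded,
/// so they can be displayed but never written into the launch agent.
func selectableIDs(rows: [FallbackRow], memoryGb: Double) -> [String] {
    rows.filter { row in
        guard let minRam = row.minRamGb else { return true }
        return minRam <= memoryGb
    }.map(\.id)
}
```

Keeping the filter separate from the display list lets the fallback still print oversized rows as "downloaded (won't fit)" while refusing to select them.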

chatgpt-codex-connector · 2026-06-16T19:54:05Z

-                        try await self.downloadManifestFileWithProgress(
-                            job, tracker: tracker, session: delegateSession
-                        )
+                        try await self.downloadManifestFileWithResume(job, onChunk: { bytes in


Avoid byte-wise foreground shard downloads

For manifest models selected through darkbloom models download or the interactive picker, this new foreground path now routes every multi-GB shard through downloadManifestFileWithResume, whose streaming helper consumes URLSession.AsyncBytes one UInt8 at a time (for try await byte in byteStream). That replaces the previous URLSessionDownloadTask foreground path with per-byte Swift async iteration, making large model downloads CPU-bound and much slower; use a chunked/delegate-based resume path instead of iterating each byte individually.

chatgpt-codex-connector · 2026-06-16T19:54:05Z

-                        try await self.downloadManifestFileWithProgress(
-                            job, tracker: tracker, session: delegateSession
-                        )
+                        try await self.downloadManifestFileWithResume(job, onChunk: { bytes in


Promote complete .part files before resuming

If a foreground download is killed after a shard is fully written to <dest>.part but before SHA verification/move (for example during hashing a multi-GB shard), restarting now enters this new resume path and sends Range: bytes=<full-size>-; a normal CDN responds 416, and streamDownload deletes the complete .part before retrying from zero. This defeats finish-on-restart and can fail on tight disks because the capacity check credited the full .part; check for a full-size/hash-valid .part and promote it before issuing a Range request.
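The decision this comment asks for can be sketched as a pure function that runs before any Range request is built. `PartFileAction` and `planPartFile(partSize:expectedSize:)` are hypothetical names; the sketch assumes the manifest supplies a positive expected size and that the caller performs the SHA-256 check before promoting.

```swift
/// Hypothetical outcome of inspecting `<dest>.part` before downloading.
enum PartFileAction: Equatable {
    case startFresh                // no usable .part on disk
    case resume(fromByte: Int64)   // send Range: bytes=<fromByte>-
    case verifyAndPromote          // full-size .part: hash it, then move into place
    case discardAndRestart         // .part is larger than expected, so it is corrupt
}

/// `partSize` is nil when no .part file exists.
func planPartFile(partSize: Int64?, expectedSize: Int64) -> PartFileAction {
    guard let partSize, partSize > 0 else { return .startFresh }
    if partSize == expectedSize { return .verifyAndPromote }
    if partSize > expectedSize { return .discardAndRestart }
    return .resume(fromByte: partSize)
}
```

Handling the full-size case first means the client never sends `Range: bytes=<full-size>-`, so the 416 response and the deletion of a complete `.part` do not occur.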

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — 1 finding(s)

🔵 [INFO] provider-swift/Tests/ProviderCoreTests/BatchKVCacheTests.swift:329-350 — Test creates unnecessary MLXArray zeros when simple shapes would suffice
- Suggestion: The MLXArray.zeros([2, 1, 4, 1]) pattern is fine, but consider whether the test could be simplified to focus on the core negative-padding assertion without the full cache update cycle

1 finding(s) total, 0 blocking. Verdict: COMMENT.

🤖 Automated review by Centaur · DAR-186

ethenotethan · 2026-06-16T20:35:35Z

+    @Test("rotating decode stays unmasked after trimming past the window (negative padding)")
+    func rotatingUnmaskedAfterWindowTrim() {
+        // A rotating cache that generates past its window trims slots and
+        // subtracts the trim from leftPadding, so unpadded rows go NEGATIVE.
+        // The decode-mask fast path must treat negative padding as unpadded
+        // (<= 0), or every long (> window) Gemma 4 decode falls back to the
+        // explicit-mask path on each sliding layer (the mlx#3384 slow/divergent
+        // path the repetition fix avoids).
+        let cache = BatchRotatingKVCache(maxSize: 3, leftPadding: [0, 0])
+        _ = cache.update(  // prefill 4 > window 3 -> trims, leftPadding goes negative
+            keys: MLXArray.zeros([2, 1, 4, 1]), values: MLXArray.zeros([2, 1, 4, 1]))
+        _ = cache.update(  // one decode step
+            keys: MLXArray.zeros([2, 1, 1, 1]), values: MLXArray.zeros([2, 1, 1, 1]))
+        // Precondition that makes this a real regression test: rows are now
+        // negatively padded, so a `== 0` guard would (wrongly) mask.
+        #expect(cache.leftPadding.max().item(Int32.self) < 0)
+        let mask = cache.makeMask(n: 1, windowSize: 3, returnArray: true)
+        guard case .none = mask else {
+            Issue.record("expected .none for unpadded rotating decode past window, got \(mask)")
+            return
+        }
+    }


🔵 [INFO] 🧩 Test creates unnecessary MLXArray zeros when simple shapes would suffice

💡 Suggestion: The MLXArray.zeros([2, 1, 4, 1]) pattern is fine, but consider whether the test could be simplified to focus on the core negative-padding assertion without the full cache update cycle.

📊 Score: 2×3 = 6 · Category: over-abstraction

chatgpt-codex-connector

💡 Codex Review

https://github.com/Layr-Labs/d-inference/blob/6953e407485553f3f56a96054c84a6a07e46af78/libs/mlx-swift-lm/Libraries/MLXLMCommon/ContinuousBatching/EngineCore.swift#L112-L117
Persist the resource-debug opt-out for launchd

When the provider is started through the normal background path (darkbloom start), this check runs in launchd's start --foreground child, but LaunchAgent.makeServicePlist only persists DARKBLOOM_PREFIX_CACHE into EnvironmentVariables. As a result, DARKBLOOM_MLX_RESOURCE_DEBUG=0 darkbloom start is dropped before the child launches, so the new default-on stdout/os_log resource trace cannot be disabled for production daemons unless the plist is hand-edited or this key is added to the launchd passthrough allowlist.
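An allowlist-based passthrough of the kind described here might look like the sketch below. The function name and the allowlist constant are hypothetical; only the two environment variable names come from the review text, and the real plist builder may be structured differently.

```swift
/// Hypothetical allowlist of variables copied into the service's environment.
/// DARKBLOOM_PREFIX_CACHE is the key the review says is already persisted;
/// DARKBLOOM_MLX_RESOURCE_DEBUG is the key it asks to add.
let passthroughKeys = ["DARKBLOOM_PREFIX_CACHE", "DARKBLOOM_MLX_RESOURCE_DEBUG"]

/// Build the dictionary that would be written under `EnvironmentVariables`
/// in the launchd plist: only allowlisted keys that are actually set survive.
func serviceEnvironment(from env: [String: String],
                        allowlist: [String]) -> [String: String] {
    var persisted: [String: String] = [:]
    for key in allowlist {
        if let value = env[key] {
            persisted[key] = value
        }
    }
    return persisted
}
```

With this shape, `DARKBLOOM_MLX_RESOURCE_DEBUG=0` set in the invoking shell would reach the launchd child, while unrelated variables would still be dropped.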

https://github.com/Layr-Labs/d-inference/blob/6953e407485553f3f56a96054c84a6a07e46af78/libs/mlx-swift-lm/Libraries/MLXLMCommon/ContinuousBatching/EngineCore.swift#L371-L372
Avoid unbounded stdout resource traces

With this default-on path, every 50 busy scheduler steps appends an [rsrc] line to stdout before the coarser os_log gating runs. In the normal launchd service, stdout/stderr are redirected to ~/.darkbloom/provider.log via StandardOutPath/StandardErrorPath, and that file has no rotation in the repo, so high-throughput providers will grow a local log indefinitely even though the unified-log report path is capped; make the stdout trace coarse/opt-in or rotate/truncate the legacy log.
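One way to keep the stdout trace out of a redirected log file is to print only when stdout is a terminal. The sketch below illustrates that idea under stated assumptions: the helper names are hypothetical, and treating any value other than `0` as "enabled" is an assumption about the opt-out semantics, not something the review confirms.

```swift
import Foundation

/// Assumed semantics: telemetry is on unless the variable is exactly "0".
func resourceDebugEnabled(env: [String: String]) -> Bool {
    env["DARKBLOOM_MLX_RESOURCE_DEBUG"] != "0"
}

/// True when stdout is attached to a terminal rather than a file or pipe.
func stdoutIsTerminal() -> Bool {
    isatty(STDOUT_FILENO) != 0
}

/// The stdout sample is printed only for interactive runs; under launchd,
/// where stdout is redirected to a file, this returns false.
func shouldPrintSample(enabled: Bool, stdoutIsTTY: Bool) -> Bool {
    enabled && stdoutIsTTY
}
```

A caller would evaluate `shouldPrintSample(enabled: resourceDebugEnabled(env: ProcessInfo.processInfo.environment), stdoutIsTTY: stdoutIsTerminal())` once at startup and leave the os_log path untouched.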

…k fix Bumps mlx-swift-lm to 1e7b5ac (Layr-Labs/mlx-swift-lm#42): (1) default-on [rsrc] resource-count telemetry via os_log so reports carry the trajectory; (2) decode-mask fast path now treats negative leftPadding as unpadded (<= 0), fixing a regression where long (> sliding-window) Gemma 4 generations fell back to the explicit-mask path on every sliding layer. Adds a provider regression test exercising the negative-padding rotating-cache case. Target 0.6.12. Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — 1 finding(s)

🔵 [INFO] provider-swift/Tests/ProviderCoreTests/BatchKVCacheTests.swift:329-350 — Test method has verbose name and complex setup for a simple negative padding check
- Suggestion: Simplify test name to 'testNegativePaddingUnmasked' and reduce setup complexity - the core assertion is just checking leftPadding < 0 and mask == .none

1 finding(s) total, 0 blocking. Verdict: COMMENT.

🤖 Automated review by Centaur · DAR-186

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

Gajesh2007 · 2026-06-16T21:26:31Z

Addressed both Codex P2s on the default-on telemetry (a405ac8f + submodule bump to 8518e6d):

- "Wire the telemetry opt-out into the LaunchAgent": DARKBLOOM_MLX_RESOURCE_DEBUG is now forwarded via LaunchAgent.passthroughEnvKeys, so the documented opt-out actually reaches the launchd service. Added a passthrough unit test.
- "Gate stdout resource samples in daemon mode": bumped libs/mlx-swift-lm to 8518e6d, which TTY-gates the [rsrc] stdout print (no more unbounded provider.log growth). os_log unchanged.

blacksmith-sh · 2026-06-16T21:48:32Z

Found 1 test failure on Blacksmith runners:

Failure: `github.com/eigeninference/d-inference/e2e/TestProfile_SingleProviderNonStreaming`

Bump libs/mlx-swift-lm to 288bcba (rebased onto main; drops the duplicate #41 commits and adds the post-update window-bound fix for the decode mask fast path). Add a regression test: when windowSize < maxCacheSize, makeMask must keep the windowed mask at the boundary (.none would let the new token attend one position past the window), while staying .none below the boundary. The existing Gemma (window == maxCacheSize) negative-padding test still passes unchanged. Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

#42 (default-on resource telemetry + Gemma 4 decode-mask fixes) is merged to mlx-swift-lm main. Move the submodule pointer from the feature-branch commit to the squashed main commit e7af9df (identical content). The parent-side BatchKVCache regression tests added on this branch stay.

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

chatgpt-codex-connector

💡 Codex Review

https://github.com/Layr-Labs/d-inference/blob/4cc4973c940c0f79a9626b24d960019f19cdc0b1/libs/mlx-swift-lm/Libraries/MLXLMCommon/BatchKVCache.swift#L408-L410
Preserve window masks in BatchKVCache fast path

When windowSize is non-nil and smaller than the accumulated BatchKVCache length, this new single-token fast path drops the mask entirely for unpadded rows. A caller that asks for a sliding window after the cache already holds more than that window will now let the query attend to every retained key instead of only the active window; the rotating cache added a post-update window guard, but this regular batch cache needs the same check or should keep returning an array mask when windowSize is active.
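The guard being asked for can be expressed as a small predicate. The sketch below is hypothetical and works on plain integers rather than MLX arrays; `totalKeys` is assumed to mean the number of keys the new token would attend to after the cache update, including the token itself.

```swift
/// Hypothetical helper: may a single-token decode step skip the attention mask?
func canSkipDecodeMask(maxLeftPadding: Int, totalKeys: Int, windowSize: Int?) -> Bool {
    // Zero or negative padding both mean no padded slots remain in any row.
    guard maxLeftPadding <= 0 else { return false }
    // With an active window, every retained key must fall inside it;
    // otherwise the windowed mask is still needed.
    if let windowSize, totalKeys > windowSize { return false }
    return true
}
```

Under these assumptions a cache holding more keys than the requested window keeps its array mask, while an unpadded row that fits inside the window (or has no window at all) takes the fast path.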

https://github.com/Layr-Labs/d-inference/blob/4cc4973c940c0f79a9626b24d960019f19cdc0b1/libs/mlx-swift-lm/Libraries/MLXLMCommon/ContinuousBatching/EngineCore.swift#L404-L405
Throttle high-pressure os_log samples

When a provider stays above 70% resource pressure without immediately crashing, this branch logs [rsrc] to unified logging every 10 busy scheduler steps. Those are exactly the logs collected by darkbloom report, whose manual path rejects reports over 10 MB and whose auto-report path truncates to 10 MB, so a sustained high-pressure run can make the diagnostic report fail or lose the newest crash ramp; keep the dense sampling bounded to a short transition window or sample it at the report cadence after the first few high-pressure points.
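A bounded dense-sampling policy along the lines suggested could be sketched like this. The type and its parameters are hypothetical; only the 70% threshold and the every-10-steps dense cadence come from the review, while the coarse cadence and the dense budget are placeholders to be tuned.

```swift
/// Hypothetical sampler: dense logging for the first few high-pressure
/// points, then fall back to the coarse cadence so reports stay small.
struct PressureSampler {
    let denseEvery: Int    // dense cadence above the threshold (10 in the review)
    let coarseEvery: Int   // assumed coarse cadence, in busy scheduler steps
    let denseBudget: Int   // assumed cap on dense samples per high-pressure episode
    private var denseUsed = 0

    init(denseEvery: Int = 10, coarseEvery: Int, denseBudget: Int) {
        self.denseEvery = denseEvery
        self.coarseEvery = coarseEvery
        self.denseBudget = denseBudget
    }

    mutating func shouldLog(step: Int, pressure: Double) -> Bool {
        if pressure < 0.70 {
            denseUsed = 0  // episode over: re-arm the dense budget
            return step % coarseEvery == 0
        }
        if denseUsed < denseBudget, step % denseEvery == 0 {
            denseUsed += 1
            return true
        }
        // Budget spent: keep sampling, but only at the coarse cadence.
        return step % coarseEvery == 0
    }
}
```

Re-arming the budget whenever pressure dips below the threshold keeps each new ramp visible, at the cost of more lines if pressure oscillates around 70%; a time-based cooldown would be one alternative.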

Memory & reliability hardening:
- 90% unified-memory cap + KV purge on unload + serve-while-load reservation (#363)
- bounded checkpoint-capture pipeline, stops the Metal 499000 resource leak (#374)
- resumable 4-bit model downloads + restart-safe detection (#372)
- default-on [rsrc] resource telemetry in reports (#373 / mlx-swift-lm#42)
- don't log raw request-parse errors, for prompt-fragment privacy (#376)

Coordinator LatestProviderVersion bump is intentionally deferred until the signed 0.6.12 bundle is published (prod uses the in-memory store, so the hardcoded fallback is authoritative after a redeploy).

ethenotethan requested changes Jun 16, 2026

Gajesh2007 force-pushed the chore/bump-mlx-rsrc-debug branch from 0cd5190 to d2b2eb7 Compare June 16, 2026 20:02

Gajesh2007 mentioned this pull request Jun 16, 2026

fix(provider): don't log raw request-parse errors (avoid prompt-fragment leak in reports) #376

Merged

Gajesh2007 force-pushed the chore/bump-mlx-rsrc-debug branch from d2b2eb7 to 6953e40 Compare June 16, 2026 20:35

Gajesh2007 force-pushed the chore/bump-mlx-rsrc-debug branch from 6953e40 to 6fea8b9 Compare June 16, 2026 21:05

Gajesh2007 mentioned this pull request Jun 16, 2026

feat(telemetry)+fix(gemma4): default-on resource logging + negative-padding decode-mask fix Layr-Labs/mlx-swift-lm#42

Merged

Merge branch 'master' into chore/bump-mlx-rsrc-debug

4cc4973

Gajesh2007 merged commit c20a8eb into master Jun 16, 2026
9 of 14 checks passed

Gajesh2007 deleted the chore/bump-mlx-rsrc-debug branch June 16, 2026 22:12

Gajesh2007 mentioned this pull request Jun 16, 2026

Enforce a 90% unified-memory cap + purge KV on unload #363

Merged

		if !isDownloaded, let minRam = model.minRamGb, Double(minRam) > memoryGb {
		return nil
