Skip to content

chore(provider): default-on MLX resource-count telemetry in reports (mlx-swift-lm#42)#373

Merged
Gajesh2007 merged 5 commits into
masterfrom
chore/bump-mlx-rsrc-debug
Jun 16, 2026
Merged

chore(provider): default-on MLX resource-count telemetry in reports (mlx-swift-lm#42)#373
Gajesh2007 merged 5 commits into
masterfrom
chore/bump-mlx-rsrc-debug

Conversation

@Gajesh2007

@Gajesh2007 Gajesh2007 commented Jun 16, 2026

Copy link
Copy Markdown
Member

Bumps the submodule to Layr-Labs/mlx-swift-lm#42 so the [rsrc] resource-count diagnostic is default-on and emitted via os_log under dev.darkbloom.provider.

Why

The [metal::malloc] Resource limit (499000) crash is a live Metal buffer-count climb; diagnosing it needs the resource-count trajectory. Verified against real prod log-reports (Gumbii's M3 Ultra, ids 518/520 via /v1/admin/log-reports): they contain zero [rsrc] lines because the diagnostic was off by default and print()-only (the report tool captures log show --predicate subsystem == dev.darkbloom.provider). After this, every uploaded report carries the live resource-count trace.

Coarse os_log cadence keeps reports under the 10 MB cap, but always logs at ≥70% pressure so the ramp toward 499000 is captured. Opt out: DARKBLOOM_MLX_RESOURCE_DEBUG=0. Target: 0.6.12. Merge mlx-swift-lm#42 first, then re-point if its merge SHA differs.


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

Credits

Builds directly on @anupsv's work, credited via Co-authored-by on the bump commit:\n- Bumps the submodule onto the default-on [rsrc] telemetry he introduced in Layr-Labs/mlx-swift-lm#39, plus the negative-padding decode-mask fix that mirrors his Layr-Labs/mlx-swift-lm#40.

@vercel

vercel Bot commented Jun 16, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
d-inference Ready Ready Preview Jun 16, 2026 10:12pm
d-inference-console-ui-dev Ready Ready Preview Jun 16, 2026 10:12pm
d-inference-landing Ready Ready Preview Jun 16, 2026 10:12pm

Request Review

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — ✅ No issues found

Performance — 4 finding(s) (3 blocking)

  • 🔵 [INFO] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:493 — remainingBytesToFetch helper may allocate unbounded arrays for large model catalogs
    • Suggestion: Consider streaming calculation or chunked processing for very large model catalogs to avoid memory spikes
  • 🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:659 — fileSize() called multiple times on same paths in hot download path
    • Suggestion: Cache fileSize results in local variables to avoid repeated filesystem calls
  • 🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Models/ModelDownloadProgress.swift:65-85 — Linear search through order array on every progress update
    • Suggestion: Use Dictionary for O(1) lookup or maintain index mapping to avoid O(n) search per update
  • 🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Models/ModelDownloadProgress.swift:98-103 — compactMap rebuilds entire progress array on every render frame
    • Suggestion: Cache the ordered progress array and only rebuild when order changes

Type_diligence — ✅ No issues found

Additive_complexity — 5 finding(s) (1 blocking)

  • 🔵 [INFO] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:15-18 — Comment references moved code but leaves no breadcrumb to new location
    • Suggestion: Add specific file reference: 'Progress state lives in ModelDownloadProgress.swift'
  • 🔵 [INFO] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:493-500 — Added onChunk parameter with default nil creates optional complexity
    • Suggestion: Consider making onChunk required or using a separate method variant
  • 🔵 [INFO] provider-swift/Sources/ProviderCore/Models/ModelCatalog.swift:813-840 — localStagingDirName duplicates path sanitization logic from removed code
    • Suggestion: Extract path sanitization to shared utility function
  • 🔵 [INFO] provider-swift/Sources/darkbloom/StartCommand.swift:578-588 — PickerEntry struct adds resumable field that could be computed property
    • Suggestion: Make resumable a computed property based on downloadedIDs and resumableIDs
  • 🟡 [MEDIUM] provider-swift/Sources/darkbloom/StartCommand.swift:613-642 — buildPickerEntries static method on Start command mixes UI logic with data transformation
    • Suggestion: Move to dedicated PickerEntry or ModelCatalog utility class

9 finding(s) total, 4 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

}

/// Download a single manifest file into its staging destination, resuming
/// from a `.part` file when present, verifying size + SHA-256 before

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] ⚡ remainingBytesToFetch helper may allocate unbounded arrays for large model catalogs

💡 Suggestion: Consider streaming calculation or chunked processing for very large model catalogs to avoid memory spikes

📊 Score: 2×3 = 6 · Category: unbounded_allocations

file: file,
destination: stagingDir.appendingPathComponent(relativePath, isDirectory: false),
url: "\(r2CDNURL)/\(Self.escapeR2Path(manifest.r2Prefix))/\(Self.escapeR2Path(relativePath))"
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] ⚡ fileSize() called multiple times on same paths in hot download path

💡 Suggestion: Cache fileSize results in local variables to avoid repeated filesystem calls

📊 Score: 3×4 = 12 · Category: repeated_work

Comment on lines +65 to +85
/// Register a file to track. `initialBytes` seeds a resumed `.part` prefix
/// so the bar starts where the previous run left off (and is excluded from
/// the speed math via `baselineBytes`).
func register(label: String, expectedBytes: Int64, initialBytes: Int64 = 0) {
lock.lock()
defer { lock.unlock() }
if progress[label] == nil { order.append(label) }
var p = FileProgress(label: label, expectedBytes: expectedBytes)
p.downloadedBytes = max(0, initialBytes)
p.baselineBytes = max(0, initialBytes)
progress[label] = p
}

/// Update cumulative bytes-on-disk for a file.
func update(label: String, downloadedBytes: Int64) {
lock.lock()
defer { lock.unlock() }
guard var p = progress[label] else { return }
p.downloadedBytes = downloadedBytes
progress[label] = p
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] ⚡ Linear search through order array on every progress update

💡 Suggestion: Use Dictionary for O(1) lookup or maintain index mapping to avoid O(n) search per update

📊 Score: 2×4 = 8 · Category: inefficient_data_structures

Comment on lines +98 to +103
/// Thread-safe snapshot of all tracked file progress, in registration order.
var allProgress: [FileProgress] {
lock.lock()
defer { lock.unlock() }
return order.compactMap { progress[$0] }
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] ⚡ compactMap rebuilds entire progress array on every render frame

💡 Suggestion: Cache the ordered progress array and only rebuild when order changes

📊 Score: 2×4 = 8 · Category: repeated_work

Comment on lines +15 to +18

// MARK: - Download progress tracking & rendering

/// Per-file progress state used by `DownloadProgressTracker`.
private struct FileProgress: Sendable {
let label: String
let expectedBytes: Int64
var downloadedBytes: Int64 = 0
var startTime: Date = Date()
var completed: Bool = false
var completionTime: Date?
var destinationURL: URL?

/// Bytes/second using elapsed wall time.
var speed: Double {
let elapsed = (completionTime ?? Date()).timeIntervalSince(startTime)
guard elapsed > 0.1 else { return 0 }
return Double(downloadedBytes) / elapsed
}

/// Estimated seconds remaining.
var eta: Double? {
guard speed > 0, expectedBytes > 0 else { return nil }
let remaining = Double(expectedBytes - downloadedBytes)
guard remaining > 0 else { return nil }
return remaining / speed
}

var fraction: Double {
guard expectedBytes > 0 else { return 0 }
return min(1.0, Double(downloadedBytes) / Double(expectedBytes))
}
}

/// Delegate-based download tracker that provides incremental progress.
///
/// Each download task is registered with `register(taskID:label:expectedBytes:)`.
/// The delegate callbacks update shared state that `ProgressRenderer` reads.
/// Completed downloads are signalled via per-task continuations.
private final class DownloadProgressTracker: NSObject, URLSessionDownloadDelegate, @unchecked Sendable {

/// Result of a single file download: the temporary location where
/// URLSession wrote the file (must be moved before the delegate returns).
struct DownloadResult {
let location: URL
let response: URLResponse
}

private let lock = NSLock()
private var progressMap: [Int: FileProgress] = [:] // taskIdentifier -> progress
private var continuations: [Int: CheckedContinuation<DownloadResult, Error>] = [:]
private var _allProgress: [FileProgress] = []

/// Register a task so we can track its progress.
func register(taskID: Int, label: String, expectedBytes: Int64) {
lock.lock()
progressMap[taskID] = FileProgress(label: label, expectedBytes: expectedBytes)
rebuildSnapshot()
lock.unlock()
}

/// Store the continuation that will be resumed when the download finishes.
func setContinuation(_ cont: CheckedContinuation<DownloadResult, Error>, forTaskID taskID: Int) {
lock.lock()
continuations[taskID] = cont
lock.unlock()
}

/// Thread-safe snapshot of all tracked file progress, ordered by
/// registration time.
var allProgress: [FileProgress] {
lock.lock()
defer { lock.unlock() }
return _allProgress
}

/// Whether all registered downloads have completed (or errored).
var isComplete: Bool {
lock.lock()
defer { lock.unlock() }
return !progressMap.isEmpty && progressMap.values.allSatisfy(\.completed)
}

private func rebuildSnapshot() {
_allProgress = progressMap.keys.sorted().map { progressMap[$0]! }
}

// MARK: URLSessionDownloadDelegate

func urlSession(
_ session: URLSession,
downloadTask: URLSessionDownloadTask,
didWriteData bytesWritten: Int64,
totalBytesWritten: Int64,
totalBytesExpectedToWrite: Int64
) {
lock.lock()
let id = downloadTask.taskIdentifier
if var p = progressMap[id] {
p.downloadedBytes = totalBytesWritten
if totalBytesExpectedToWrite > 0 {
// Update expected if the server tells us (e.g. after resume
// partial content). Keep original manifest value if server
// returns -1.
}
progressMap[id] = p
rebuildSnapshot()
}
lock.unlock()
}

func urlSession(
_ session: URLSession,
downloadTask: URLSessionDownloadTask,
didFinishDownloadingTo location: URL
) {
// URLSession deletes the temp file when this callback returns.
// Move it to a stable location so the continuation consumer can
// process it.
let stableLocation = FileManager.default.temporaryDirectory
.appendingPathComponent("darkbloom-dl-\(downloadTask.taskIdentifier)-\(UUID().uuidString)")
try? FileManager.default.moveItem(at: location, to: stableLocation)

lock.lock()
let id = downloadTask.taskIdentifier
if var p = progressMap[id] {
p.downloadedBytes = p.expectedBytes > 0 ? p.expectedBytes : p.downloadedBytes
p.completed = true
p.completionTime = Date()
p.destinationURL = stableLocation
progressMap[id] = p
rebuildSnapshot()
}
let cont = continuations.removeValue(forKey: id)
lock.unlock()

let response = downloadTask.response ?? HTTPURLResponse(
url: downloadTask.originalRequest?.url ?? URL(string: "about:blank")!,
statusCode: 200,
httpVersion: nil,
headerFields: nil
)!
cont?.resume(returning: DownloadResult(location: stableLocation, response: response))
}

func urlSession(
_ session: URLSession,
task: URLSessionTask,
didCompleteWithError error: (any Error)?
) {
guard let error else { return }
lock.lock()
let id = task.taskIdentifier
if var p = progressMap[id] {
p.completed = true
p.completionTime = Date()
progressMap[id] = p
rebuildSnapshot()
}
let cont = continuations.removeValue(forKey: id)
lock.unlock()
cont?.resume(throwing: error)
}
}

/// Renders a multi-line progress display to the terminal using ANSI escape
/// codes. Falls back to simple per-file messages when stdout is not a TTY.
private final class ProgressRenderer: @unchecked Sendable {

private let isTTY: Bool
private var linesPrinted: Int = 0
private let lock = NSLock()
/// Set of labels already printed in non-TTY mode.
private var printedLabels: Set<String> = []

init() {
self.isTTY = isatty(STDOUT_FILENO) != 0
}

/// Render a frame given the current file progress snapshot.
func render(_ files: [FileProgress]) {
lock.lock()
defer { lock.unlock() }

if !isTTY {
renderPlain(files)
return
}
renderANSI(files)
}

/// Final render: clear the progress area and print completion summary.
func finish(_ files: [FileProgress]) {
lock.lock()
defer { lock.unlock() }

if isTTY {
// Move up and clear all lines.
if linesPrinted > 0 {
print("\u{1B}[\(linesPrinted)A", terminator: "")
for _ in 0..<linesPrinted {
print("\u{1B}[2K")
}
print("\u{1B}[\(linesPrinted)A", terminator: "")
linesPrinted = 0
}
}

// Print final summary lines.
for f in files {
let totalStr = Self.formatBytes(f.expectedBytes > 0 ? f.expectedBytes : f.downloadedBytes)
let elapsed = (f.completionTime ?? Date()).timeIntervalSince(f.startTime)
let avgSpeed = elapsed > 0.1 ? Double(f.downloadedBytes) / elapsed : 0
let speedStr = Self.formatSpeed(avgSpeed)
let timeStr = Self.formatDuration(elapsed)
print(" \u{2713} \(f.label) \(totalStr) \(speedStr) \(timeStr)")
}
}

// MARK: - ANSI rendering

private func renderANSI(_ files: [FileProgress]) {
// Move cursor up to overwrite previous render.
if linesPrinted > 0 {
print("\u{1B}[\(linesPrinted)A", terminator: "")
}

let termWidth = Self.terminalWidth()
var lines = 0
for f in files {
print("\u{1B}[2K", terminator: "") // Clear the line
let line = Self.formatLine(f, termWidth: termWidth)
print(line)
lines += 1
}
linesPrinted = lines
fflush(stdout)
}

private func renderPlain(_ files: [FileProgress]) {
for f in files where f.completed && !printedLabels.contains(f.label) {
printedLabels.insert(f.label)
let totalStr = Self.formatBytes(f.expectedBytes > 0 ? f.expectedBytes : f.downloadedBytes)
print(" \u{2713} \(f.label) \(totalStr)")
}
}

// MARK: - Line formatting

private static func formatLine(_ f: FileProgress, termWidth: Int) -> String {
if f.completed {
let totalStr = formatBytes(f.expectedBytes > 0 ? f.expectedBytes : f.downloadedBytes)
let elapsed = (f.completionTime ?? Date()).timeIntervalSince(f.startTime)
let avgSpeed = elapsed > 0.1 ? Double(f.downloadedBytes) / elapsed : 0
return " \u{2713} \(f.label) \(totalStr) \(formatSpeed(avgSpeed)) done"
}

let pct = Int(f.fraction * 100)
let dlStr = formatBytes(f.downloadedBytes)
let totStr = formatBytes(f.expectedBytes)
let speedStr = formatSpeed(f.speed)
let etaStr: String
if let eta = f.eta {
etaStr = "ETA \(formatDuration(eta))"
} else {
etaStr = "---"
}

// Assemble the suffix: " 62% 2.1/4.8 GB 113 MB/s ETA 24s"
let suffix = " \(String(format: "%3d", pct))% \(dlStr)/\(totStr) \(speedStr) \(etaStr)"

// Calculate bar width: total - label - prefix - suffix - brackets - spaces
let labelMaxWidth = min(f.label.count, 45)
let label = f.label.count > labelMaxWidth
? String(f.label.suffix(labelMaxWidth - 1)).padding(toLength: labelMaxWidth, withPad: " ", startingAt: 0)
: f.label
let prefix = " \(label) ["
let postfix = "]\(suffix)"
let barWidth = max(10, termWidth - prefix.count - postfix.count)

let filled = Int(f.fraction * Double(barWidth))
let empty = barWidth - filled
let bar = String(repeating: "\u{2588}", count: filled) + String(repeating: "\u{2591}", count: empty)

return "\(prefix)\(bar)\(postfix)"
}

// MARK: - Formatting helpers

static func formatBytes(_ bytes: Int64) -> String {
let b = Double(bytes)
if b < 1024 { return "\(bytes) B" }
if b < 1_048_576 { return String(format: "%.1f KB", b / 1024) }
if b < 1_073_741_824 { return String(format: "%.1f MB", b / 1_048_576) }
return String(format: "%.1f GB", b / 1_073_741_824)
}

static func formatSpeed(_ bytesPerSec: Double) -> String {
if bytesPerSec < 1024 { return String(format: "%.0f B/s", bytesPerSec) }
if bytesPerSec < 1_048_576 { return String(format: "%.0f KB/s", bytesPerSec / 1024) }
if bytesPerSec < 1_073_741_824 { return String(format: "%.0f MB/s", bytesPerSec / 1_048_576) }
return String(format: "%.1f GB/s", bytesPerSec / 1_073_741_824)
}

static func formatDuration(_ seconds: Double) -> String {
let s = Int(seconds)
if s < 60 { return "\(s)s" }
if s < 3600 { return "\(s / 60)m \(s % 60)s" }
return "\(s / 3600)h \(s / 60 % 60)m"
}

static func terminalWidth() -> Int {
#if canImport(Darwin)
var w = winsize()
if ioctl(STDOUT_FILENO, TIOCGWINSZ, &w) == 0, w.ws_col > 0 {
return Int(w.ws_col)
}
#endif
return 80
}
}
//
// Progress state (`FileProgress`, `ManifestDownloadProgress`) and terminal

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 Comment references moved code but leaves no breadcrumb to new location

💡 Suggestion: Add specific file reference: 'Progress state lives in ModelDownloadProgress.swift'

📊 Score: 2×3 = 6 · Category: dead code

Comment on lines 493 to 500
/// from a `.part` file when present, verifying size + SHA-256 before
/// promoting to the final staged path. Reuses the resume-capable
/// `downloadFile` helper (Range requests, Content-Range validation, retries).
///
/// `onChunk(bytesOnDisk)` reports cumulative bytes-on-disk for this file as
/// it streams, so the foreground path can render a live per-shard bar; the
/// background prefetch passes nil and accounts progress per whole file.
private func downloadManifestFileWithResume(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 Added onChunk parameter with default nil creates optional complexity

💡 Suggestion: Consider making onChunk required or using a separate method variant

📊 Score: 1×2 = 2 · Category: over-abstraction

Comment on lines +813 to +840
.appendingPathComponent("local", isDirectory: true)
}

/// Stable foreground-download staging dir name, keyed by the manifest's
/// `r2Prefix` so an interrupted download resumes into the SAME dir instead of
/// a throwaway UUID. `r2Prefix` is path-like (e.g. "v2/org__name/version");
/// flatten it to a single safe component.
static func localStagingDirName(r2Prefix: String) -> String {
".local-staging-" + r2Prefix
.replacingOccurrences(of: "/", with: "__")
.replacingOccurrences(of: "\\", with: "__")
}

/// Whether an interrupted foreground download left resumable content staged on
/// disk for this model build (keyed by `r2Prefix`): a completed shard or a
/// `.part` prefix in the stable `.local-staging-…` dir. Lets the picker show
/// "resuming" instead of "not downloaded" for a partially-downloaded model so
/// re-selecting it FINISHES the download rather than appearing to start over.
public static func hasResumableStaging(modelID: String, r2Prefix: String) -> Bool {
let stagingDir = cacheModelDirectory(for: modelID)
.appendingPathComponent("snapshots", isDirectory: true)
.appendingPathComponent(localStagingDirName(r2Prefix: r2Prefix), isDirectory: true)
guard let entries = try? FileManager.default.contentsOfDirectory(atPath: stagingDir.path) else {
return false
}
// Any non-hidden staged entry (a finished file, a `.part`, or a nested
// subdir like `adapters/`) is resumable content worth finishing.
return entries.contains { !$0.hasPrefix(".") }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 localStagingDirName duplicates path sanitization logic from removed code

💡 Suggestion: Extract path sanitization to shared utility function

📊 Score: 2×3 = 6 · Category: duplicate logic

Comment on lines 578 to 588
// MARK: - Interactive Catalog Picker

/// Entry shown in the interactive TUI model picker.
private struct PickerEntry {
///
/// `downloaded` is computed from an UNFILTERED on-disk check (not the
/// available-memory-filtered scan) so a fully-downloaded model that exceeds
/// available RAM still reads "downloaded (won't fit)" rather than "not
/// downloaded". `resumable` flags a build whose foreground download was
/// interrupted (staging on disk) so the picker can show "resuming".
struct PickerEntry: Equatable {
let id: String

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 PickerEntry struct adds resumable field that could be computed property

💡 Suggestion: Make resumable a computed property based on downloadedIDs and resumableIDs

📊 Score: 1×2 = 2 · Category: over-abstraction

Comment on lines +613 to +642
static func buildPickerEntries(
rows: [PickerCatalogRow],
downloadedIDs: Set<String>,
localMemoryByID: [String: Double],
resumableIDs: Set<String>,
memoryGb: Double
) -> [PickerEntry] {
var entries: [PickerEntry] = rows.compactMap { row in
let model = row.model
let isDownloaded = downloadedIDs.contains(model.id)
if !isDownloaded, let minRam = model.minRamGb, Double(minRam) > memoryGb {
return nil
}
let size = isDownloaded ? (localMemoryByID[model.id] ?? model.sizeGb) : model.sizeGb
return PickerEntry(
id: model.id,
catalogModel: model,
displayName: row.displayName,
sizeGb: size,
minRamGb: model.minRamGb,
downloaded: isDownloaded,
resumable: !isDownloaded && resumableIDs.contains(model.id)
)
}
// Downloaded first, then larger first.
entries.sort { a, b in
if a.downloaded != b.downloaded { return a.downloaded }
return a.sizeGb > b.sizeGb
}
return entries

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] 🧩 buildPickerEntries static method on Start command mixes UI logic with data transformation

💡 Suggestion: Move to dedicated PickerEntry or ModelCatalog utility class

📊 Score: 3×4 = 12 · Category: misplaced responsibility

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

No directly mapped threat-model coverage for this PR — but two files warrant threat-model expansion and one warrants a targeted security note.


Trust boundaries touched

None of the four changed files match an affected_files pattern in the current threat model. The closest boundaries are TB-003 (provider operator vs. process) and TB-007 (provider inference engine), but neither lists these paths explicitly.


Files outside current coverage

File Observation
libs/mlx-swift-lm Inference-engine library change. Currently only provider-swift/Sources/ProviderCore/Inference/** is listed under T-007/T-027/T-028. Changes here directly affect the GPU execution path, KV-cache lifecycle, and (for T-028) residual-data exposure.
provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift New trust-boundary surface — see note below.
provider-swift/Tests/ProviderCoreTests/BatchKVCacheTests.swift Test-only; no production surface.
provider-swift/Tests/ProviderCoreTests/LaunchAgentRestartTests.swift Test-only; no production surface, but see LaunchAgent note.

New attack surface not covered by an existing threat

LaunchAgent.swift — persistent process re-launch mechanism

A launchd LaunchAgent plist introduces a surface not modelled anywhere in the current threat model:

  1. TB-003 gap — operator persistence / restart-after-kill: LaunchAgents are user-writable (~/Library/LaunchAgents/). If the provider process is killed (e.g. by the coordinator detecting a bad challenge), a LaunchAgent restarts it automatically. This potentially undermines the "three consecutive challenge failures → permanently untrusted" eviction model: the process restarts, re-registers with a fresh ephemeral X25519 key, and begins a new attestation cycle. The threat model assumes process termination is a meaningful enforcement action (TB-003, T-015); a self-restarting agent weakens that assumption. Recommend adding this as a new finding under T-015 or a new T-04x entry.

  2. Plist path and contents should be reviewed: If the plist is written to disk by the process itself (common pattern), the operator can modify it to inject environment variables or arguments before the restart — potentially bypassing startup checks. Verify the plist is either embedded in the signed bundle or written with restricted permissions.

  3. SIP / Hardened Runtime interaction: LaunchAgents run under the user session, not as root. PT_DENY_ATTACH and Hardened Runtime still apply on restart, so the core anti-debug posture (T-014) is unaffected. This is ℹ️ neutral for T-014.


Recommended threat-model updates

  • Add provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift to affected_files for T-015 and T-035.
  • Add libs/mlx-swift-lm/** to affected_files for T-007, T-027, T-028, and T-041.
  • Consider a new threat entry covering LaunchAgent-mediated re-registration after forced disconnection under TB-003, specifically whether automatic restart resets the "permanently untrusted" coordinator state.

🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cd519081b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +623 to +624
if !isDownloaded, let minRam = model.minRamGb, Double(minRam) > memoryGb {
return nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Block non-TTY picks of oversized local models

When a downloaded model exceeds this machine's RAM, this branch now keeps it in entries, but the non-TTY fallbackPicker still accepts all or the row number without the canFitIndividually guard used by the TTY picker. In the piped-input path that oversized ID can be written into the launch agent, and the foreground child later builds from the memory-filtered snapshot.models, so an oversized-only selection starts the service only to fail with No models selected; either filter/disable those rows in fallback or keep them display-only there.

Useful? React with 👍 / 👎.

try await self.downloadManifestFileWithProgress(
job, tracker: tracker, session: delegateSession
)
try await self.downloadManifestFileWithResume(job, onChunk: { bytes in

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid byte-wise foreground shard downloads

For manifest models selected through darkbloom models download or the interactive picker, this new foreground path now routes every multi-GB shard through downloadManifestFileWithResume, whose streaming helper consumes URLSession.AsyncBytes one UInt8 at a time (for try await byte in byteStream). That replaces the previous URLSessionDownloadTask foreground path with per-byte Swift async iteration, making large model downloads CPU-bound and much slower; use a chunked/delegate-based resume path instead of iterating each byte individually.

Useful? React with 👍 / 👎.

try await self.downloadManifestFileWithProgress(
job, tracker: tracker, session: delegateSession
)
try await self.downloadManifestFileWithResume(job, onChunk: { bytes in

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Promote complete .part files before resuming

If a foreground download is killed after a shard is fully written to <dest>.part but before SHA verification/move (for example during hashing a multi-GB shard), restarting now enters this new resume path and sends Range: bytes=<full-size>-; a normal CDN responds 416, and streamDownload deletes the complete .part before retrying from zero. This defeats finish-on-restart and can fail on tight disks because the capacity check credited the full .part; check for a full-size/hash-valid .part and promote it before issuing a Range request.

Useful? React with 👍 / 👎.

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — 1 finding(s)

  • 🔵 [INFO] provider-swift/Tests/ProviderCoreTests/BatchKVCacheTests.swift:329-350 — Test creates unnecessary MLXArray zeros when simple shapes would suffice
    • Suggestion: Use MLXArray.zeros([2, 1, 4, 1]) pattern is fine, but consider if the test logic could be simplified to focus on the core assertion about negative padding without the full cache update cycle

1 finding(s) total, 0 blocking. Verdict: COMMENT.

🤖 Automated review by Centaur · DAR-186

Comment on lines +329 to +350
@Test("rotating decode stays unmasked after trimming past the window (negative padding)")
func rotatingUnmaskedAfterWindowTrim() {
// A rotating cache that generates past its window trims slots and
// subtracts the trim from leftPadding, so unpadded rows go NEGATIVE.
// The decode-mask fast path must treat negative padding as unpadded
// (<= 0), or every long (> window) Gemma 4 decode falls back to the
// explicit-mask path on each sliding layer (the mlx#3384 slow/divergent
// path the repetition fix avoids).
let cache = BatchRotatingKVCache(maxSize: 3, leftPadding: [0, 0])
_ = cache.update( // prefill 4 > window 3 -> trims, leftPadding goes negative
keys: MLXArray.zeros([2, 1, 4, 1]), values: MLXArray.zeros([2, 1, 4, 1]))
_ = cache.update( // one decode step
keys: MLXArray.zeros([2, 1, 1, 1]), values: MLXArray.zeros([2, 1, 1, 1]))
// Precondition that makes this a real regression test: rows are now
// negatively padded, so a `== 0` guard would (wrongly) mask.
#expect(cache.leftPadding.max().item(Int32.self) < 0)
let mask = cache.makeMask(n: 1, windowSize: 3, returnArray: true)
guard case .none = mask else {
Issue.record("expected .none for unpadded rotating decode past window, got \(mask)")
return
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 Test creates unnecessary MLXArray zeros when simple shapes would suffice

💡 Suggestion: Use MLXArray.zeros([2, 1, 4, 1]) pattern is fine, but consider if the test logic could be simplified to focus on the core assertion about negative padding without the full cache update cycle

📊 Score: 2×3 = 6 · Category: over-abstraction

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

https://github.com/Layr-Labs/d-inference/blob/6953e407485553f3f56a96054c84a6a07e46af78/libs/mlx-swift-lm/Libraries/MLXLMCommon/ContinuousBatching/EngineCore.swift#L112-L117
P2 Badge Persist the resource-debug opt-out for launchd

When the provider is started through the normal background path (darkbloom start), this check runs in launchd's start --foreground child, but LaunchAgent.makeServicePlist only persists DARKBLOOM_PREFIX_CACHE into EnvironmentVariables. As a result, DARKBLOOM_MLX_RESOURCE_DEBUG=0 darkbloom start is dropped before the child launches, so the new default-on stdout/os_log resource trace cannot be disabled for production daemons unless the plist is hand-edited or this key is added to the launchd passthrough allowlist.


https://github.com/Layr-Labs/d-inference/blob/6953e407485553f3f56a96054c84a6a07e46af78/libs/mlx-swift-lm/Libraries/MLXLMCommon/ContinuousBatching/EngineCore.swift#L371-L372
P2 Badge Avoid unbounded stdout resource traces

With this default-on path, every 50 busy scheduler steps appends an [rsrc] line to stdout before the coarser os_log gating runs. In the normal launchd service, stdout/stderr are redirected to ~/.darkbloom/provider.log via StandardOutPath/StandardErrorPath, and that file has no rotation in the repo, so high-throughput providers will grow a local log indefinitely even though the unified-log report path is capped; make the stdout trace coarse/opt-in or rotate/truncate the legacy log.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…k fix

Bumps mlx-swift-lm to 1e7b5ac (Layr-Labs/mlx-swift-lm#42): (1) default-on [rsrc] resource-count telemetry via os_log so reports carry the trajectory; (2) decode-mask fast path now treats negative leftPadding as unpadded (<= 0), fixing a regression where long (> sliding-window) Gemma 4 generations fell back to the explicit-mask path on every sliding layer. Adds a provider regression test exercising the negative-padding rotating-cache case. Target 0.6.12.

Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — 1 finding(s)

  • 🔵 [INFO] provider-swift/Tests/ProviderCoreTests/BatchKVCacheTests.swift:329-350 — Test method has verbose name and complex setup for a simple negative padding check
    • Suggestion: Simplify test name to 'testNegativePaddingUnmasked' and reduce setup complexity - the core assertion is just checking leftPadding < 0 and mask == .none

1 finding(s) total, 0 blocking. Verdict: COMMENT.

🤖 Automated review by Centaur · DAR-186

Comment thread provider-swift/Tests/ProviderCoreTests/BatchKVCacheTests.swift

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

@Gajesh2007

Copy link
Copy Markdown
Member Author

Addressed both Codex P2s on the default-on telemetry (a405ac8f + submodule bump to 8518e6d):

  • "Wire the telemetry opt-out into the LaunchAgent"DARKBLOOM_MLX_RESOURCE_DEBUG is now forwarded via LaunchAgent.passthroughEnvKeys, so the documented opt-out actually reaches the launchd service. Added a passthrough unit test.
  • "Gate stdout resource samples in daemon mode" — bumped libs/mlx-swift-lm to 8518e6d, which TTY-gates the [rsrc] stdout print (no more unbounded provider.log growth). os_log unchanged.

@blacksmith-sh

blacksmith-sh Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Found 1 test failure on Blacksmith runners:

Failure

Test View Logs
github.com/eigeninference/d-inference/e2e/TestProfile_SingleProviderNonStreaming View Logs

Fix with Codesmith
Need help on this PR? Tag /codesmith with what you need.

Bump libs/mlx-swift-lm to 288bcba (rebased onto main; drops the duplicate #41
commits and adds the post-update window-bound fix for the decode mask fast path).
Add a regression test: when windowSize < maxCacheSize, makeMask must keep the
windowed mask at the boundary (.none would let the new token attend one position
past the window), while staying .none below the boundary. The existing Gemma
(window == maxCacheSize) negative-padding test still passes unchanged.

Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

#42 (default-on resource telemetry + Gemma 4 decode-mask fixes) is merged to
mlx-swift-lm main. Move the submodule pointer from the feature-branch commit to
the squashed main commit e7af9df (identical content). The parent-side
BatchKVCache regression tests added on this branch stay.

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

@Gajesh2007 Gajesh2007 merged commit c20a8eb into master Jun 16, 2026
9 of 14 checks passed

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

https://github.com/Layr-Labs/d-inference/blob/4cc4973c940c0f79a9626b24d960019f19cdc0b1/libs/mlx-swift-lm/Libraries/MLXLMCommon/BatchKVCache.swift#L408-L410
P2 Badge Preserve window masks in BatchKVCache fast path

When windowSize is non-nil and smaller than the accumulated BatchKVCache length, this new single-token fast path drops the mask entirely for unpadded rows. A caller that asks for a sliding window after the cache already holds more than that window will now let the query attend to every retained key instead of only the active window; the rotating cache added a post-update window guard, but this regular batch cache needs the same check or should keep returning an array mask when windowSize is active.


https://github.com/Layr-Labs/d-inference/blob/4cc4973c940c0f79a9626b24d960019f19cdc0b1/libs/mlx-swift-lm/Libraries/MLXLMCommon/ContinuousBatching/EngineCore.swift#L404-L405
P2 Badge Throttle high-pressure os_log samples

When a provider stays above 70% resource pressure without immediately crashing, this branch logs [rsrc] to unified logging every 10 busy scheduler steps. Those are exactly the logs collected by darkbloom report, whose manual path rejects reports over 10 MB and whose auto-report path truncates to 10 MB, so a sustained high-pressure run can make the diagnostic report fail or lose the newest crash ramp; keep the dense sampling bounded to a short transition window or sample it at the report cadence after the first few high-pressure points.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Gajesh2007 added a commit that referenced this pull request Jun 16, 2026
Memory & reliability hardening:
- 90% unified-memory cap + KV purge on unload + serve-while-load reservation (#363)
- bounded checkpoint-capture pipeline, stops the Metal 499000 resource leak (#374)
- resumable 4-bit model downloads + restart-safe detection (#372)
- default-on [rsrc] resource telemetry in reports (#373 / mlx-swift-lm#42)
- don't log raw request-parse errors — prompt-fragment privacy (#376)

Coordinator LatestProviderVersion bump is intentionally deferred until the signed
0.6.12 bundle is published (prod uses the in-memory store, so the hardcoded
fallback is authoritative after a redeploy).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants