Skip to content

Commit 336c8a8

Browse files
authored
Merge pull request #76 from SharpAI/fix/issue-72-draft-model-ssd-ram
fix(ssd-stream): prevent RAM explosion when --draft-model + --stream-experts combined (#72)
2 parents 743b1a1 + 9b0a31c commit 336c8a8

2 files changed

Lines changed: 277 additions & 2 deletions

File tree

Lines changed: 216 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,216 @@
1+
---
2+
description: Review a GitHub Issue or PR for SharpAI/SwiftLM — fetch, analyze, implement fixes, address review comments, and push back to the correct branch
3+
---
4+
5+
# Review GitHub Issue / PR
6+
7+
This workflow guides end-to-end handling of a GitHub Issue or Pull Request for the
8+
`SharpAI/SwiftLM` repository: from fetching context, through implementing or
9+
reviewing code changes, to pushing a clean commit back to the correct fork branch.
10+
11+
---
12+
13+
## Prerequisites
14+
15+
- `gh` CLI path on macOS: **`/opt/homebrew/bin/gh`**
16+
```bash
17+
export PATH="/opt/homebrew/bin:$PATH"
18+
which gh # → /opt/homebrew/bin/gh
19+
```
20+
- `gh` must be authenticated (`gh auth status`)
21+
- Working directory: `/Users/simba/workspace/mlx-server`
22+
- Remote `fork` may need to be added if pushing to a contributor's fork:
23+
```bash
24+
git remote add fork https://github.com/<contributor>/SwiftLM.git
25+
```
26+
27+
---
28+
29+
## Steps
30+
31+
### 1. Fetch the Issue or PR
32+
33+
Determine whether the user supplied an **Issue number** or a **PR number**, then
34+
pull the full context using `gh`:
35+
36+
```bash
37+
# For a PR
38+
gh pr view <NUMBER> --repo SharpAI/SwiftLM \
39+
--json number,title,body,state,baseRefName,headRefName,headRepository,commits,files
40+
41+
# For an Issue
42+
gh issue view <NUMBER> --repo SharpAI/SwiftLM \
43+
--json number,title,body,state,labels,comments
44+
```
45+
46+
Note the **`headRepository`** field — if it is not `SharpAI/SwiftLM`, the PR comes
47+
from a fork. You must push back to the fork's branch (see Step 6).
48+
49+
---
50+
51+
### 2. Understand the Scope
52+
53+
Read the PR/Issue body and associated comments carefully. Identify:
54+
55+
- **Category** — bug fix, feature, test improvement, CI/CD, documentation.
56+
- **Files touched** — run `gh pr diff <NUMBER> --repo SharpAI/SwiftLM` or read
57+
the `files` field.
58+
- **CI status** — check the latest run:
59+
```bash
60+
gh run list --repo SharpAI/SwiftLM --branch <headRefName> --limit 3
61+
```
62+
- **Review comments** — if Copilot or a human left inline review comments, read
63+
them all before writing a single line of code:
64+
```bash
65+
gh pr view <NUMBER> --repo SharpAI/SwiftLM --comments
66+
```
67+
68+
---
69+
70+
### 3. Check Out the Branch Locally
71+
72+
```bash
73+
# If the PR is from SharpAI directly
74+
git fetch origin
75+
git checkout <headRefName>
76+
77+
# If the PR is from a fork
78+
git remote add fork https://github.com/<forkOwner>/SwiftLM.git # once only
79+
git fetch fork <headRefName>
80+
git checkout -b <headRefName> fork/<headRefName>
81+
```
82+
83+
Verify you are on the correct branch:
84+
```bash
85+
git status
86+
git log --oneline -5
87+
```
88+
89+
---
90+
91+
### 4. Triage Review Comments (for PRs)
92+
93+
For each Copilot or human review comment:
94+
95+
1. **Classify** the severity:
96+
- 🔴 **Must fix** — correctness bugs, resource leaks, race conditions, broken CI.
97+
- 🟡 **Should fix** — test coverage gaps, false-pass logic, missing imports.
98+
- 🟢 **Optional** — style, wording, architecture refactors beyond the PR scope.
99+
100+
2. **Implement** all 🔴 and 🟡 items. For 🟢 items, document them as follow-up
101+
work in a code comment or GitHub comment but do not expand the PR scope.
102+
103+
3. **Key patterns learned from SwiftLM history**:
104+
- Shell scripts use `set -euo pipefail` — every `grep`, `jq`, or pipeline that
105+
may produce no output **must** be guarded with `|| true` or placed inside an
106+
`if` condition to prevent silent script abort.
107+
- Heartbeat / background `Task` objects in Swift **must** be cancelled via
108+
`defer { task?.cancel() }` so all exit paths (including client disconnect)
109+
are covered — not just the happy path.
110+
- CORS-related shell tests must target the dedicated `--cors` server instance,
111+
not the main server started without the flag.
112+
- Concurrent-request tests must use `--parallel N` (N ≥ 2) to actually exercise
113+
parallel code paths.
114+
- When adding new Swift test files that use `Data` / `JSONSerialization`,
115+
always add `import Foundation` — XCTest does not re-export it in all SPM environments.
116+
117+
---
118+
119+
### 5. Verify Locally
120+
121+
Build and run the relevant test suite before pushing:
122+
123+
```bash
124+
# Swift unit tests
125+
swift test --filter SwiftLMTests
126+
127+
# Integration tests (server)
128+
./tests/test-server.sh .build/release/SwiftLM 15413
129+
130+
# OpenCode / SDK compatibility test
131+
./tests/test-opencode.sh .build/release/SwiftLM 15414
132+
```
133+
134+
If CI previously failed with a specific test number, reproduce it locally first:
135+
```bash
136+
gh run view <RUN_ID> --repo SharpAI/SwiftLM --log-failed 2>&1 | grep -E "FAIL|error|Test [0-9]+"
137+
```
138+
139+
---
140+
141+
### 6. Commit and Push to the Correct Remote
142+
143+
> [!IMPORTANT]
144+
> Always push to the **fork's branch** when updating a fork-originated PR.
145+
> Pushing to `origin` (SharpAI) creates a new branch and does NOT update the PR.
146+
147+
```bash
148+
git add <files>
149+
git commit -m "<type>(<scope>): <summary>
150+
151+
<body: what changed and why>"
152+
153+
# PR from a fork → push to fork
154+
git push fork <headRefName>:<headRefName>
155+
156+
# PR from SharpAI directly → push to origin
157+
git push origin <headRefName>
158+
```
159+
160+
Verify the PR was updated:
161+
```bash
162+
gh pr view <NUMBER> --repo SharpAI/SwiftLM --json commits --jq '.commits[].messageHeadline'
163+
```
164+
165+
---
166+
167+
### 7. Monitor CI
168+
169+
After pushing, monitor the triggered workflow:
170+
171+
```bash
172+
# List recent runs on the branch
173+
gh run list --repo SharpAI/SwiftLM --branch <headRefName> --limit 5
174+
175+
# Stream logs for the latest run
176+
gh run view <RUN_ID> --repo SharpAI/SwiftLM --log
177+
178+
# Pull only failed steps
179+
gh run view <RUN_ID> --repo SharpAI/SwiftLM --log-failed 2>&1 | grep -E "FAIL|error|exit code"
180+
```
181+
182+
If tests fail, go back to Step 4. Iterate until CI is green.
183+
184+
---
185+
186+
### 8. Respond to Reviewers (Optional)
187+
188+
If a human or Copilot reviewer left inline comments that you have addressed,
189+
leave a reply comment summarising what was changed and why each item was handled
190+
(or deferred):
191+
192+
```bash
193+
gh pr comment <NUMBER> --repo SharpAI/SwiftLM \
194+
--body "Addressed all 🔴/🟡 review comments in commit <SHA>:
195+
- heartbeat leak: added defer cleanup in both streaming handlers
196+
- import Foundation: added to ServerSSETests.swift
197+
- CORS test: redirected to CORS_PORT server
198+
- parallel test: dedicated --parallel 2 server on PORT+3
199+
- set -e trap: guarded grep/jq pipelines with || true"
200+
```
201+
202+
---
203+
204+
## Quick Reference
205+
206+
| Task | Command |
207+
|------|---------|
208+
| View PR | `gh pr view <N> --repo SharpAI/SwiftLM` |
209+
| View PR diff | `gh pr diff <N> --repo SharpAI/SwiftLM` |
210+
| View PR comments | `gh pr view <N> --repo SharpAI/SwiftLM --comments` |
211+
| View Issue | `gh issue view <N> --repo SharpAI/SwiftLM` |
212+
| List CI runs | `gh run list --repo SharpAI/SwiftLM --branch <branch>` |
213+
| Failed CI logs | `gh run view <ID> --repo SharpAI/SwiftLM --log-failed` |
214+
| Push to fork | `git push fork <branch>:<branch>` |
215+
| Push to SharpAI | `git push origin <branch>` |
216+
| Verify PR commits | `gh pr view <N> --repo SharpAI/SwiftLM --json commits --jq '.commits[].messageHeadline'` |

Sources/SwiftLM/Server.swift

Lines changed: 61 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -301,6 +301,22 @@ struct MLXServer: AsyncParsableCommand {
301301
// Resolve model directory for profiling (checks HuggingFace cache)
302302
let modelDirectory = resolveModelDirectory(modelId: modelId)
303303

304+
// ── Fix #72: Compute draft model footprint ONCE (Copilot review) ──────
305+
// Resolved before the streamExperts block so the exact byte count can be
306+
// reused for the early cap, both strategy branches, and logging without
307+
// repeating the filesystem walk. Use weightFileSizeBytes (exact bytes)
308+
// instead of weightMemoryGB * 1_073_741_824 to avoid the ~7% GiB/GB
309+
// mismatch flagged in Copilot review (weightMemoryGB = bytes / 1e9, not /2^30).
310+
let draftFootprintBytes: Int
311+
if self.streamExperts,
312+
let draftPath = self.draftModel,
313+
let draftDir = resolveModelDirectory(modelId: draftPath),
314+
let draftProfile = ModelProfiler.profile(modelDirectory: draftDir, modelId: draftPath) {
315+
draftFootprintBytes = draftProfile.weightFileSizeBytes
316+
} else {
317+
draftFootprintBytes = 0
318+
}
319+
304320
if self.streamExperts, let modelDir = modelDirectory {
305321
setenv("EXPERIMENTAL_SSD_STREAM", modelDir.path, 1)
306322
// Activate the modern Swift ExpertStreamingConfig so Load.swift can:
@@ -314,6 +330,24 @@ struct MLXServer: AsyncParsableCommand {
314330
// Cap Metal command buffer size to avoid the 5s Apple GPU Watchdog.
315331
setenv("MLX_MAX_OPS_PER_BUFFER", "50", 1)
316332
print("[SwiftLM] Enabled Async SSD Streaming on directory: \(modelDir.lastPathComponent)")
333+
334+
// ── Fix #72: Apply SSD memory cap EARLY (before any model loads) ──
335+
// Both the main model and draft model must load under the budget.
336+
// The sentinel memoryLimit bypasses MLX eval_impl's spin-wait loop.
337+
// Also address Copilot comment: apply the cap even when modelDirectory
338+
// is nil (first-run download) so downloads also respect the budget.
339+
let system = ModelProfiler.systemProfile()
340+
if draftFootprintBytes > 0 {
341+
print("[SwiftLM] 📦 Draft model footprint: \(String(format: "%.2f", Double(draftFootprintBytes) / 1e9))GB reserved from SSD budget")
342+
}
343+
Memory.cacheLimit = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes)
344+
Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200 GB sentinel
345+
} else if self.streamExperts {
346+
// modelDirectory is nil — model not yet downloaded (first-run).
347+
// Still apply the SSD memory cap so the download itself is bounded.
348+
let system = ModelProfiler.systemProfile()
349+
Memory.cacheLimit = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes)
350+
Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200 GB sentinel
317351
}
318352

319353
var partitionPlan: PartitionPlan?
@@ -338,7 +372,8 @@ struct MLXServer: AsyncParsableCommand {
338372
if self.streamExperts {
339373
// SSD Streaming: expert weights are mmap'd from SSD via the OS page cache.
340374
// No swap involved — the page cache evicts stale expert pages cleanly.
341-
let physicalBudget = Int(Double(system.totalRAMBytes) * 0.85) - (4 * 1024 * 1024 * 1024)
375+
// draftFootprintBytes pre-computed once above (Copilot review).
376+
let physicalBudget = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes)
342377
Memory.cacheLimit = physicalBudget
343378
Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200GB sentinel to bypass MLX eval_impl spin loop
344379
print("[SwiftLM] 💾 Memory strategy: SSD STREAMING (page-cache managed, \(physicalBudget / (1024*1024*1024))GB RAM budget, no swap)")
@@ -349,7 +384,8 @@ struct MLXServer: AsyncParsableCommand {
349384
}
350385
case .layerPartitioned:
351386
if self.streamExperts {
352-
let physicalBudget = Int(Double(system.totalRAMBytes) * 0.85) - (4 * 1024 * 1024 * 1024)
387+
// draftFootprintBytes pre-computed once above (Copilot review).
388+
let physicalBudget = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes)
353389
Memory.cacheLimit = physicalBudget
354390
Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200GB sentinel to bypass MLX eval_impl spin loop
355391
print("[SwiftLM] 💾 Memory strategy: SSD STREAMING (page-cache managed, \(physicalBudget / (1024*1024*1024))GB RAM budget, no swap)")
@@ -476,6 +512,11 @@ struct MLXServer: AsyncParsableCommand {
476512
} else {
477513
draftConfig = ModelConfiguration(id: draftModelPath)
478514
}
515+
// Fix #72: mirror lazyLoad so the draft model's weights are mmap'd
516+
// (not eagerly paged into unified RAM) when SSD streaming is active.
517+
if self.streamExperts {
518+
draftConfig.lazyLoad = true
519+
}
479520
let draftDownloader = HubDownloader(hub: HubApi(downloadBase: cacheRoot))
480521
let draftContainer = try await LLMModelFactory.shared.loadContainer(
481522
from: draftDownloader,
@@ -833,6 +874,24 @@ struct ServerConfig: Sendable {
833874
let turboKV: Bool
834875
}
835876

877+
// ── SSD Memory Budget ────────────────────────────────────────────────────────
878+
879+
/// Compute the page-cache budget (bytes) for SSD streaming mode.
880+
///
881+
/// Formula: `totalRAM × 0.85 − osHeadroom − draftWeightBytes`, floored at 2 GB.
882+
///
883+
/// - Parameters:
884+
/// - totalRAMBytes: Physical RAM reported by the OS (e.g. `system.totalRAMBytes`).
885+
/// - draftWeightBytes: Weight size (bytes) of the draft model, or 0 if none.
886+
/// Subtracted so the draft model's resident pages don't push the main model's
887+
/// page cache over the physical limit and trigger swap (Issue #72).
888+
/// - Returns: The recommended `Memory.cacheLimit` value in bytes.
889+
func computeSSDMemoryBudget(totalRAMBytes: UInt64, draftWeightBytes: Int = 0) -> Int {
890+
let osHeadroom = 4 * 1024 * 1024 * 1024 // 4 GB for OS + system processes
891+
let raw = Int(Double(totalRAMBytes) * 0.85) - osHeadroom - draftWeightBytes
892+
return max(raw, 2 * 1024 * 1024 * 1024) // floor at 2 GB
893+
}
894+
836895
// ── Model Directory Resolution ───────────────────────────────────────────────
837896

838897
/// Resolve a model ID to its local directory (if already downloaded).

0 commit comments

Comments
 (0)