.agents/workflows/mlx-upstream-sync.md
Lines changed: 44 additions & 10 deletions
@@ -21,29 +21,63 @@ mlx-server (SharpAI/SwiftLM)
21
21
22
22
**Never bundle C++ source files directly into `mlx-swift`.** All updates to Apple's core engine and all C-wrapper modifications MUST be made in the `SharpAI/mlx` and `SharpAI/mlx-c` forks, respectively.
When Apple releases new features to `ml-explore/mlx` or `ml-explore/mlx-c` that you want to integrate into the inference engine:
26
+
When Apple releases new features to `ml-explore/mlx` or `ml-explore/mlx-c`, follow this systematic process to verify, integrate, and validate the changes before bringing them into the SharpAI ecosystem.
27
+
28
+
### 2.1 Double-Checking Upstream Features
29
+
30
+
Before syncing, verify whether Apple's upstream actually covers all of your custom requirements. The answer determines whether you can safely drop your custom patches:
31
+
32
+
1. **Review Upstream Commit Log/Releases:** Check the [Apple MLX Releases page](https://github.com/ml-explore/mlx/releases) or the `main` commit history for mentions of "quantization", "streaming", "memory-mapped operations", or "out-of-core inference".
33
+
2. **Examine Target C++ Kernels:**
34
+
- Look primarily in `mlx/backend/metal/` and `mlx/core/`.
35
+
- Has upstream added a native equivalent of `moe_stream_op.cpp`?
36
+
- Do the Metal shaders in `mlx/backend/metal/kernels/` now provide block execution or memory-mapped loading primitives similar to our `ssd_streamer.mm` and `fence.air` logic?
37
+
3. **Check Exported C APIs:** Look at `mlx/c/ops.h` and `mlx/c/fast.h` in `ml-explore/mlx-c`. If Apple has added official C bindings for out-of-core tensor operations, you can safely begin stripping out the custom SharpAI C++ bridging code.
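The checks in steps 2 and 3 can be partly automated with a keyword scan over a local upstream checkout. The following is a minimal sketch, not part of the existing workflow: the function name `scan_upstream` and the default keyword pattern are assumptions chosen for illustration. A hit only tells you where to start reading; it does not prove that upstream has an equivalent feature.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory that mention any
# streaming-related keyword. The default keyword list is an assumption;
# adjust it to match the features your custom patches provide.
scan_upstream() {
    local dir="$1"
    local pattern="${2:-mmap|memory-mapped|out-of-core|stream}"
    # -r recurse, -l print file names only, -i ignore case, -E extended regex
    grep -rliE -- "$pattern" "$dir" | sort
}

# Example (the path assumes a local clone of ml-explore/mlx):
#   scan_upstream ~/src/mlx/mlx/backend/metal
```

Running the same scan before and after fetching upstream, and comparing the two file lists, gives a quick view of which files are worth reading in detail.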
38
+
39
+
### 2.2 Integration Flow
40
+
41
+
If Apple's features are highly beneficial (e.g., core Metal optimizations) but do not explicitly replace our SSD streaming, we need to pull them in *while maintaining* the SharpAI SSD kernels.
# Rebase or merge Apple's latest main into SharpAI's main
49
+
# Rebase Apple's latest main directly under our custom SSD commits
36
50
git rebase upstream/main
37
-
git push origin main
51
+
# Resolve any conflicts, particularly around `fast.cpp` or the Make/CMake build files
52
+
git push --force-with-lease origin main
38
53
```
39
-
2. Repeat the exact same process for `SharpAI/mlx-c`.
40
-
3. In `SharpAI/mlx-swift`, update the submodule pointers:
54
+
2. Run the same rebase process for `SharpAI/mlx-c`, watching for conflicts in `mlx/c/ops.cpp`.
55
+
3. In `SharpAI/mlx-swift`, update the submodule pointers so that they reference your freshly rebased commits:
41
56
```bash
57
+
cd LocalPackages/mlx-swift
42
58
git submodule update --remote --recursive
43
-
git commit -am "chore: sync latest Apple MLX components"
59
+
git commit -am "chore: sync latest Apple MLX components and re-graft SSD patches"
44
60
git push origin main
45
61
```
46
-
4. Finally, your local `mlx-server` will automatically pull the updated `mlx-swift` package upon running `./build.sh` (or `swift package resolve`).
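Before force-pushing a rebased fork, it can help to confirm that the rebase actually left the custom commits on top of upstream. A minimal sketch, assuming the `upstream` remote has already been fetched; the function name `check_rebased` is hypothetical. It relies only on `git merge-base --is-ancestor` and `git rev-list --count`, both standard Git commands.

```bash
# Hypothetical helper: succeed only if <base> is an ancestor of HEAD, then
# report how many commits (the custom SSD patches) sit on top of it.
check_rebased() {
    local base="${1:-upstream/main}"
    if ! git merge-base --is-ancestor "$base" HEAD; then
        echo "HEAD is not based on $base; rebase incomplete?" >&2
        return 1
    fi
    echo "custom commits on top of $base: $(git rev-list --count "$base"..HEAD)"
}
```

If the reported count differs from the number of custom commits you expected to carry, a patch was probably dropped or squashed during conflict resolution.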
62
+
63
+
### 2.3 Validation Flow
64
+
65
+
Do not deploy binary updates to the inference engine without first running the full validation matrix.
66
+
67
+
1. **Clean Rebuild:** Always wipe the build cache before testing Metal compilation.
68
+
```bash
69
+
# From the mlx-server root directory
70
+
rm -rf .build
71
+
./build.sh
72
+
```
73
+
2. **Swift API Layer Verification:** Run the test suites in your wrapper to confirm that the Swift `->` C `->` C++ bindings still line up.
74
+
```bash
75
+
cd LocalPackages/mlx-swift
76
+
swift test
77
+
```
78
+
3. **Extreme Context Benchmarking (The Harness):**
79
+
- Run the dedicated `/run-benchmark` workflow from the root `mlx-server` directory (using `run_benchmark.sh` or `profile_runner.py`).
80
+
- Target models that use contexts longer than 32k tokens. High prompt generation latency, GPU thrashing, or hard Out-of-Memory (OOM) faults strongly suggest that the Metal barrier (`fence.air`) or `ssd_streamer.mm` broke silently during the git rebase.
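The three validation steps can be chained so that a failure stops the run early. A minimal sketch: `run_steps` is a hypothetical helper, and the commented usage reuses the commands listed above. The exact invocation of `run_benchmark.sh` (its location and arguments) is an assumption.

```bash
# Hypothetical helper: run each command string in its own shell, in order,
# and stop at the first one that fails, reporting which step broke.
run_steps() {
    local step
    for step in "$@"; do
        echo "==> $step"
        if ! bash -c "$step"; then
            echo "FAILED: $step" >&2
            return 1
        fi
    done
    echo "all validation steps passed"
}

# Usage, from the mlx-server root (commands taken from the steps above):
#   run_steps "rm -rf .build && ./build.sh" \
#             "cd LocalPackages/mlx-swift && swift test" \
#             "./run_benchmark.sh"
```

Each step runs in a separate `bash -c`, so a `cd` inside one step does not change the working directory of the next.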