Commit 1395456

docs: expand upstream sync flow to include verification, integration, and validation steps
1 parent 0fdfea7 commit 1395456

1 file changed

Lines changed: 44 additions & 10 deletions

.agents/workflows/mlx-upstream-sync.md

@@ -21,29 +21,63 @@ mlx-server (SharpAI/SwiftLM)

**Never bundle C++ source files directly into `mlx-swift`.** All updates to Apple's core engine and all C-wrapper modifications MUST be made in the `SharpAI/mlx` and `SharpAI/mlx-c` forks, respectively.
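
One way to enforce this rule mechanically is a small guard run inside a `SharpAI/mlx-swift` checkout. This is a minimal Bash sketch, assuming Git is available. The function names, the list of source extensions, and the optional allowlist regex are all hypothetical. `git ls-files` lists only paths that the repository tracks itself and does not list submodule contents unless `--recurse-submodules` is passed. Anything it reports was therefore committed into `mlx-swift` directly rather than arriving through the forks.

```bash
# Hypothetical guard for the "no C++ sources in mlx-swift" rule.
# Run from the root of a SharpAI/mlx-swift checkout.

# Print native sources tracked directly by this repository (not by its
# submodules). $1 is an optional extended regex of paths to allow.
find_bundled_native_sources() {
  local allow_regex="${1:-}" files
  # git ls-files does not descend into submodules unless
  # --recurse-submodules is given, so submodule contents are not listed.
  files="$(git ls-files -- '*.cc' '*.cpp' '*.cxx' '*.mm')" || return 2
  if [[ -n "$allow_regex" ]]; then
    files="$(printf '%s\n' "$files" | grep -Ev -- "$allow_regex")"
  fi
  if [[ -n "$files" ]]; then
    printf '%s\n' "$files"
  fi
}

# Return 1 (listing offenders on stderr) if any native source is bundled.
check_no_bundled_native_sources() {
  local offenders
  offenders="$(find_bundled_native_sources "${1:-}")" || return 2
  if [[ -n "$offenders" ]]; then
    printf 'C++/Objective-C++ sources tracked directly in mlx-swift:\n%s\n' \
      "$offenders" >&2
    return 1
  fi
  echo "OK: no native sources outside the submodules"
}
```

If your fork intentionally keeps generated sources in-tree, pass an anchored regex for their directory as the first argument rather than weakening the check.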

## 2. Upstream Feature Verification & Integration Flow

When Apple releases new features to `ml-explore/mlx` or `ml-explore/mlx-c`, verify, integrate, and validate them, in that order, before bringing them into the SharpAI ecosystem.

### 2.1 Double-Checking Upstream Features

Before syncing, check whether Apple's upstream now covers all of your custom requirements. The answer determines whether the custom SharpAI patches can be safely dropped:

1. **Review upstream changelogs and releases:** Monitor the [Apple MLX Releases page](https://github.com/ml-explore/mlx/releases) and the `main` commit history for mentions of "quantization", "streaming", "memory-mapped operations", or "out-of-core inference".
2. **Examine the target C++ kernels:**
   - Look primarily in `mlx/backend/metal/` and `mlx/core/`.
   - Has Apple added a native equivalent of `moe_stream_op.cpp`?
   - Do the Metal shaders in `mlx/backend/metal/kernels/` now provide block-execution or memory-mapped loading primitives similar to our `ssd_streamer.mm` and `fence.air` logic?
3. **Check the exported C APIs:** Inspect `mlx/c/ops.h` and `mlx/c/fast.h` in `ml-explore/mlx-c`. If Apple has added official C bindings for out-of-core tensor operations, you can safely begin removing the custom SharpAI C++ bridging code.
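
Parts of this checklist can be scripted as a first-pass filter. Keyword hits only flag candidates for manual review and do not prove that a feature exists. This is a minimal Bash sketch, assuming a SharpAI/mlx clone with the `upstream` remote configured as in the integration steps, plus a local mlx-c checkout. The helper names, the keyword pattern, and the `../mlx-c` path are hypothetical.

```bash
# Hypothetical triage helpers for the upstream review checklist.
# Keywords come from the checklist. The pattern uses "streaming" rather than
# "stream" because nearly every MLX op takes a stream argument and would match.
FEATURE_PATTERN='quantiz|streaming|memory-mapped|mmap|out-of-core'

# List commits in RANGE (e.g. HEAD..upstream/main) whose messages mention a
# keyword, case-insensitively. Extra arguments restrict the search to paths.
upstream_feature_commits() {
  local range="$1"
  shift
  git log --oneline -i -E --grep="$FEATURE_PATTERN" "$range" -- "$@"
}

# Print line-numbered, case-insensitive matches of PATTERN in header files.
header_mentions() {
  local pattern="$1"
  shift
  grep -n -i -E -- "$pattern" "$@"
}

# Example usage (paths are assumptions about your local layout):
#   git fetch upstream
#   upstream_feature_commits HEAD..upstream/main mlx/backend/metal
#   header_mentions 'mmap|memory-mapped|out-of-core' \
#     ../mlx-c/mlx/c/ops.h ../mlx-c/mlx/c/fast.h
```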
### 2.2 Integration Flow

If Apple's features are worth adopting (for example, core Metal optimizations) but do not replace our SSD streaming, pull them in *while preserving* the SharpAI SSD kernels.

1. **Pull upstream into the SharpAI forks:**
   ```bash
   git clone https://github.com/SharpAI/mlx && cd mlx
   git remote add upstream https://github.com/ml-explore/mlx
   git fetch upstream

   # Replay our custom SSD commits on top of Apple's latest main
   git rebase upstream/main
   # On conflicts (typically around `fast.cpp` or the Make/CMake build files):
   # fix the files, `git add` them, then run `git rebase --continue`

   # The rebase rewrites history, so a force push is required;
   # --force-with-lease refuses to overwrite remote commits you have not fetched
   git push --force-with-lease origin main
   ```
2. Repeat the same rebase for `SharpAI/mlx-c`, paying particular attention to conflicts in `mlx/c/ops.cpp`.
3. In `SharpAI/mlx-swift`, update the submodule pointers so they reference the freshly rebased commits:
   ```bash
   cd LocalPackages/mlx-swift
   git submodule update --remote --recursive
   git commit -am "chore: sync latest Apple MLX components and re-graft SSD patches"
   git push origin main
   ```
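
The integration steps can be spot-checked before pushing. Confirm that the rebase landed on the upstream tip and that the SSD patch files survived. After the `mlx-swift` commit, confirm that every submodule matches its recorded pointer. This is a minimal Bash sketch. The function names are hypothetical, and it assumes the custom files are tracked under the names used in this document. Only `git merge-base --is-ancestor`, `git log`, `git ls-files`, and `git submodule status` are standard Git.

```bash
# Hypothetical post-rebase checks.

# In a rebased fork: BASE (e.g. upstream/main) must be an ancestor of HEAD,
# and every custom file named in the remaining arguments must still be tracked.
verify_rebased_patch_stack() {
  local base="$1"
  shift
  if ! git merge-base --is-ancestor "$base" HEAD; then
    echo "HEAD does not contain $base; the rebase did not land on it" >&2
    return 1
  fi
  echo "Custom commits on top of $base:"
  git log --oneline "$base..HEAD"
  local f missing=0
  for f in "$@"; do
    # Match the file at the repository root or in any subdirectory.
    if [[ -z "$(git ls-files -- "$f" "*/$f")" ]]; then
      echo "Missing after rebase: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# In mlx-swift after committing the bump: fail if `git submodule status`
# reports any submodule as uninitialised (-), checked out at a commit other
# than the recorded one (+), or in a merge conflict (U).
submodules_match_recorded_pointers() {
  local report
  report="$(git submodule status --recursive)" || return 2
  if printf '%s\n' "$report" | grep -qE '^[-+U]'; then
    printf '%s\n' "$report" >&2
    return 1
  fi
}

# Example usage:
#   verify_rebased_patch_stack upstream/main ssd_streamer.mm moe_stream_op.cpp
#   submodules_match_recorded_pointers
```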
### 2.3 Validation Flow

Do not deploy binary updates to the inference engine until every step of the following validation matrix passes.

1. **Clean rebuild:** Always delete the build cache before testing Metal compilation, so that stale artifacts cannot mask a broken build.
   ```bash
   # From the mlx-server repository root
   rm -rf .build
   ./build.sh
   ```
2. **Swift API layer verification:** Run the wrapper's test suite to confirm that the Swift → C → C++ binding chain still works end to end.
   ```bash
   cd LocalPackages/mlx-swift
   swift test
   ```
3. **Extreme-context benchmarking (the harness):**
   - Run the dedicated `/run-benchmark` workflow from the `mlx-server` root directory (using `run_benchmark.sh` or `profile_runner.py`).
   - Target runs with contexts larger than 32k tokens. Abnormally high prompt-processing or generation latency, GPU thrashing, or hard out-of-memory (OOM) faults typically indicate that the Metal barrier (`fence.air`) or `ssd_streamer.mm` broke silently during the rebase.
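
The three validation steps can be chained so that the first failure stops the run. This is a minimal Bash sketch, meant to be run from the `mlx-server` root. The function names are hypothetical. The benchmark command is passed in as arguments because this document does not specify how `run_benchmark.sh` or `profile_runner.py` are invoked.

```bash
# Hypothetical runner for the validation matrix, from the mlx-server root.

# Run one stage, print a PASS/FAIL line, and propagate its exit status.
run_stage() {
  local label="$1"
  shift
  echo "==> $label"
  if "$@"; then
    echo "PASS: $label"
  else
    local rc=$?
    echo "FAIL: $label (exit $rc)" >&2
    return "$rc"
  fi
}

# Step 1: wipe the .build directory, then rebuild (Metal included).
clean_rebuild() {
  rm -rf .build && ./build.sh
}

# Step 2: run the wrapper's tests in a subshell so the caller's cwd is unchanged.
swift_api_tests() {
  (cd LocalPackages/mlx-swift && swift test)
}

# Steps 1-3 in order; stops at the first failure. Any arguments are run as
# the benchmark command (its exact invocation is not specified here).
validate_matrix() {
  run_stage "Clean rebuild" clean_rebuild || return
  run_stage "Swift API tests" swift_api_tests || return
  if (($# > 0)); then
    run_stage "Long-context benchmark" "$@" || return
  fi
  echo "Validation matrix passed"
}
```

For example, `validate_matrix ./run_benchmark.sh` followed by whatever arguments the harness expects. Because each stage's exit status propagates, the function can also gate a CI job.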

## 3. Triaging SSD-Stream Bugs
 (0)