**README.md**: 15 additions & 21 deletions
````diff
@@ -12,7 +12,7 @@ No Python runtime, no Global Interpreter Lock (GIL), no unnecessary memory copie
 
 ## 📊 Performance: Gemma 4-26B on Apple Silicon
 
-Benchmark results for `gemma-4-26b-a4b-it-4bit` (26B MoE, 4-bit) on M5 Pro 64 GB. Full results and methodology at **[profiling_results.md](profiling_results.md)**.
+Benchmark results for `gemma-4-26b-a4b-it-4bit` (26B MoE, 4-bit) on M5 Pro 64 GB.
 
 ### Headline Numbers
 
````
````diff
@@ -32,12 +32,10 @@ Benchmark results for `gemma-4-26b-a4b-it-4bit` (26B MoE, 4-bit) on M5 Pro 64 GB
 
 Run the benchmark on your device:
 ```bash
-python3 scripts/profiling/profile_runner.py \
-  --model gemma-4-26b-a4b-it-4bit \
-  --contexts "512,40000,100000"
+./run_benchmark.sh
 ```
 
-> We welcome PRs with results from other Apple Silicon devices! See [profiling_results.md](profiling_results.md#contributing-your-results) for details.
+> The interactive script lets you pick a model and context sizes. Results are saved to `profiling_results_<hostname>.md` with a rich console visualization.
 
 ---
 
````
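The new benchmark flow writes its report to `profiling_results_<hostname>.md` rather than taking CLI flags. Below is a minimal sketch for locating that report after a run. It assumes `<hostname>` matches the output of the `hostname` command, which the diff does not confirm. `find_report` is a hypothetical helper, not part of SwiftLM.

```bash
# Hypothetical helper, not part of SwiftLM: print the path of a host's
# benchmark report, or fail if it has not been generated yet.
# Assumes the report is named profiling_results_<hostname>.md and that
# <hostname> matches the output of `hostname` (unconfirmed assumption).
find_report() {
  local dir="${1:-.}"              # directory where ./run_benchmark.sh was run
  local host="${2:-$(hostname)}"   # pass a name explicitly if it differs
  local report="$dir/profiling_results_${host}.md"
  if [ -f "$report" ]; then
    printf '%s\n' "$report"
  else
    echo "no report at $report; run ./run_benchmark.sh first" >&2
    return 1
  fi
}

# Usage after the interactive run finishes:
# report="$(find_report)" && less "$report"
```

If the script formats the hostname differently, pass the name as the second argument instead of relying on the default.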
````diff
@@ -155,41 +153,39 @@ Then in Xcode:
 ### Fastest: Download Pre-built Binary
 
 Download the latest release tarball from the [Releases page](https://github.com/SharpAI/SwiftLM/releases).
-The archive is **self-contained** — `default.metallib` is bundled alongside the binary.
+The archive is **self-contained** — `mlx.metallib` is bundled alongside the binary.
 
 ```bash
 tar -xzf SwiftLM-<version>-macos-arm64.tar.gz
-
-# Run from the extracted directory — default.metallib must be co-located with the binary
-> **⚠️ Metal GPU Error?** If you see `Failed to load the default metallib`, it means `default.metallib` is missing from the directory you are running `SwiftLM` from. Make sure you run the binary **from the extracted folder** and do not move the binary without also moving `default.metallib` alongside it.
+> **⚠️ Metal GPU Error?** If you see `Failed to load the default metallib`, make sure `mlx.metallib` is co-located with the `SwiftLM` binary.
 
````
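The binary needs `mlx.metallib` in its own directory. A quick pre-flight check can catch the `Failed to load the default metallib` error before launch. This is a minimal sketch, and `check_metallib` is a hypothetical name. It only tests that the file is present, not that its version matches the binary, and it does not follow a symlink to the binary's real location.

```bash
# Hypothetical pre-flight check, not shipped with SwiftLM: succeeds only if
# mlx.metallib sits in the same directory as the given binary path.
# Checks presence only; it cannot detect a version-mismatched metallib.
check_metallib() {
  local bin="$1"
  local dir
  # Resolve the binary's directory to an absolute path (symlinks not followed).
  dir="$(cd "$(dirname "$bin")" >/dev/null && pwd)" || return 1
  if [ -f "$dir/mlx.metallib" ]; then
    echo "ok: $dir/mlx.metallib"
  else
    echo "missing: $dir/mlx.metallib (keep it next to $(basename "$bin"))" >&2
    return 1
  fi
}

# Usage from the extracted folder:
# check_metallib ./SwiftLM
```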
````diff
 ### Build from Source
 
+The build script handles everything: submodules, cmake, Metal kernel compilation, and the Swift build.
+
 ```bash
-# Must clone recursively — default.metallib ships inside the mlx-swift submodule
-`default.metallib` is a pre-built artifact inside the `mlx-swift` submodule, version-matched to the Swift binary. Copy it next to the binary before running:
+This will:
+1. Initialize git submodules
+2. Install `cmake` via Homebrew (if not already installed)
+3. Compile `mlx.metallib` from the Metal kernel sources
-> **⚠️ Do NOT use Python's `mlx-metal` package as a source for `mlx.metallib`.**
-> While `uv run --with mlx-metal python -c "...shutil.copy(metallib, ...)"` will get the server to start, the pip `mlx-metal` package is a **different version** of MLX than what this binary was compiled against. The version mismatch causes GPU kernel ABI corruption during inference, producing a `freed pointer was not the last allocation` crash. Always use the metallib from `LocalPackages/mlx-swift/` — it is the only version-matched artifact for this build.
-
 *(Add `--stream-experts` when running oversized MoE models like Qwen3.5 122B to bypass macOS virtual memory swapping and stream expert layers directly from NVMe.)*
````
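The first two steps the build script performs map onto standard commands. This rough sketch shows them done by hand, assuming Homebrew's `brew` is on `PATH`. It also assumes that `git submodule update --init --recursive` is equivalent to what the script does, which the diff does not confirm. Step 3, compiling `mlx.metallib`, is left to the build script because its exact commands are not shown here. `missing_tool` and `prepare_build` are hypothetical names.

```bash
# Hypothetical manual equivalent of build steps 1-2; not the project's script.

# Succeeds when the named command is NOT found on PATH.
missing_tool() {
  ! command -v "$1" >/dev/null 2>&1
}

prepare_build() {
  # Step 1: initialize all submodules recursively (e.g. mlx-swift).
  git submodule update --init --recursive || return 1
  # Step 2: install cmake via Homebrew only if it is not already available.
  if missing_tool cmake; then
    brew install cmake || return 1
  fi
  # Step 3 (compiling mlx.metallib from the Metal kernel sources) is
  # intentionally left to the build script.
}
```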
````diff
 - Metal Toolchain (`xcodebuild -downloadComponent MetalToolchain`)
 
-## 📖 Development Journal & The "Aha!" Moment
-
-The stabilization of the Gemma 4 inference engine on Apple Silicon is fully chronicled in our [Development Journal](journal.md).
+## 📖 The "Aha!" Moment
 
 **The "2+2=4" Aha Moment**: During development, we encountered a severe "silent failure" where the model would successfully load and evaluate all 32 layers at high speed, but generate nothing but infinite whitespace. The model logits showed the correct *shape* but the wrong *magnitudes*.
````