Commit e978096

Update README for new build workflow and change Qwen2.5 to 3.5 in benchmark menu
1 parent c199866 commit e978096

2 files changed

Lines changed: 17 additions & 23 deletions

README.md

Lines changed: 15 additions & 21 deletions

```diff
@@ -12,7 +12,7 @@ No Python runtime, no Global Interpreter Lock (GIL), no unnecessary memory copie
 
 ## 📊 Performance: Gemma 4-26B on Apple Silicon
 
-Benchmark results for `gemma-4-26b-a4b-it-4bit` (26B MoE, 4-bit) on M5 Pro 64 GB. Full results and methodology at **[profiling_results.md](profiling_results.md)**.
+Benchmark results for `gemma-4-26b-a4b-it-4bit` (26B MoE, 4-bit) on M5 Pro 64 GB.
 
 ### Headline Numbers
 
```

````diff
@@ -32,12 +32,10 @@ Benchmark results for `gemma-4-26b-a4b-it-4bit` (26B MoE, 4-bit) on M5 Pro 64 GB
 
 Run the benchmark on your device:
 ```bash
-python3 scripts/profiling/profile_runner.py \
-  --model gemma-4-26b-a4b-it-4bit \
-  --contexts "512,40000,100000"
+./run_benchmark.sh
 ```
 
-> We welcome PRs with results from other Apple Silicon devices! See [profiling_results.md](profiling_results.md#contributing-your-results) for details.
+> The interactive script lets you pick a model and context sizes. Results are saved to `profiling_results_<hostname>.md` with a rich console visualization.
 
 ---
 
````

````diff
@@ -155,41 +153,39 @@ Then in Xcode:
 ### Fastest: Download Pre-built Binary
 
 Download the latest release tarball from the [Releases page](https://github.com/SharpAI/SwiftLM/releases).
-The archive is **self-contained**`default.metallib` is bundled alongside the binary.
+The archive is **self-contained**`mlx.metallib` is bundled alongside the binary.
 
 ```bash
 tar -xzf SwiftLM-<version>-macos-arm64.tar.gz
-
-# Run from the extracted directory — default.metallib must be co-located with the binary
 ./SwiftLM --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413
 ```
 
-> **⚠️ Metal GPU Error?** If you see `Failed to load the default metallib`, it means `default.metallib` is missing from the directory you are running `SwiftLM` from. Make sure you run the binary **from the extracted folder** and do not move the binary without also moving `default.metallib` alongside it.
+> **⚠️ Metal GPU Error?** If you see `Failed to load the default metallib`, make sure `mlx.metallib` is co-located with the `SwiftLM` binary.
 
 ### Build from Source
 
+The build script handles everything: submodules, cmake, Metal kernel compilation, and the Swift build.
+
 ```bash
-# Must clone recursively — default.metallib ships inside the mlx-swift submodule
 git clone --recursive https://github.com/SharpAI/SwiftLM
 cd SwiftLM
-swift build -c release
+./build.sh
 ```
 
-`default.metallib` is a pre-built artifact inside the `mlx-swift` submodule, version-matched to the Swift binary. Copy it next to the binary before running:
+This will:
+1. Initialize git submodules
+2. Install `cmake` via Homebrew (if not already installed)
+3. Compile `mlx.metallib` from the Metal kernel sources
+4. Build the `SwiftLM` binary in release mode
 
+Then run:
 ```bash
-cp LocalPackages/mlx-swift/Source/Cmlx/mlx/mlx/backend/metal/kernels/default.metallib \
-  .build/release/
-
 .build/release/SwiftLM \
   --model mlx-community/Qwen3.5-122B-A10B-4bit \
   --stream-experts \
   --port 5413
 ```
 
-> **⚠️ Do NOT use Python's `mlx-metal` package as a source for `mlx.metallib`.**
-> While `uv run --with mlx-metal python -c "...shutil.copy(metallib, ...)"` will get the server to start, the pip `mlx-metal` package is a **different version** of MLX than what this binary was compiled against. The version mismatch causes GPU kernel ABI corruption during inference, producing a `freed pointer was not the last allocation` crash. Always use the metallib from `LocalPackages/mlx-swift/` — it is the only version-matched artifact for this build.
-
 *(Add `--stream-experts` when running oversized MoE models like Qwen3.5 122B to bypass macOS virtual memory swapping and stream expert layers directly from NVMe.)*
 
 ---
````

```diff
@@ -241,9 +237,7 @@ curl http://localhost:5413/v1/chat/completions \
 - Xcode Command Line Tools
 - Metal Toolchain (`xcodebuild -downloadComponent MetalToolchain`)
 
-## 📖 Development Journal & The "Aha!" Moment
-
-The stabilization of the Gemma 4 inference engine on Apple Silicon is fully chronicled in our [Development Journal](journal.md).
+## 📖 The "Aha!" Moment
 
 **The "2+2=4" Aha Moment**: During development, we encountered a severe "silent failure" where the model would successfully load and evaluate all 32 layers at high speed, but generate nothing but infinite whitespace. The model logits showed the correct *shape* but the wrong *magnitudes*.
 
```
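
The updated README tells users who hit `Failed to load the default metallib` to keep `mlx.metallib` next to the `SwiftLM` binary. A minimal pre-flight check along those lines might look like this. `check_metallib` is a hypothetical helper, not something this commit adds, and it only confirms that the file exists; it cannot tell whether the metallib is version-matched to the binary.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SwiftLM): verifies that mlx.metallib sits
# in the same directory as the given SwiftLM binary, as the README requires.
# Only checks that the file exists; it does not validate the metallib version.
check_metallib() {
    binary="${1:-./SwiftLM}"
    # Resolve the directory that contains the binary.
    dir="$(cd "$(dirname "$binary")" && pwd)" || return 2
    if [ -f "$dir/mlx.metallib" ]; then
        echo "ok: $dir/mlx.metallib"
        return 0
    fi
    echo "missing: mlx.metallib not found next to $binary" >&2
    return 1
}
```

For example, run `check_metallib ./SwiftLM` inside the extracted release folder. For a source build, `check_metallib .build/release/SwiftLM` applies only if `./build.sh` places `mlx.metallib` next to the release binary, which this diff does not show directly.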

run_benchmark.sh

Lines changed: 2 additions & 2 deletions

```diff
@@ -12,8 +12,8 @@ PS3="Select a model to benchmark (1-7): "
 options=(
     "gemma-4-26b-a4b-it-4bit"
     "gemma-4-2b-a4b-it-4bit"
-    "Qwen2.5-7B-Instruct-4bit"
-    "Qwen2.5-14B-Instruct-4bit"
+    "Qwen3.5-7B-Instruct-4bit"
+    "Qwen3.5-14B-Instruct-4bit"
     "phi-4-mlx-4bit"
     "Custom (Enter your own Hub ID)"
     "Quit"
```
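
The `run_benchmark.sh` hunk shows only the `options` array behind a `PS3` prompt, which suggests a Bash `select` menu. The sketch below is an assumption about how such a menu could turn a choice into a model ID; `pick_model` is hypothetical, and the real script's loop body is not part of this diff.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: resolves a PS3/select menu choice into a model ID.
# The option list mirrors the diff; the loop body is an assumption.
pick_model() {
    local PS3="Select a model to benchmark (1-7): "
    local options=(
        "gemma-4-26b-a4b-it-4bit"
        "gemma-4-2b-a4b-it-4bit"
        "Qwen3.5-7B-Instruct-4bit"
        "Qwen3.5-14B-Instruct-4bit"
        "phi-4-mlx-4bit"
        "Custom (Enter your own Hub ID)"
        "Quit"
    )
    local opt hub_id
    # select prints the menu and PS3 to stderr and reads the choice from stdin.
    select opt in "${options[@]}"; do
        case "$opt" in
            "Custom (Enter your own Hub ID)")
                # -p only shows the prompt when stdin is a terminal.
                read -r -p "Hub ID: " hub_id
                echo "$hub_id"
                return 0
                ;;
            "Quit")
                return 1
                ;;
            "")
                # Out-of-range input leaves opt empty; ask again.
                echo "Invalid choice: $REPLY" >&2
                ;;
            *)
                echo "$opt"
                return 0
                ;;
        esac
    done
    return 1  # stdin closed without a valid choice
}
```

Piping a choice in, as in `printf '3\n' | pick_model 2>/dev/null`, prints `Qwen3.5-7B-Instruct-4bit` on stdout, while the menu and prompt go to stderr.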