
running-locally: refresh for kestrel 0.3.0 (Apple Silicon + Blackwell + Windows + Jetson Thor)#12

Merged
vikhyat merged 5 commits into main from docs/kestrel-0.3.0
May 1, 2026

@vikhyat vikhyat commented May 1, 2026

Summary

Refreshes `docs/running-locally.md` for the kestrel 0.3.0 / Photon release. The previous version covered only NVIDIA Ampere through Hopper GPUs and Jetson Orin; kestrel 0.3.0 adds four new platforms at once: Apple Silicon, Windows, NVIDIA Blackwell, and Jetson Thor.

What changed

  • Lead + Requirements: split into two paths (NVIDIA GPU / Apple Silicon Mac), plus a note on the Python version for macOS (3.12).
  • "Supported Hardware" (renamed from "Supported GPUs"): four subsections — NVIDIA Server / Datacenter (B200 added), NVIDIA Workstation / Desktop (RTX PRO 6000 called out), Apple Silicon (new table), NVIDIA Edge (Jetson Thor added).
  • New "Apple Silicon Setup" section — `pip install moondream` is everything you need on a stock M-series Mac with Python 3.12.
  • "Jetson Setup": split into Thor (JP7, standard PyPI torch + venv `nvidia-cu13` / `nvpl` lib paths) and Orin (JP6, NVIDIA's custom torch wheel + `/usr/local/cuda`). Verified the JP7 instructions on a real Jetson Thor box this session.
  • "Performance" section: replaces the H100-only sentence with B200 / H100 / M5 Max headline numbers from `PERFORMANCE.md`.
  • "Hugging Face Transformers" footnote: updated wording — Apple Silicon is no longer characterized as "non-NVIDIA hardware where Photon doesn't work."

Cross-PR coordination

Test plan

  • Local Docusaurus build (`npm run start`) renders the page without lint or link errors.
  • Visual scan: tables render correctly, code blocks intact.

vikhyat added 5 commits May 1, 2026 06:05
Photon (kestrel) 0.3.0 ships on four new platforms beyond Linux
NVIDIA Ampere–Hopper: Apple Silicon (M-series Macs), Windows AMD64,
NVIDIA Blackwell (B200 + RTX PRO 6000), and Jetson Thor (JP7).

This rewrite:
- Reframes the lead and Requirements section to cover both NVIDIA
  GPU and Apple Silicon paths.
- Renames the supported-hardware section to 'Supported Hardware'
  and splits it into NVIDIA Server / Datacenter, NVIDIA Workstation
  / Desktop, Apple Silicon, and NVIDIA Edge subsections — adds B200
  to Datacenter, RTX PRO 6000 to Workstation, an Apple Silicon
  table, and Thor (JP7) to Edge.
- Adds an 'Apple Silicon Setup' section (one-line install).
- Restructures 'Jetson Setup' into JP7 / Thor (standard PyPI torch +
  venv-resident nvidia-cu13 + nvpl libs) and JP6 / Orin (NVIDIA's
  custom torch wheel + /usr/local/cuda LD_LIBRARY_PATH path).
- Updates the Performance section to call out concrete B200, H100,
  and M5 Max throughput numbers.
- Updates the 'Hugging Face Transformers' footnote so Apple Silicon
  is no longer characterized as 'non-NVIDIA hardware where Photon
  doesn't work' — Photon does work on Apple Silicon now.

Round of polish on the previous commit's running-locally rewrite:

- Drop the standalone 'Apple Silicon Setup' section. It just
  repeated 'pip install moondream' under a new heading and ran a
  `print(md.__version__)` verify that proves nothing about
  whether Photon's Metal backend actually loaded. The supported-
  hardware Apple Silicon table already covers the platform; the
  one extra fact ('no NVIDIA CUDA, no Triton, no extra setup
  beyond pip install') folds in there.
- Merge 'Server / Datacenter' and 'Workstation / Desktop' into
  one 'NVIDIA GPU' table. The single-row Workstation table
  ('RTX PRO 6000 96 GB') read as a content stub.
- Tighten the Thor LD_LIBRARY_PATH bash. The previous
  `find | grep -E '/(nvidia|nvpl)/' | tr | sed` pipeline was
  overkill — just point at the three actual lib dirs torch needs
  (verified on a real Thor box: `nvidia/cu13/lib`, `nvidia/cudnn/lib`,
  `nvpl/lib`).
- Replace the multi-line `python3 -c"..."` Jetson verify block
  with a one-liner that prints torch version + GPU name, plus a
  plain-English note about what to expect ('NVIDIA Thor' / 'Orin')
  and what to fix if it doesn't work.
- Replace the prose Performance bullets with a comparison table
  pulling concrete numbers from PERFORMANCE.md (B200, H100,
  RTX PRO 6000, M5 Max — req/s for both Moondream 2 and 3).
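
The Thor library-path and verify steps described in the bullets above can be sketched in Python. This is a minimal sketch, not the documented install flow: the three directory names come from the commit notes, while the assumption that they sit directly under the venv's site-packages directory, and the helper names `thor_lib_dirs`, `ld_library_path`, and `describe_torch`, are hypothetical.

```python
import os
import sysconfig
from pathlib import PurePosixPath

# Lib dirs named in the commit notes. That they live directly under
# site-packages is an assumption made for this sketch.
THOR_LIB_SUBDIRS = ("nvidia/cu13/lib", "nvidia/cudnn/lib", "nvpl/lib")


def thor_lib_dirs(site_packages: str) -> list:
    """Return the three library directories, rooted at site_packages."""
    root = PurePosixPath(site_packages)
    return [str(root / sub) for sub in THOR_LIB_SUBDIRS]


def ld_library_path(site_packages: str, existing: str = "") -> str:
    """Build an LD_LIBRARY_PATH value with the Thor lib dirs first."""
    parts = thor_lib_dirs(site_packages)
    if existing:
        parts.append(existing)
    return ":".join(parts)


def describe_torch() -> str:
    """Report torch version and GPU name, or say why they are unavailable."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        # A missing shared library typically surfaces here at import time.
        return f"torch failed to import: {exc}"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no CUDA device visible"
    return f"torch {torch.__version__}: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    site_packages = sysconfig.get_paths()["purelib"]
    # Print the value to export in the shell before starting Python.
    print(ld_library_path(site_packages, os.environ.get("LD_LIBRARY_PATH", "")))
    print(describe_torch())
```

The dynamic loader reads `LD_LIBRARY_PATH` when a process starts, so the printed value is meant to be exported in the shell before launching Python rather than set from inside the running interpreter. On a working box the second line should name the GPU ('NVIDIA Thor' or 'Orin', per the note above).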

The kestrel docs are the authoritative source for the Jetson install
flow (kestrel is the package that ships the LD_LIBRARY_PATH / NVIDIA-CUDA
plumbing). The customer-facing version here deliberately covers only the
common path; for extra troubleshooting (cuSPARSELt errors, missing
cuda-cupti / libnvtoolsext1 on minimal images, etc.), point readers at
the canonical kestrel guide.

The Moondream 3 section says only NVIDIA GPUs with 24GB+ are
supported, with 'quantized and Apple Silicon versions coming soon.'
True for the Transformers path, but a reader skimming might come away
thinking MD3 doesn't run on Apple Silicon at all. Photon (kestrel
0.3.0) does support MD3 on M-series Macs with >=24GB unified memory.
Adjust the wording to scope the 'NVIDIA only' claim to the Transformers
route specifically, and point readers at Photon for the Apple Silicon
path.
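
As an illustration of the hardware condition stated above (M-series Mac with >=24GB unified memory), a host-side pre-check might look like the following. This is a rough sketch under stated assumptions: the helper names are hypothetical, the 24GB figure is read here as GiB, and the check inspects only the host, not whether Photon itself loads.

```python
import os
import platform
from typing import Optional

# Threshold quoted in the commit note; interpreting it as GiB is an assumption.
MIN_UNIFIED_MEMORY_GIB = 24


def is_apple_silicon(system: str, machine: str) -> bool:
    """True for a natively running Python on an M-series Mac."""
    return system == "Darwin" and machine == "arm64"


def meets_md3_memory(memory_bytes: int) -> bool:
    """True if total memory reaches the 24 GiB threshold."""
    return memory_bytes >= MIN_UNIFIED_MEMORY_GIB * 1024**3


def total_memory_bytes() -> Optional[int]:
    """Best-effort total physical memory, or None where sysconf is unavailable."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    mem = total_memory_bytes()
    on_mac = is_apple_silicon(platform.system(), platform.machine())
    print("Apple Silicon:", on_mac)
    print("Enough memory for MD3:", mem is not None and meets_md3_memory(mem))
```

A Python interpreter running under Rosetta reports `x86_64` from `platform.machine()`, so this check would return False there even on M-series hardware.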

The Test deployment workflow was pinned at Node 18, while prod
deploy.yml uses Node 24.2.0. `@easyops-cn/docusaurus-search-local`
references the `File` global (added in Node 20), so the test job
has been failing on every PR since the plugin started using it
(visible on PR #6 in April). Bringing test-deploy to the same
node-version as prod fixes the test job and removes the prod-vs-
test-CI version skew.
@vikhyat vikhyat merged commit 7ed7ec0 into main May 1, 2026
1 check passed
@vikhyat vikhyat deleted the docs/kestrel-0.3.0 branch May 1, 2026 15:06