The SolidWorks MCP UI can route all LLM calls to a local Ollama instance instead of GitHub Models or OpenAI. This works offline, keeps design data private, and uses Ollama's OpenAI-compatible endpoint at http://127.0.0.1:11434/v1.
The official Ollama Gemma 4 library page is here: https://ollama.com/library/gemma4
The current Ollama Gemma 4 tags relevant to this project are:
| Tier | Ollama tag | Context | Best for |
|---|---|---|---|
| Small | gemma4:e2b |
128K | CPU-friendly edge and smoke tests |
| Balanced | gemma4:e4b |
128K | Recommended default for local planning |
| Large | gemma4:26b |
256K | Workstation-class local evaluation |
| XL | gemma4:31b |
256K | Highest-cost local evaluation |
Notes from the Ollama page:
gemma4:e2bandgemma4:e4bare the edge variants.gemma4:26bandgemma4:31bare the workstation variants.- Gemma 4 exposes native
systemrole support and a much larger context window than earlier local defaults. - The repo currently auto-detects
small,balanced, andlarge;31bremains a manual override because it is above the current large-tier threshold.
!!! tip "Auto-selection"
The /api/ui/local-model/probe endpoint detects GPU VRAM and system RAM, then recommends one of the built-in Gemma 4 tiers.
# Download from https://ollama.com and install, then verify:
ollama serve# Let the backend pick for you based on your hardware:
# GET http://127.0.0.1:8766/api/ui/local-model/probe
# Then use the returned `pull_command`, for example:
ollama pull gemma4:e4bOr pull a specific tier directly:
ollama pull gemma4:e2b # small
ollama pull gemma4:e4b # balanced (recommended)
ollama pull gemma4:26b # large
ollama pull gemma4:31b # manual high-end overrideIn Design Spec and Model Settings:
- Click
Provider: Local. - Click
Auto-Detect Local Model. - Review the recommended tier, endpoint, and pull command shown under the model controls.
- If the model is not downloaded yet, click
Pull Recommended Model. - Run
Auto-Detect Local Modelagain to refresh availability, then retry Clarify or Inspect.
This is the intended recovery path for errors like:
model 'llama3.1' not found
Set the model in your environment before starting the UI server:
# Option A — environment variable for the current shell
$env:SOLIDWORKS_UI_MODEL = "local:gemma4:e4b"
.\run-ui.ps1
# Option B — one-line launch
$env:SOLIDWORKS_UI_MODEL = "local:gemma4:e4b"; .\run-ui.ps1Available SOLIDWORKS_UI_MODEL values for local inference:
| Value | Tier |
|---|---|
local:gemma4:e2b |
small |
local:gemma4:e4b |
balanced |
local:gemma4:26b |
large |
local:gemma4:31b |
manual override |
If you run Ollama on a different host or port:
$env:SOLIDWORKS_UI_OLLAMA_ENDPOINT = "http://my-gpu-server:11434"
$env:SOLIDWORKS_UI_LOCAL_ENDPOINT = "http://my-gpu-server:11434/v1"Returns hardware info and the recommended model tier.
{
"available": true,
"endpoint": "http://127.0.0.1:11434",
"tier": "balanced",
"ollama_model": "gemma4:e4b",
"service_model": "local:gemma4:e4b",
"label": "Gemma 4 E4B (balanced — 8 GB VRAM)",
"vram_gb": 10.8,
"ram_gb": 32.0,
"pulled_models": ["gemma4:e4b"],
"tier_already_pulled": true,
"pull_command": "ollama pull gemma4:e4b",
"status_message": "Ready: Gemma 4 E4B (balanced — 8 GB VRAM) is loaded in Ollama."
}Pull a model into Ollama. Body: {"model": "gemma4:e4b"}.
{ "queued": true, "model": "gemma4:e4b" }Ollama is not running
: Start Ollama with ollama serve or ensure the desktop app is running.
tier_already_pulled: false
: Run the pull_command shown in the probe response to download the recommended tag.
Slow generation
: Use the small tier gemma4:e2b for CPU-bound or constrained-memory systems.
VRAM detected as 0
: CUDA drivers may be unavailable or the machine may be using an iGPU. The small tier will still run, but more slowly.