Merged
10 changes: 5 additions & 5 deletions README.md
@@ -162,9 +162,9 @@ The installer detects your GPU and picks the optimal model automatically. No man
| VRAM | Model | Example GPUs |
|------|-------|--------------|
| < 8 GB | Qwen3.5 2B (Q4_K_M) | Any GPU or CPU-only |
-| 8–11 GB | Qwen3 8B (Q4_K_M) | RTX 4060 Ti, RTX 3060 12GB |
-| 12–20 GB | Qwen3 8B (Q4_K_M) | RTX 3090, RTX 4080 |
-| 20–40 GB | Qwen3 14B (Q4_K_M) | RTX 4090, A6000 |
+| 8–11 GB | Qwen3.5 9B (Q4_K_M) | RTX 4060 Ti, RTX 3060 12GB |
+| 12–20 GB | Qwen3.5 9B (Q4_K_M) | RTX 3090, RTX 4080 |
+| 20–40 GB | Qwen3.5 27B (Q4_K_M) | RTX 4090, A6000 |
| 40+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | A100, multi-GPU |
| 90+ GB | Qwen3 Coder Next (80B MoE, Q4_K_M) | Multi-GPU A100/H100 |

@@ -180,8 +180,8 @@ The installer detects your GPU and picks the optimal model automatically. No man
| Unified RAM | Model | Example Hardware |
|-------------|-------|-----------------|
| < 16 GB | Qwen3.5 2B (Q4_K_M) | M1/M2 base (8GB) |
-| 16–24 GB | Qwen3 4B (Q4_K_M) | M4 Mac Mini (16GB) |
-| 32 GB | Qwen3 8B (Q4_K_M) | M4 Pro Mac Mini, M3 Max MacBook Pro |
+| 16–24 GB | Qwen3.5 4B (Q4_K_M) | M4 Mac Mini (16GB) |
+| 32 GB | Qwen3.5 9B (Q4_K_M) | M4 Pro Mac Mini, M3 Max MacBook Pro |
| 48 GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M4 Pro (48GB), M2 Max (48GB) |
| 64+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M2 Ultra Mac Studio, M4 Max (64GB+) |

8 changes: 4 additions & 4 deletions dream-server/FAQ.md
@@ -98,7 +98,7 @@ Use the `dream` CLI:
```bash
dream model current # See what's running
dream model list # Show available tiers and models
-dream model swap T3 # Switch to Tier 3 (e.g., Qwen3 14B)
+dream model swap T3 # Switch to Tier 3 (e.g., Qwen3.5 27B)
```

The model file must already be downloaded. If it isn't, pre-fetch it first:
@@ -123,9 +123,9 @@ The installer auto-selects based on your GPU, but you can switch between any tie

| Tier | Model | Min VRAM |
|------|-------|----------|
-| T1 | Qwen3 8B | 8 GB |
-| T2 | Qwen3 8B | 12 GB |
-| T3 | Qwen3 14B | 20 GB |
+| T1 | Qwen3.5 9B | 8 GB |
+| T2 | Qwen3.5 9B | 12 GB |
+| T3 | Qwen3.5 27B | 20 GB |
| T4 | Qwen3 30B-A3B (MoE) | 40 GB |
| SH_COMPACT | Qwen3 30B-A3B (MoE) | 64 GB unified |
| SH_LARGE | Qwen3 Coder Next 80B (MoE) | 90 GB unified |
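
Read as a lookup, the updated tier table amounts to a highest-threshold-first match on available VRAM. The following is a minimal Python sketch of that reading, not the installer's actual logic, which this diff does not show. The names `DISCRETE_GPU_TIERS` and `pick_tier` are hypothetical, and the unified-memory tiers (`SH_COMPACT`, `SH_LARGE`) are left out because they are selected on unified RAM rather than VRAM.

```python
from typing import Optional, Tuple

# (tier, model, minimum VRAM in GB), copied from the updated FAQ table.
# Ordered from the highest threshold down so the first match wins.
# Hypothetical structure for illustration; not taken from the installer.
DISCRETE_GPU_TIERS = [
    ("T4", "Qwen3 30B-A3B (MoE)", 40),
    ("T3", "Qwen3.5 27B", 20),
    ("T2", "Qwen3.5 9B", 12),
    ("T1", "Qwen3.5 9B", 8),
]


def pick_tier(vram_gb: float) -> Optional[Tuple[str, str]]:
    """Return (tier, model) for the largest tier whose minimum VRAM is met.

    Returns None below 8 GB, where this table lists no tier.
    """
    for tier, model, min_vram_gb in DISCRETE_GPU_TIERS:
        if vram_gb >= min_vram_gb:
            return tier, model
    return None


if __name__ == "__main__":
    for vram in (6, 8, 16, 24, 48):
        print(vram, pick_tier(vram))
```

Under this reading a 24 GB card resolves to `T3`, which matches the `dream model swap T3` example above.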
18 changes: 9 additions & 9 deletions dream-server/docs/INTEL-ARC-GUIDE.md
@@ -14,18 +14,18 @@ known limitations, and performance expectations.

| GPU | VRAM | Estimated tok/s | Concurrent users | Model |
|-----|------|----------------|-----------------|-------|
-| Arc A770 | 16 GB | ~35 | 3–5 | Qwen3 8B Q4\_K\_M |
-| Arc B580 | 12 GB | ~30 | 2–4 | Qwen3 8B Q4\_K\_M |
+| Arc A770 | 16 GB | ~35 | 3–5 | Qwen3.5 9B Q4\_K\_M |
+| Arc B580 | 12 GB | ~30 | 2–4 | Qwen3.5 9B Q4\_K\_M |

### Tier: ARC\_LITE (< 12 GB VRAM)

| GPU | VRAM | Estimated tok/s | Concurrent users | Model |
|-----|------|----------------|-----------------|-------|
-| Arc A750 | 8 GB | ~20 | 1–2 | Qwen3 4B Q4\_K\_M |
-| Arc A380 | 6 GB | ~15 | 1 | Qwen3 4B Q4\_K\_M |
-| Arc A310 | 4 GB | ~10 | 1 | Qwen3 4B Q4\_K\_M (tight) |
+| Arc A750 | 8 GB | ~20 | 1–2 | Qwen3.5 4B Q4\_K\_M |
+| Arc A380 | 6 GB | ~15 | 1 | Qwen3.5 4B Q4\_K\_M |
+| Arc A310 | 4 GB | ~10 | 1 | Qwen3.5 4B Q4\_K\_M (tight) |

-> **A310 note:** 4 GB VRAM is borderline for Qwen3 4B Q4\_K\_M (~3.3 GB).
+> **A310 note:** 4 GB VRAM is borderline for Qwen3.5 4B Q4\_K\_M (~3.3 GB).
> The model will load but leaves little headroom for KV cache.
> Consider `--ctx-size 4096` (set `CTX_SIZE=4096` in `.env`) to reduce pressure.

@@ -187,9 +187,9 @@ Performance figures below are measured with Qwen3 models at Q4\_K\_M quantisatio

| GPU | Model | Prompt tok/s | Generate tok/s | Notes |
|-----|-------|------------|----------------|-------|
-| Arc A770 (16 GB) | Qwen3 8B Q4\_K\_M | ~120 | ~35 | Comfortable fit; KV cache well within VRAM |
-| Arc A750 (8 GB) | Qwen3 4B Q4\_K\_M | ~90 | ~20 | Model fits; limit `CTX_SIZE` to ≤ 16384 |
-| Arc A380 (6 GB) | Qwen3 4B Q4\_K\_M | ~70 | ~15 | Tight. Set `CTX_SIZE=8192` for safety |
+| Arc A770 (16 GB) | Qwen3.5 9B Q4\_K\_M | ~120 | ~35 | Comfortable fit; KV cache well within VRAM |
+| Arc A750 (8 GB) | Qwen3.5 4B Q4\_K\_M | ~90 | ~20 | Model fits; limit `CTX_SIZE` to ≤ 16384 |
+| Arc A380 (6 GB) | Qwen3.5 4B Q4\_K\_M | ~70 | ~15 | Tight. Set `CTX_SIZE=8192` for safety |

### Comparison to equivalent NVIDIA tiers

4 changes: 2 additions & 2 deletions dream-server/docs/MACOS-QUICKSTART.md
@@ -89,8 +89,8 @@ The installer auto-selects the best model for your unified memory:

| Unified RAM | Tier | Model | Context |
|-------------|------|-------|---------|
-| 8–24 GB | 1 | Qwen3 4B (Q4_K_M) | 16384 |
-| 32 GB | 2 | Qwen3 8B (Q4_K_M) | 32768 |
+| 8–24 GB | 1 | Qwen3.5 4B (Q4_K_M) | 16384 |
+| 32 GB | 2 | Qwen3.5 9B (Q4_K_M) | 32768 |
| 48 GB | 3 | Qwen3 30B-A3B (MoE, Q4_K_M) | 32768 |
| 64+ GB | 4 | Qwen3 30B-A3B (MoE, Q4_K_M) | 131072 |
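
The macOS table pairs each memory band with both a model and a context size, so a selection can be sketched as a single lookup that returns all three fields. This is an illustrative Python sketch only; `MacTier`, `MAC_TIER_ROWS`, and `select_mac_tier` are hypothetical names, not installer code. It also assumes that memory sizes falling between the listed rows (for example 36 GB) take the next lower row, which the table does not state.

```python
from typing import NamedTuple, Optional


class MacTier(NamedTuple):
    tier: int
    model: str
    context: int  # context window in tokens, from the "Context" column


# (minimum unified RAM in GB, tier row), highest threshold first.
# Values are copied from the updated table; the thresholds between listed
# sizes are an assumption made for this sketch.
MAC_TIER_ROWS = [
    (64, MacTier(4, "Qwen3 30B-A3B (MoE, Q4_K_M)", 131072)),
    (48, MacTier(3, "Qwen3 30B-A3B (MoE, Q4_K_M)", 32768)),
    (32, MacTier(2, "Qwen3.5 9B (Q4_K_M)", 32768)),
    (8, MacTier(1, "Qwen3.5 4B (Q4_K_M)", 16384)),
]


def select_mac_tier(unified_ram_gb: float) -> Optional[MacTier]:
    """Return the first row whose RAM threshold is met, or None below 8 GB."""
    for min_ram_gb, row in MAC_TIER_ROWS:
        if unified_ram_gb >= min_ram_gb:
            return row
    return None


if __name__ == "__main__":
    for ram in (8, 16, 32, 48, 128):
        print(ram, select_mac_tier(ram))
```

Note that tiers 3 and 4 share a model in this table and differ only in context size, which is why the lookup returns the context alongside the model.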

10 changes: 5 additions & 5 deletions dream-server/docs/SUPPORT-MATRIX.md
@@ -38,11 +38,11 @@ Last updated: 2026-03-17
| `SH_LARGE` | AMD Strix Halo 90+ | Qwen3-Coder-Next | ≥ 90 GB (unified) | ROCm |
| `SH_COMPACT` | AMD Strix Halo < 90 GB | Qwen3 30B A3B | < 90 GB (unified) | ROCm |
| `4` | NVIDIA 40 GB+ / multi-GPU | Qwen3 30B A3B | ≥ 40 GB | CUDA |
-| `3` | NVIDIA 20 GB+ | Qwen3 14B | ≥ 20 GB | CUDA |
-| `ARC` | **Intel Arc ≥ 12 GB** (A770, B580) | Qwen3 8B | ≥ 12 GB | **SYCL** |
-| `2` | NVIDIA 12 GB+ | Qwen3 8B | ≥ 12 GB | CUDA |
-| `ARC_LITE` | **Intel Arc < 12 GB** (A750, A380) | Qwen3 4B | 6–11 GB | **SYCL** |
-| `1` | NVIDIA 4 GB+ | Qwen3 8B | ≥ 4 GB | CUDA |
+| `3` | NVIDIA 20 GB+ | Qwen3.5 27B | ≥ 20 GB | CUDA |
+| `ARC` | **Intel Arc ≥ 12 GB** (A770, B580) | Qwen3.5 9B | ≥ 12 GB | **SYCL** |
+| `2` | NVIDIA 12 GB+ | Qwen3.5 9B | ≥ 12 GB | CUDA |
+| `ARC_LITE` | **Intel Arc < 12 GB** (A750, A380) | Qwen3.5 4B | 6–11 GB | **SYCL** |
+| `1` | NVIDIA 4 GB+ | Qwen3.5 9B | ≥ 4 GB | CUDA |
| `0` | CPU / < 4 GB GPU | Qwen3.5 2B | any | CPU |
| `CLOUD` | No local GPU | Claude (API) | — | LiteLLM |
