diff --git a/README.md b/README.md
index 489501c9..4a6e9b58 100644
--- a/README.md
+++ b/README.md
@@ -162,9 +162,9 @@ The installer detects your GPU and picks the optimal model automatically. No man
 | VRAM | Model | Example GPUs |
 |------|-------|--------------|
 | < 8 GB | Qwen3.5 2B (Q4_K_M) | Any GPU or CPU-only |
-| 8–11 GB | Qwen3 8B (Q4_K_M) | RTX 4060 Ti, RTX 3060 12GB |
-| 12–20 GB | Qwen3 8B (Q4_K_M) | RTX 3090, RTX 4080 |
-| 20–40 GB | Qwen3 14B (Q4_K_M) | RTX 4090, A6000 |
+| 8–11 GB | Qwen3.5 9B (Q4_K_M) | RTX 4060 Ti, RTX 3060 12GB |
+| 12–20 GB | Qwen3.5 9B (Q4_K_M) | RTX 3090, RTX 4080 |
+| 20–40 GB | Qwen3.5 27B (Q4_K_M) | RTX 4090, A6000 |
 | 40+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | A100, multi-GPU |
 | 90+ GB | Qwen3 Coder Next (80B MoE, Q4_K_M) | Multi-GPU A100/H100 |
 
@@ -180,8 +180,8 @@ The installer detects your GPU and picks the optimal model automatically. No man
 | Unified RAM | Model | Example Hardware |
 |-------------|-------|-----------------|
 | < 16 GB | Qwen3.5 2B (Q4_K_M) | M1/M2 base (8GB) |
-| 16–24 GB | Qwen3 4B (Q4_K_M) | M4 Mac Mini (16GB) |
-| 32 GB | Qwen3 8B (Q4_K_M) | M4 Pro Mac Mini, M3 Max MacBook Pro |
+| 16–24 GB | Qwen3.5 4B (Q4_K_M) | M4 Mac Mini (16GB) |
+| 32 GB | Qwen3.5 9B (Q4_K_M) | M4 Pro Mac Mini, M3 Max MacBook Pro |
 | 48 GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M4 Pro (48GB), M2 Max (48GB) |
 | 64+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M2 Ultra Mac Studio, M4 Max (64GB+) |
 
diff --git a/dream-server/FAQ.md b/dream-server/FAQ.md
index 0aa1464c..507977f7 100644
--- a/dream-server/FAQ.md
+++ b/dream-server/FAQ.md
@@ -98,7 +98,7 @@ Use the `dream` CLI:
 ```bash
 dream model current # See what's running
 dream model list # Show available tiers and models
-dream model swap T3 # Switch to Tier 3 (e.g., Qwen3 14B)
+dream model swap T3 # Switch to Tier 3 (e.g., Qwen3.5 27B)
 ```
 
 The model file must already be downloaded. If it isn't, pre-fetch it first:
@@ -123,9 +123,9 @@ The installer auto-selects based on your GPU, but you can switch between any tie
 
 | Tier | Model | Min VRAM |
 |------|-------|----------|
-| T1 | Qwen3 8B | 8 GB |
-| T2 | Qwen3 8B | 12 GB |
-| T3 | Qwen3 14B | 20 GB |
+| T1 | Qwen3.5 9B | 8 GB |
+| T2 | Qwen3.5 9B | 12 GB |
+| T3 | Qwen3.5 27B | 20 GB |
 | T4 | Qwen3 30B-A3B (MoE) | 40 GB |
 | SH_COMPACT | Qwen3 30B-A3B (MoE) | 64 GB unified |
 | SH_LARGE | Qwen3 Coder Next 80B (MoE) | 90 GB unified |
diff --git a/dream-server/docs/INTEL-ARC-GUIDE.md b/dream-server/docs/INTEL-ARC-GUIDE.md
index 1f1054fa..f253df5d 100644
--- a/dream-server/docs/INTEL-ARC-GUIDE.md
+++ b/dream-server/docs/INTEL-ARC-GUIDE.md
@@ -14,18 +14,18 @@ known limitations, and performance expectations.
 
 | GPU | VRAM | Estimated tok/s | Concurrent users | Model |
 |-----|------|----------------|-----------------|-------|
-| Arc A770 | 16 GB | ~35 | 3–5 | Qwen3 8B Q4\_K\_M |
-| Arc B580 | 12 GB | ~30 | 2–4 | Qwen3 8B Q4\_K\_M |
+| Arc A770 | 16 GB | ~35 | 3–5 | Qwen3.5 9B Q4\_K\_M |
+| Arc B580 | 12 GB | ~30 | 2–4 | Qwen3.5 9B Q4\_K\_M |
 
 ### Tier: ARC\_LITE (< 12 GB VRAM)
 
 | GPU | VRAM | Estimated tok/s | Concurrent users | Model |
 |-----|------|----------------|-----------------|-------|
-| Arc A750 | 8 GB | ~20 | 1–2 | Qwen3 4B Q4\_K\_M |
-| Arc A380 | 6 GB | ~15 | 1 | Qwen3 4B Q4\_K\_M |
-| Arc A310 | 4 GB | ~10 | 1 | Qwen3 4B Q4\_K\_M (tight) |
+| Arc A750 | 8 GB | ~20 | 1–2 | Qwen3.5 4B Q4\_K\_M |
+| Arc A380 | 6 GB | ~15 | 1 | Qwen3.5 4B Q4\_K\_M |
+| Arc A310 | 4 GB | ~10 | 1 | Qwen3.5 4B Q4\_K\_M (tight) |
 
-> **A310 note:** 4 GB VRAM is borderline for Qwen3 4B Q4\_K\_M (~3.3 GB).
+> **A310 note:** 4 GB VRAM is borderline for Qwen3.5 4B Q4\_K\_M (~3.3 GB).
 > The model will load but leaves little headroom for KV cache.
 > Consider `--ctx-size 4096` (set `CTX_SIZE=4096` in `.env`) to reduce pressure.
 
@@ -187,9 +187,9 @@ Performance figures below are measured with Qwen3 models at Q4\_K\_M quantisatio
 
 | GPU | Model | Prompt tok/s | Generate tok/s | Notes |
 |-----|-------|------------|----------------|-------|
-| Arc A770 (16 GB) | Qwen3 8B Q4\_K\_M | ~120 | ~35 | Comfortable fit; KV cache well within VRAM |
-| Arc A750 (8 GB) | Qwen3 4B Q4\_K\_M | ~90 | ~20 | Model fits; limit `CTX_SIZE` to ≤ 16384 |
-| Arc A380 (6 GB) | Qwen3 4B Q4\_K\_M | ~70 | ~15 | Tight. Set `CTX_SIZE=8192` for safety |
+| Arc A770 (16 GB) | Qwen3.5 9B Q4\_K\_M | ~120 | ~35 | Comfortable fit; KV cache well within VRAM |
+| Arc A750 (8 GB) | Qwen3.5 4B Q4\_K\_M | ~90 | ~20 | Model fits; limit `CTX_SIZE` to ≤ 16384 |
+| Arc A380 (6 GB) | Qwen3.5 4B Q4\_K\_M | ~70 | ~15 | Tight. Set `CTX_SIZE=8192` for safety |
 
 ### Comparison to equivalent NVIDIA tiers
 
diff --git a/dream-server/docs/MACOS-QUICKSTART.md b/dream-server/docs/MACOS-QUICKSTART.md
index cd11bfa6..79074abd 100644
--- a/dream-server/docs/MACOS-QUICKSTART.md
+++ b/dream-server/docs/MACOS-QUICKSTART.md
@@ -89,8 +89,8 @@ The installer auto-selects the best model for your unified memory:
 
 | Unified RAM | Tier | Model | Context |
 |-------------|------|-------|---------|
-| 8–24 GB | 1 | Qwen3 4B (Q4_K_M) | 16384 |
-| 32 GB | 2 | Qwen3 8B (Q4_K_M) | 32768 |
+| 8–24 GB | 1 | Qwen3.5 4B (Q4_K_M) | 16384 |
+| 32 GB | 2 | Qwen3.5 9B (Q4_K_M) | 32768 |
 | 48 GB | 3 | Qwen3 30B-A3B (MoE, Q4_K_M) | 32768 |
 | 64+ GB | 4 | Qwen3 30B-A3B (MoE, Q4_K_M) | 131072 |
 
diff --git a/dream-server/docs/SUPPORT-MATRIX.md b/dream-server/docs/SUPPORT-MATRIX.md
index 7152bfd0..988d0823 100644
--- a/dream-server/docs/SUPPORT-MATRIX.md
+++ b/dream-server/docs/SUPPORT-MATRIX.md
@@ -38,11 +38,11 @@ Last updated: 2026-03-17
 | `SH_LARGE` | AMD Strix Halo 90+ | Qwen3-Coder-Next | ≥ 90 GB (unified) | ROCm |
 | `SH_COMPACT` | AMD Strix Halo < 90 GB | Qwen3 30B A3B | < 90 GB (unified) | ROCm |
 | `4` | NVIDIA 40 GB+ / multi-GPU | Qwen3 30B A3B | ≥ 40 GB | CUDA |
-| `3` | NVIDIA 20 GB+ | Qwen3 14B | ≥ 20 GB | CUDA |
-| `ARC` | **Intel Arc ≥ 12 GB** (A770, B580) | Qwen3 8B | ≥ 12 GB | **SYCL** |
-| `2` | NVIDIA 12 GB+ | Qwen3 8B | ≥ 12 GB | CUDA |
-| `ARC_LITE` | **Intel Arc < 12 GB** (A750, A380) | Qwen3 4B | 6–11 GB | **SYCL** |
-| `1` | NVIDIA 4 GB+ | Qwen3 8B | ≥ 4 GB | CUDA |
+| `3` | NVIDIA 20 GB+ | Qwen3.5 27B | ≥ 20 GB | CUDA |
+| `ARC` | **Intel Arc ≥ 12 GB** (A770, B580) | Qwen3.5 9B | ≥ 12 GB | **SYCL** |
+| `2` | NVIDIA 12 GB+ | Qwen3.5 9B | ≥ 12 GB | CUDA |
+| `ARC_LITE` | **Intel Arc < 12 GB** (A750, A380) | Qwen3.5 4B | 6–11 GB | **SYCL** |
+| `1` | NVIDIA 4 GB+ | Qwen3.5 9B | ≥ 4 GB | CUDA |
 | `0` | CPU / < 4 GB GPU | Qwen3.5 2B | any | CPU |
 | `CLOUD` | No local GPU | Claude (API) | — | LiteLLM |