feat: swap Tier 3 from GPT-OSS-20B to Qwen3.5-27B#573

Merged
Lightheartdevs merged 2 commits into main from feat/tier3-qwen35-27b
Mar 23, 2026

Conversation

@Lightheartdevs
Collaborator

Summary

GPT-OSS-20B broke Perplexica and every service that relies on structured output / JSON mode. The model emits special tokens (<|start|>, <|channel|>, <|constrain|>) that are incompatible with llama.cpp's grammar-based JSON constraining: pure chat worked fine, but structured output returned HTTP 500.
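To make the incompatibility concrete, here is a toy illustration (not llama.cpp's actual grammar engine): a JSON grammar only admits text that is JSON, and GPT-OSS-20B's channel framing never is, so constrained decoding has no valid path.

```shell
# Toy stand-in for a JSON-grammar constraint: accept braced JSON-shaped
# output, reject anything carrying <|...|> special tokens.
check() {
  case "$1" in
    '{'*'}') echo "accepted by JSON grammar" ;;
    *'<|'*)  echo "rejected: special token outside JSON" ;;
    *)       echo "rejected" ;;
  esac
}

check '{"answer": 42}'                       # accepted by JSON grammar
check '<|start|>assistant<|channel|>final'   # rejected: special token outside JSON
```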

Swapping to Qwen3.5-27B (16.7GB at Q4_K_M): same family as Tiers 1-2, proven llama.cpp compatibility, and it fits the 20-39GB VRAM tier.

Test plan

  • Perplexica search queries return results (structured output works)
  • Open WebUI chat works
  • OpenClaw agent responds
  • `bash tests/test-tier-map.sh` passes
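The Perplexica item above exercises llama.cpp's structured-output path. A hedged sketch of the kind of request involved (port, prompt, and `json_object` mode are illustrative defaults, not taken from this repo):

```shell
# The request shape that made GPT-OSS-20B fail with HTTP 500 under
# llama.cpp's JSON-grammar mode; Qwen3.5-27B should answer it normally.
payload='{
  "model": "Qwen3.5-27B-Q4_K_M.gguf",
  "messages": [{"role": "user", "content": "Reply with a JSON object."}],
  "response_format": {"type": "json_object"}
}'
printf '%s\n' "$payload"

# Against a running llama.cpp server, send it with:
#   curl -s http://localhost:8080/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$payload"
```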

🤖 Generated with Claude Code

Lightheartdevs and others added 2 commits March 22, 2026 23:50
GPT-OSS-20B uses special tokens (<|start|>, <|channel|>, <|constrain|>)
for structured output that are incompatible with llama.cpp's JSON
grammar mode. This causes Perplexica (which uses generateObject) to
fail with HTTP 500 on every query. Pure chat inference worked fine
but structured output / tool calling was broken.

Qwen3.5-27B (16.7GB Q4_K_M) is the same model family as Tiers 1-2
(Qwen 3.5), is proven compatible with llama.cpp structured output,
and fits in the 20-39GB VRAM tier.

Updated across all platforms, tests, agent templates, docs, and
disk estimation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix filename casing: Qwen3.5-27B-Q4_K_M.gguf, not qwen3.5-27b-Q4_K_M.gguf;
sed lowercased it during the bulk replace.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
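The casing slip above is a classic bulk-sed hazard. A minimal sketch, assuming the mistake was a replacement string typed in lowercase (a reconstruction, not taken from the actual diff):

```shell
# A bulk replace with a lowercased replacement string silently
# downcases the model filename:
echo "MODEL=GPT-OSS-20B-Q4_K_M.gguf" | sed 's/GPT-OSS-20B/qwen3.5-27b/'
# prints: MODEL=qwen3.5-27b-Q4_K_M.gguf

# The fix: a correctly cased replacement string.
echo "MODEL=GPT-OSS-20B-Q4_K_M.gguf" | sed 's/GPT-OSS-20B/Qwen3.5-27B/'
# prints: MODEL=Qwen3.5-27B-Q4_K_M.gguf
```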
@Lightheartdevs Lightheartdevs merged commit 3077e66 into main Mar 23, 2026
14 of 20 checks passed
Lightheartdevs added a commit that referenced this pull request Mar 23, 2026
Follow-up to #573 — docs still referenced old Qwen3 8B/4B/14B models.
Updated to match current tier map:
- T1/T2/ARC: Qwen3.5 9B
- T3: Qwen3.5 27B
- ARC_LITE: Qwen3.5 4B

Files: root README, FAQ, INTEL-ARC-GUIDE, MACOS-QUICKSTART, SUPPORT-MATRIX

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
