v2.3.0 — Token Management Overhaul
Token Reduction Improvements
Based on a detailed comparison with Claude Code's token management:
Smarter Pipeline
- Microcompact only runs when history >15 messages (was every loop)
- Circuit breaker: 3 failures → stop retrying compaction
- Token estimates padded by 33% for conservatism (previously under-counted); sketch below
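A minimal sketch of how these gates could compose, assuming a chars/4 estimation heuristic; the constant names and the `CompactionPipeline` shape are illustrative, not the project's actual identifiers:

```python
# Illustrative constants; names, and any values beyond those stated in
# the changelog (15 messages, 3 failures, 33% padding), are assumptions.
MICROCOMPACT_MIN_MESSAGES = 15
MAX_COMPACTION_FAILURES = 3
ESTIMATE_PADDING = 1.33

def estimate_tokens(messages: list[str]) -> int:
    # Rough chars/4 heuristic (an assumption), padded 33% so the
    # estimate errs high instead of under-counting.
    raw = sum(len(m) for m in messages) // 4
    return int(raw * ESTIMATE_PADDING)

class CompactionPipeline:
    def __init__(self) -> None:
        self.failures = 0

    def maybe_microcompact(self, messages: list[str]) -> list[str]:
        # Skip short histories: microcompact only fires past 15 messages.
        if len(messages) <= MICROCOMPACT_MIN_MESSAGES:
            return messages
        # Circuit breaker: after 3 failed attempts, stop retrying.
        if self.failures >= MAX_COMPACTION_FAILURES:
            return messages
        try:
            return self._compact(messages)
        except Exception:
            self.failures += 1
            return messages

    def _compact(self, messages: list[str]) -> list[str]:
        # Placeholder: a real pass would summarize older messages with a
        # cheaper model (see "Cheaper Compaction" below).
        return messages[:2] + ["[summary of older turns]"] + messages[-10:]
```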
Selective Thinking
- Keeps last 2 turns' thinking blocks (was only latest)
- Preserves recent reasoning while trimming older bloat; sketch below
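A sketch of the pruning rule, assuming Anthropic-style messages (dicts with a `role` and a `content` list of typed blocks); `prune_thinking` is a hypothetical name:

```python
THINKING_TURNS_TO_KEEP = 2  # from the changelog; was 1

def prune_thinking(messages: list[dict]) -> list[dict]:
    # Find the assistant turns whose thinking blocks survive.
    assistant_idxs = [i for i, m in enumerate(messages)
                      if m.get("role") == "assistant"]
    keep = set(assistant_idxs[-THINKING_TURNS_TO_KEEP:])
    pruned = []
    for i, m in enumerate(messages):
        if m.get("role") == "assistant" and i not in keep:
            # Strip thinking blocks from older turns; keep everything else.
            m = {**m, "content": [b for b in m["content"]
                                  if b.get("type") != "thinking"]}
        pruned.append(m)
    return pruned
```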
Per-Model Budgets
- Default max_tokens: 8K → 16K
- Model-specific caps: Opus 32K, Sonnet 64K, Haiku 16K, etc. (sketch below)
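A sketch of the budget table; matching models by substring is an assumption, and "K" is read here as ×1024:

```python
DEFAULT_MAX_TOKENS = 16 * 1024  # raised from 8K

# Caps from the changelog; the substring keys are assumptions.
MODEL_MAX_TOKENS = {
    "opus": 32 * 1024,
    "sonnet": 64 * 1024,
    "haiku": 16 * 1024,
}

def max_tokens_for(model_id: str) -> int:
    for family, cap in MODEL_MAX_TOKENS.items():
        if family in model_id.lower():
            return cap
    return DEFAULT_MAX_TOKENS
```

Under this table, `max_tokens_for("claude-sonnet-4")` returns 65536, and any unrecognized model falls back to the 16K default.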
Cheaper Compaction
- Tiers down further: Haiku → Gemini Flash
- Free models as a compaction target (sketch below)
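One way the fallback chain could look; the ordering and the model IDs (especially the free-tier placeholder) are illustrative:

```python
# Cheapest-first candidates for the compaction model; IDs illustrative.
COMPACTION_TIERS = [
    "haiku",           # default cheap tier
    "gemini-flash",    # tier down further
    "free-model",      # free models as a compaction target
]

def pick_compaction_model(available: set[str]) -> str | None:
    for model in COMPACTION_TIERS:
        if model in available:
            return model
    return None  # nothing available: skip compaction this round
```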
New: /tokens command
Example output:

    Estimated: ~45,200 tokens (API-anchored)
    Context: 200k window (22.6% used)
    Messages: 47
    Tool results: 12 (340KB)
    Thinking: 3 blocks
    ✓ Healthy
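For reference, the "22.6% used" line is just the estimate divided by the window:

```python
estimated, window = 45_200, 200_000
print(f"{estimated / window:.1%}")  # 22.6%
```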