v2.3.0 — Token Management Overhaul
Token Reduction Improvements
Based on a detailed comparison with Claude Code's token management:
Smarter Pipeline
- Microcompact only runs when history >15 messages (was every loop)
- Circuit breaker: 3 failures → stop retrying compaction
- Token estimates padded by 33% for conservatism (previously under-counted); sketch below
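A minimal sketch of how these gates could compose, assuming a chars/4 estimation heuristic; the constant names and the `CompactionPipeline` shape are illustrative, not the project's actual identifiers:

```python
# Illustrative constants; names, and any values beyond those stated in
# the changelog (15 messages, 3 failures, 33% padding), are assumptions.
MICROCOMPACT_MIN_MESSAGES = 15
MAX_COMPACTION_FAILURES = 3
ESTIMATE_PADDING = 1.33

def estimate_tokens(messages: list[str]) -> int:
    # Rough chars/4 heuristic (an assumption), padded 33% so the
    # estimate errs high instead of under-counting.
    raw = sum(len(m) for m in messages) // 4
    return int(raw * ESTIMATE_PADDING)

class CompactionPipeline:
    def __init__(self) -> None:
        self.failures = 0

    def maybe_microcompact(self, messages: list[str]) -> list[str]:
        # Skip short histories: microcompact only fires past 15 messages.
        if len(messages) <= MICROCOMPACT_MIN_MESSAGES:
            return messages
        # Circuit breaker: after 3 failed attempts, stop retrying.
        if self.failures >= MAX_COMPACTION_FAILURES:
            return messages
        try:
            return self._compact(messages)
        except Exception:
            self.failures += 1
            return messages

    def _compact(self, messages: list[str]) -> list[str]:
        # Placeholder: a real pass would summarize older messages with a
        # cheaper model (see "Cheaper Compaction" below).
        return messages[:2] + ["[summary of older turns]"] + messages[-10:]
```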
Selective Thinking
- Keeps last 2 turns' thinking blocks (was only latest)
- Preserves recent reasoning while trimming older bloat; sketch below
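A sketch of the pruning rule, assuming Anthropic-style messages (dicts with a `role` and a `content` list of typed blocks); `prune_thinking` is a hypothetical name:

```python
THINKING_TURNS_TO_KEEP = 2  # from the changelog; was 1

def prune_thinking(messages: list[dict]) -> list[dict]:
    # Find the assistant turns whose thinking blocks survive.
    assistant_idxs = [i for i, m in enumerate(messages)
                      if m.get("role") == "assistant"]
    keep = set(assistant_idxs[-THINKING_TURNS_TO_KEEP:])
    pruned = []
    for i, m in enumerate(messages):
        if m.get("role") == "assistant" and i not in keep:
            # Strip thinking blocks from older turns; keep everything else.
            m = {**m, "content": [b for b in m["content"]
                                  if b.get("type") != "thinking"]}
        pruned.append(m)
    return pruned
```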
Per-Model Budgets
- Default max_tokens: 8K → 16K
- Model-specific caps: Opus 32K, Sonnet 64K, Haiku 16K, etc. (sketch below)
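A sketch of the budget table; matching models by substring is an assumption, and "K" is read here as ×1024:

```python
DEFAULT_MAX_TOKENS = 16 * 1024  # raised from 8K

# Caps from the changelog; the substring keys are assumptions.
MODEL_MAX_TOKENS = {
    "opus": 32 * 1024,
    "sonnet": 64 * 1024,
    "haiku": 16 * 1024,
}

def max_tokens_for(model_id: str) -> int:
    for family, cap in MODEL_MAX_TOKENS.items():
        if family in model_id.lower():
            return cap
    return DEFAULT_MAX_TOKENS
```

Under this table, `max_tokens_for("claude-sonnet-4")` returns 65536, and any unrecognized model falls back to the 16K default.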
Cheaper Compaction
- Tiers down further: Haiku → Gemini Flash
- Free models as a compaction target (sketch below)
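One way the fallback chain could look; the ordering and the model IDs (especially the free-tier placeholder) are illustrative:

```python
# Cheapest-first candidates for the compaction model; IDs illustrative.
COMPACTION_TIERS = [
    "haiku",           # default cheap tier
    "gemini-flash",    # tier down further
    "free-model",      # free models as a compaction target
]

def pick_compaction_model(available: set[str]) -> str | None:
    for model in COMPACTION_TIERS:
        if model in available:
            return model
    return None  # nothing available: skip compaction this round
```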
New: /tokens command
Example output:

    Estimated: ~45,200 tokens (API-anchored)
    Context: 200k window (22.6% used)
    Messages: 47
    Tool results: 12 (340KB)
    Thinking: 3 blocks
    ✓ Healthy
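For reference, the "22.6% used" line is just the estimate divided by the window:

```python
estimated, window = 45_200, 200_000
print(f"{estimated / window:.1%}")  # 22.6%
```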