
v2.3.0 — Token Management Overhaul


@1bcMax 1bcMax released this 04 Apr 13:53
· 83 commits to main since this release
f368794

Token Reduction Improvements

Based on a deep comparison with Claude Code's token management:

Smarter Pipeline

  • Microcompact only runs when history >15 messages (was every loop)
  • Circuit breaker: 3 failures → stop retrying compaction
  • Token estimation padded by 33% for conservatism (was under-counting)
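The pipeline changes above can be sketched as follows. This is a minimal illustration, not the actual implementation; all names (`MICROCOMPACT_THRESHOLD`, `should_compact`, the chars/4 heuristic) are assumptions:

```python
# Illustrative sketch of the compaction gate — names and heuristics are
# assumptions, not the project's real API.
MICROCOMPACT_THRESHOLD = 15   # only microcompact when history exceeds this
PAD_FACTOR = 1.33             # pad estimates 33% so they err high, not low
MAX_COMPACTION_FAILURES = 3   # circuit breaker: stop retrying after this many

def estimate_tokens(text: str) -> int:
    """Rough chars/4 heuristic, padded 33% to avoid under-counting."""
    return int(len(text) / 4 * PAD_FACTOR)

def should_compact(history: list[str], failures: int) -> bool:
    """Gate compaction: skip short histories and trip the circuit breaker."""
    if failures >= MAX_COMPACTION_FAILURES:
        return False  # circuit breaker tripped — stop retrying
    return len(history) > MICROCOMPACT_THRESHOLD
```

The padding means the estimator overstates usage, so compaction triggers slightly early rather than after the context is already blown.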

Selective Thinking

  • Keeps last 2 turns' thinking blocks (was only latest)
  • Preserves recent reasoning while reducing old bloat
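A sketch of the selective pruning, assuming Anthropic-style messages whose content is a list of typed blocks (the function name and message shape are illustrative):

```python
# Illustrative sketch: strip "thinking" blocks from all but the most recent
# assistant turns. Message shape and names are assumptions.
KEEP_THINKING_TURNS = 2

def prune_thinking(messages: list[dict]) -> list[dict]:
    """Keep thinking blocks only for the last KEEP_THINKING_TURNS assistant turns."""
    assistant_idx = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    keep = set(assistant_idx[-KEEP_THINKING_TURNS:])
    pruned = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and i not in keep:
            m = {**m, "content": [b for b in m["content"]
                                  if b.get("type") != "thinking"]}
        pruned.append(m)
    return pruned
```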

Per-Model Budgets

  • Default max_tokens: 8K → 16K
  • Model-specific caps: Opus 32K, Sonnet 64K, Haiku 16K, etc.
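The per-model budgets above could look roughly like this lookup; the cap values come from the release notes, but the table layout and matching-by-substring logic are assumptions:

```python
# Illustrative sketch: per-model max_tokens caps. Values are from the release
# notes; the lookup itself is an assumption.
DEFAULT_MAX_TOKENS = 16_000   # new default (was 8K)
MODEL_MAX_TOKENS = {
    "opus": 32_000,
    "sonnet": 64_000,
    "haiku": 16_000,
}

def max_tokens_for(model: str) -> int:
    """Return the cap for a model family, falling back to the default."""
    for family, cap in MODEL_MAX_TOKENS.items():
        if family in model.lower():
            return cap
    return DEFAULT_MAX_TOKENS
```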

Cheaper Compaction

  • Tiers down further: Haiku → Gemini Flash
  • Free models as compaction target

New: /tokens command

```
Estimated:    ~45,200 tokens (API-anchored)
Context:      200k window (22.6% used)
Messages:     47
Tool results: 12 (340KB)
Thinking:     3 blocks
✓ Healthy
```