Mimicode automatically routes requests to Haiku 3.5 or Sonnet 4.5 based on task complexity. This reduces costs by ~60-80% while maintaining quality for most operations.
Cost difference (May 2026):
- Haiku 3.5: $0.80/MTok input, $4/MTok output (5x cheaper)
- Sonnet 4.5: $3/MTok input, $15/MTok output (smarter, slower)
Key insight: Most coding tasks are execution, not planning. With strong prompt guidance, Haiku can handle:
- File reading and searching
- Simple single-file edits
- Running tests/commands
- Straightforward refactors
Sonnet is reserved for:
- Initial planning (first turn)
- Multi-file changes
- Complex debugging
- Ambiguous requirements
Read-only operations:
"Where is the bash function?"
"Show me all imports in agent.py"
"Find references to 'route_model'"
Simple edits (single file):
"Change VERSION to 0.2 in config.py"
"Rename foo to bar in helpers.py only"
"Fix typo in README.md"
Bash commands:
"Run pytest and show results"
"Install requirements"
"Build the project"
First turn (planning):
- Always uses Sonnet to understand the task
Multi-file changes:
"Rename variable foo to bar everywhere"
"Update imports across the entire codebase"
"Refactor architecture for X"
Debugging:
- After tool errors → switches to Sonnet
- Complex error traces
- Performance issues
Complex planning:
"What's the best approach to add feature X?"
"Design a new module for Y"
"Explain the architecture"
When routing to Haiku, mimicode adds task-specific guidance to the system prompt:
For edits:
Read the file first. Use exact old_text matching with 2-3 lines context. For multiple edits in one file, use batched edits=[...]. Double-check uniqueness.
For searches:
Use
rgfor searches. Be precise and quote file:line.
For bash:
Run commands directly. Show output. Be concise.
This scaffolding lets Haiku perform at near-Sonnet quality for scoped tasks.
Use the /route slash command:
/route
Output:
Total routing decisions: 12
Haiku usage: 66.7%
By model:
haiku-3.5 8 ( 66.7%)
sonnet-4.5 4 ( 33.3%)
By reason:
first_turn 2 ( 16.7%)
simple_edit 5 ( 41.7%)
simple_read 3 ( 25.0%)
fallback 2 ( 16.7%)
Check sessions/<id>.jsonl for model_route events:
{
"kind": "model_route",
"data": {
"step": 2,
"model": "claude-3-5-haiku-20250312",
"reason": "simple_edit",
"has_guidance": true
}
}Typical session without routing (all Sonnet):
- 10 turns, 50K tokens input, 10K tokens output
- Cost: (50K × $3/M) + (10K × $15/M) = $0.30
Same session with routing (70% Haiku):
- 7 turns Haiku: (35K × $0.80/M) + (7K × $4/M) = $0.056
- 3 turns Sonnet: (15K × $3/M) + (3K × $15/M) = $0.090
- Total: $0.146 (51% savings)
Real savings depend on your workflow. Heavy refactors → more Sonnet. Quick edits/searches → more Haiku.
This isn't about replacing engineers with cheaper models. It's about matching tool to task:
- Haiku for well-defined execution (reading, simple edits, tests)
- Sonnet for ambiguous planning, reasoning, complex changes
You stay in control. The router just picks the right wrench for each bolt.
Edit router.py to customize routing rules:
# Make routing more aggressive (use Haiku more)
if "refactor" in content_lower and "simple" in content_lower:
return HAIKU_3_5, "simple_edit", guidance
# Make routing more conservative (use Sonnet more)
if "production" in content_lower or "critical" in content_lower:
return SONNET_4_5, "safety", ""Test changes with:
pytest tests/test_router.py -vTo force Sonnet for everything (no routing), set an environment variable:
export MIMICODE_FORCE_SONNET=1
python agent.py --tui(Not implemented yet — coming soon if needed)