Benchmark comparing Clio (v0.0.1) against Claude Code (v2.1.86) on identical tasks through the same API proxy, measuring latency, token usage, cache efficiency, and output correctness.
| Item | Value |
|---|---|
| Date | 2026-03-28 |
| Host | macOS (Apple Silicon, 32GB) — Device A |
| Node.js | v22.22.0 |
| API Proxy | http://192.168.10.6:8080 (same for both Clio and CC) |
| Upstream Model | claude-sonnet-4-20250514 |
| Runs per task | 3 |
Both tools use the same proxy, same API key, same model. The only variable is the CLI itself.
| ID | Description | Tools Expected |
|---|---|---|
simple-qa |
"What is the capital of France?" | None |
math |
"What is 17 * 23 + 42?" | None |
read-file |
Read package.json, extract project name | Read |
search-code |
Find all TS files containing 'export' | Grep/Glob |
multi-step |
Read tsconfig + package.json, describe TS config | Read x2 + analyze |
Clio: -p --output-format json --api-format openai
CC: -p --output-format json (Anthropic format, native)
| Task | Clio | CC | Delta | Winner |
|---|---|---|---|---|
| simple-qa | 1,292 | 1,826 | -534 | Clio |
| math | 2,782 | 2,312 | +470 | CC |
| read-file | 10,193 | 8,909 | +1,284 | CC |
| search-code | 22,022 | 18,110 | +3,912 | CC |
| multi-step | 11,502 | 11,165 | +337 | ~Tie |
| Task | Clio | CC | Clio saves |
|---|---|---|---|
| simple-qa | 2,487 | 22,225 | -89% |
| math | 2,490 | 22,228 | -89% |
| read-file | 11,463 | 67,190 | -83% |
| search-code | 14,854 | 44,857 | -67% |
| multi-step | 16,038 | 67,654 | -76% |
| Task | Clio hit% | CC hit% |
|---|---|---|
| All | 0% | 100% |
Clio 0% cache hit — OpenAI format does not carry cache_control fields, so the proxy cannot cache prompt segments.
Clio in OpenAI format loses all caching capability. This is a format limitation, not a code bug.
Clio: -p --output-format json (Anthropic format)
CC: -p --output-format json
Both now use Anthropic format through the same proxy.
| Task | Clio | CC | Delta |
|---|---|---|---|
| simple-qa | 2,853 | 1,595 | +1,258 |
| math | 2,457 | 2,376 | +81 |
| read-file | 2,751 | 7,310 | -4,559 |
| search-code | 13,766 | 12,846 | +920 |
| multi-step | 12,082 | 10,185 | +1,897 |
Clio reported input_tokens: 0 and cache: 0% for all tasks. Investigation revealed the proxy returns usage data in the message_delta SSE event, not message_start. Clio only extracted input_tokens and cache fields from message_start.
Root cause: Proxy puts input_tokens + cache_read_input_tokens in message_delta.usage, but Clio only read output_tokens from that event.
Fix: Added input_tokens, cache_creation_input_tokens, cache_read_input_tokens extraction to the message_delta handler in agent.ts.
After fixing usage parsing. Both tools on equal footing: same proxy, same format, same model.
| Task | Clio | CC | Delta | Winner |
|---|---|---|---|---|
| simple-qa | 1,261 | 1,527 | -266 | Clio |
| math | 1,812 | 2,581 | -769 | Clio |
| read-file | 8,343 | 6,793 | +1,550 | CC |
| search-code | 14,439 | 12,436 | +2,003 | CC |
| multi-step | 10,894 | 10,227 | +667 | ~Tie |
No-tool tasks: Clio is faster (smaller prompt = less processing). Tool tasks: CC is faster (fewer round-trips).
| Task | Clio | CC | Clio saves |
|---|---|---|---|
| simple-qa | 2,499 | 22,230 | -89% |
| math | 4,301 | 22,233 | -81% |
| read-file | 14,416 | 67,205 | -79% |
| search-code | 15,232 | 44,863 | -66% |
| multi-step | 14,884 | 67,671 | -78% |
| Task | Clio cache_read | CC cache_read | Clio hit% | CC hit% |
|---|---|---|---|---|
| simple-qa | 0 | 57,984 | 0%* | 100% |
| math | 2,304 | 53,760 | 100% | 100% |
| read-file | 17,536 | 115,840 | 100% | 100% |
| search-code | 29,312 | 35,840 | 100% | 100% |
| multi-step | 32,384 | 111,616 | 100% | 100% |
* simple-qa first run is always a cache miss (cold start). Subsequent runs would hit cache.
Both tools now achieve 100% cache hit rate through the same proxy. Clio's cache_read is 40-83% smaller because its cached prompt is smaller.
| Task | Clio | CC |
|---|---|---|
| simple-qa | 1.0 | 1.0 |
| math | 1.7 | 1.0 |
| read-file | 5.0 | 3.0 |
| search-code | 5.3 | 2.0 |
| multi-step | 5.0 | 5.0 |
CC uses fewer turns on tool tasks. Its larger system prompt provides more detailed tool-use guidance, leading to more efficient tool selection.
All tasks returned correct results from both tools across all 3 rounds:
| Task | Clio | CC | Match |
|---|---|---|---|
| simple-qa | Paris | Paris | Yes |
| math | 433 | 433 | Yes |
| read-file | clio-cli | clio-cli | Yes |
| search-code | All files | All files | Yes |
| multi-step | ES2022 + Node16 | ES2022 + Node16 | Yes |
- 70-89% fewer input tokens — smaller system prompt (~2.5k vs ~22k tokens)
- 40-83% less cache volume — less data to cache and read back
- Faster on simple tasks — less prompt to process means lower latency
- Lower cost potential — fewer tokens at full price + fewer tokens at cache price
- Fewer tool turns — detailed system prompt guides more efficient tool selection
- Faster on tool-heavy tasks — fewer API round-trips compensates for larger prompt
- Mature caching — battle-tested cache_control placement
| Metric | Winner | Margin |
|---|---|---|
| Token efficiency | Clio | 70-89% fewer input tokens |
| Cache efficiency | Clio | 40-83% less cache volume |
| Simple task latency | Clio | 250-770ms faster |
| Tool task latency | CC | 650-2000ms faster |
| Tool turn efficiency | CC | 2-3x fewer turns on search/read |
| Correctness | Tie | 100% on all tasks |
- Print mode stdout leak — JSON output mode leaked streaming text to stdout. Fixed by suppressing
process.stdout.writeduring agent loop in JSON mode. - Usage parsing from
message_delta— Proxy returnsinput_tokensandcache_read_input_tokensinmessage_delta, notmessage_start. Added extraction from both events.
git clone https://github.com/icebear0828/c2a.git
cd c2a && npm install && npx tsc
npm install -g @anthropic-ai/claude-code
export ANTHROPIC_BASE_URL=http://your-proxy:8080
export ANTHROPIC_API_KEY=your-key
# Anthropic format (recommended)
node benchmark/run.mjs --runs 3
# OpenAI format
node benchmark/run.mjs --runs 3 --clio-format openai
# Subset
node benchmark/run.mjs --runs 1 --tasks simple-qa,math