Compress bloated tool results before they eat Claude's context window. Zero runtime dependencies.
**Claude Code (hook mode):**

```bash
# Install globally
npm install -g compress-on-input

# Add hook to Claude Code (one-time setup)
compress-on-input install

# Verify everything works
compress-on-input check

# Restart Claude Code — done!
```

**opencode (MCP mode):**

```bash
# Install globally
npm install -g compress-on-input
```

Add to the opencode config (`~/.config/opencode/opencode.json`):

```json
{
  "mcp": {
    "compress-on-input": {
      "type": "local",
      "command": ["compress-on-input-mcp"]
    }
  }
}
```

Restart opencode. Available tools: `ocr_image` — extract text from images (PNG, JPG, PDF, WebP).
That's it. Every tool result is now automatically compressed before entering Claude's context.
MCP tools return massive payloads that burn through Claude's context window:
| Source | Typical Size | With compress-on-input | Measured Reduction |
|---|---|---|---|
| Screenshot (base64) | ~210k chars | ~425 chars (OCR text) | 99.8% |
| DOM snapshot | 1-17k chars | 0.1-8k chars | 50% |
| API/DB response | 10-16k chars | 80-150 chars | 98.6% |
| Large text/docs | 530k chars | 265k chars | 50% |
Claude's attention scales as O(n²) in context length: more tokens mean slower responses, higher cost, and earlier compaction. compress-on-input fixes this at the source.
Measured over 11 days of daily use (3,257 tool calls, March 2026):
```
Total input processed:  42.3M chars (~10.6M tokens)
After compression:      2.8M chars (~696k tokens)
Overall reduction:      93.4%
Estimated savings:      ~$148 (Opus pricing) / ~$30 (Sonnet)
Errors:                 0
```
| Content Type | Events | Reduction | Avg Latency |
|---|---|---|---|
| Screenshots → OCR | 181 | 99.8% | 735ms |
| DOM snapshots | 1,761 | 50% | 20ms |
| JSON/DB results | 60 | 98.6% | 1ms |
| Large text | 1 | 50% | 58ms |
0 errors, 0 data loss, median latency 15ms. Compression never makes things worse — fail-safe returns original on any issue.
Full performance report with charts and examples →
compress-on-input installs as a PostToolUse hook in Claude Code. After every tool call, it intercepts the result and compresses it before Claude sees it.
```
Tool executes → Hook fires → compress-on-input compresses → Claude receives compressed result
```
Works with all tools — MCP servers (Playwright, databases, APIs) and built-in tools (Read, Bash, Grep). No need to configure each server individually.
Each result is automatically routed to the best compressor:
| Content Type | Strategy | What it does | Measured |
|---|---|---|---|
| Screenshots | OCR | Apple Vision / Tesseract extracts text from image | 99.8% |
| DOM snapshots | DOM→Markdown | Converts accessibility tree to clean Markdown | 40-55% |
| Large JSON | Collapse | Schema-aware array/object summarization | 98-99% |
| Large text | Smart truncate | BM25-ranked middle + optional Gemini | 50%+ |
| Small content | Passthrough | Below threshold — untouched | 0% |
For large text (>5k tokens), instead of dumb truncation:
```
≤5k tokens      → passthrough (no compression needed)
5k–23k tokens   → head(2k) + full middle + tail(1k)
23k–53k tokens  → head(2k) + BM25-ranked middle(→20k) + tail(1k)
>53k tokens     → head(2k) + BM25(→50k) + Gemini(→20k) + tail(1k)
```
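The tier routing above can be sketched as a single dispatch function. This is an illustration of the thresholds as documented, not the package's actual internal API (`selectTier` and the tier names are made up for this sketch):

```typescript
// Tier names are illustrative labels for the pipelines described above.
type Tier = "passthrough" | "head-tail" | "head-bm25-tail" | "head-bm25-gemini-tail";

function selectTier(tokens: number): Tier {
  if (tokens <= 5_000) return "passthrough";     // small enough: keep as-is
  if (tokens <= 23_000) return "head-tail";      // head(2k) + full middle + tail(1k)
  if (tokens <= 53_000) return "head-bm25-tail"; // BM25-rank the middle down to ~20k
  return "head-bm25-gemini-tail";                // BM25 → 50k, then Gemini → 20k
}
```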
How BM25 ranking works: The middle section is split into ~512-token chunks. A synthetic relevance query is built from tool metadata:
```
read_file({path: "src/auth/login.ts"})   → query: "auth login"
browser_navigate({url: "react.dev/..."}) → query: "react useState reference"
browser_snapshot() after navigate(url)   → inherits URL intent from session
```
Chunks are ranked by BM25 similarity to this query. Top chunks (by relevance, in original order) fill the token budget. This keeps the most relevant content, not just the beginning.
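A minimal version of this rank-then-restore-order step looks like the following. This is a from-scratch sketch of standard BM25 over pre-split chunks, not the package's `bm25.ts` (the constants k1=1.2, b=0.75 are the common defaults; function names are illustrative):

```typescript
// Tokenize on non-alphanumerics; good enough for a ranking sketch.
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Score each chunk against the query with BM25, take the top N,
// then emit the winners in their original document order.
function rankChunks(chunks: string[], query: string, topN: number): string[] {
  const k1 = 1.2, b = 0.75;
  const docs = chunks.map(tokenize);
  const avgLen = docs.reduce((s, d) => s + d.length, 0) / Math.max(docs.length, 1);
  const qTerms = tokenize(query);
  const df = new Map<string, number>(); // document frequency per query term
  for (const t of qTerms) {
    df.set(t, docs.filter((d) => d.includes(t)).length);
  }
  const scores = docs.map((d) => {
    let score = 0;
    for (const t of qTerms) {
      const tf = d.filter((w) => w === t).length;
      if (tf === 0) continue;
      const idf = Math.log(1 + (docs.length - df.get(t)! + 0.5) / (df.get(t)! + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (d.length / avgLen)));
    }
    return score;
  });
  return scores
    .map((score, i) => ({ score, i }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topN)
    .sort((x, y) => x.i - y.i) // restore original order
    .map(({ i }) => chunks[i]);
}
```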
Gemini 2.5 Flash-Lite (optional, ~$0.01/call) compresses further for very large texts. Falls back to BM25-only if no API key or if Gemini is unavailable.
Works with all tools automatically:
```bash
npm install -g compress-on-input
compress-on-input install
compress-on-input check   # verify everything works
# Restart Claude Code (exit + claude)
```

This adds to `~/.claude/settings.json`:
```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": ".*",
      "hooks": [{
        "type": "command",
        "command": "compress-on-input --hook --verbose",
        "timeout": 15
      }]
    }]
  }
}
```

The `matcher` field is a regex that controls which tools trigger compression:
| Matcher | What gets compressed |
|---|---|
| `.*` | All tools (recommended — built-in tools are auto-skipped) |
| `mcp__.*` | Only MCP tools (Playwright, databases, APIs) |
| `mcp__playwright__.*` | Only Playwright MCP tools |
| `mcp__playwright__\|mcp__webflow__` | Specific MCP servers |

Note: Built-in tools (Read, Bash, Grep) are always skipped internally — they don't support output replacement. Using `.*` is safe and recommended.
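Since the matcher is an ordinary regex tested against the tool name, you can sanity-check a candidate before editing settings.json. A small illustrative helper (not part of the compress-on-input CLI):

```typescript
// Would this matcher trigger compression for a given tool name?
function matches(matcher: string, toolName: string): boolean {
  return new RegExp(matcher).test(toolName);
}
```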
```bash
# In ~/.claude.json, change MCP server command:
compress-on-input --wrap "npx @playwright/mcp@latest --cdp-endpoint http://localhost:9222" --verbose
```

Useful for testing or when you only want compression for specific servers.
```bash
git clone https://github.com/Chill-AI-Space/compress-on-input.git
cd compress-on-input
npm install --include=dev
npm run build
# Then: node dist/index.js install
```

To uninstall:

```bash
compress-on-input uninstall
# Restart Claude Code
```

Configuration lives in `~/.config/compress-on-input/config.json`:
```json
{
  "threshold": 500,
  "maxTextTokens": 2000,
  "activationBytes": 400000,
  "ocrEngine": "auto",
  "verbose": true,
  "dryRun": false,
  "geminiApiKey": "your-gemini-api-key"
}
```

| Option | Default | Description |
|---|---|---|
| `threshold` | `500` | Min tokens to trigger compression |
| `maxTextTokens` | `2000` | Target token budget per text block |
| `activationBytes` | `400000` | Min transcript size to activate (hook mode) |
| `ocrEngine` | `"auto"` | `auto` / `vision` (macOS) / `tesseract` |
| `verbose` | `false` | Log compression stats to stderr |
| `dryRun` | `false` | Log without modifying results |
| `geminiApiKey` | — | Gemini API key for smart compression of huge texts |
| `rules` | (see below) | Per-tool compression rules |
```bash
export GEMINI_API_KEY="your-key"   # Alternative to config file
```

CLI flags:

```
--hook                   Run as PostToolUse hook (used by install)
--wrap "cmd args"        Wrap an MCP server (proxy mode)
--config <path>          Custom config file path
--verbose                Log compression stats to stderr
--dry-run                Log what would be compressed, don't modify
--ocr-engine <engine>    auto | vision | tesseract
--max-text-tokens <n>    Token budget for text blocks (default: 2000)
--threshold <n>          Min tokens to trigger compression (default: 500)
--gemini-api-key <key>   Gemini API key for smart compression
```
Priority: CLI flags > env vars > config file > defaults
Override compression strategy for specific tools:
```json
{
  "rules": [
    { "toolName": "browser_snapshot", "strategy": "dom-cleanup" },
    { "toolName": "my_screenshot_tool", "strategy": "ocr" },
    { "toolNamePattern": "db_.*", "strategy": "json-collapse", "maxTokens": 5000 },
    { "toolNamePattern": ".*", "strategy": "auto" }
  ]
}
```

Strategies: `auto` | `ocr` | `dom-cleanup` | `json-collapse` | `truncate` | `passthrough`
Rules match tool names in order. First match wins. Default is auto (content-aware routing).
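First-match-wins resolution can be sketched as a simple loop over the rule list. Field names mirror the config shown above; the function name and the anchoring of `toolNamePattern` are assumptions for this sketch, not the package's actual matching code:

```typescript
interface Rule {
  toolName?: string;        // exact tool name match
  toolNamePattern?: string; // regex match (assumed anchored here)
  strategy: string;
}

// Walk rules in order; the first rule that matches the tool name wins.
// Falls back to "auto" (content-aware routing) if nothing matches.
function resolveStrategy(rules: Rule[], tool: string): string {
  for (const r of rules) {
    if (r.toolName === tool) return r.strategy;
    if (r.toolNamePattern && new RegExp(`^(?:${r.toolNamePattern})$`).test(tool)) {
      return r.strategy;
    }
  }
  return "auto";
}
```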
- macOS: Compiles and caches a Swift binary using the Apple Vision framework at `~/.cache/compress-on-input/vision-ocr-{hash}`
- Other OS: Falls back to Tesseract (`tesseract` must be in PATH)
- Quality check: if OCR returns <7 non-whitespace chars → keeps original image
- Safety: Only OCRs images when a file path exists in sibling text blocks (the original is on disk). Generated images (base64-only) pass through untouched.
- Strips `[ref=e2]` markers from inline text
- Builds a compact ref mapping table at the bottom (Claude can still click elements)
- Removes `role="generic"` and `role="none"` noise
- Collapses empty generic nodes
- Deduplicates repeated navigation blocks
- Collapses multiple blank lines
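The ref-marker pass above can be illustrated with a short regex-based sketch: strip inline `[ref=…]` markers from the text while collecting the ids for a trailing mapping line. This is illustrative only, not the package's `dom-cleanup.ts`:

```typescript
// Remove inline [ref=...] markers; return cleaned text plus a compact
// list of the refs that were stripped (so elements stay addressable).
function extractRefs(snapshot: string): { text: string; refTable: string } {
  const refs: string[] = [];
  const text = snapshot.replace(/\s*\[ref=([a-z0-9]+)\]/gi, (_m, ref: string) => {
    refs.push(ref);
    return "";
  });
  const refTable = refs.length ? `refs: ${refs.join(", ")}` : "";
  return { text, refTable };
}
```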
- Arrays >10 items → first 3 items + schema summary
- Detects homogeneous arrays (same keys) and shows shape: `{id, name, email}`
- Nesting beyond depth 5 → collapsed to key summary
- Strips null values, empty strings, empty arrays
- Falls through to truncate if content isn't valid JSON
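The array-collapse rule above (keep the first 3 items of any array longer than 10, plus a shape summary) can be sketched like this. The helper name and the output shape are assumptions for illustration, not the package's `json-collapse.ts`:

```typescript
// Collapse long arrays to a 3-item sample plus a schema-style summary.
// Short arrays pass through untouched.
function collapseArray(arr: unknown[]): unknown[] | { sample: unknown[]; summary: string } {
  if (arr.length <= 10) return arr;
  const first = arr[0];
  const shape =
    first && typeof first === "object" && !Array.isArray(first)
      ? `{${Object.keys(first as object).join(", ")}}`
      : typeof first;
  return {
    sample: arr.slice(0, 3),
    summary: `+${arr.length - 3} more items of shape ${shape}`,
  };
}
```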
- Splits text into head (2k tokens), middle, tail (1k tokens)
- Chunks middle into ~512-token segments with 50-token overlap
- Structure-aware splitting: paragraph → line → sentence → word → hard split
- Ranks chunks with BM25 using synthetic query from tool context
- Selects top chunks (preserving original order) to fit 20k token budget
- Optional Gemini 2.5 Flash-Lite pass for middle sections >20k tokens
- Graceful fallback chain: Gemini fails → BM25-only, no query → preserve order
```
src/
├── index.ts           CLI entry point, arg parsing
├── proxy.ts           JSON-RPC stdio proxy (proxy mode)
├── hook.ts            PostToolUse hook handler (hook mode)
├── pipeline.ts        Content-aware routing + compression orchestration
├── classifier.ts      Content type detection (image/DOM/JSON/text)
├── config.ts          Config loading, rule matching
├── chunker.ts         Structure-aware text chunking
├── bm25.ts            BM25 keyword ranking (~100 lines, zero deps)
├── query-builder.ts   Synthetic relevance query from tool metadata
├── session.ts         Tool call history for intent inheritance
├── logger.ts          Stderr logging
└── compressors/
    ├── ocr.ts             Apple Vision / Tesseract OCR
    ├── dom-cleanup.ts     Accessibility tree cleanup + ref mapping
    ├── json-collapse.ts   Schema-aware JSON summarization
    ├── truncate.ts        Smart head + BM25 middle + tail pipeline
    └── gemini.ts          Gemini Flash-Lite API (raw fetch, no SDK)
```
Zero runtime dependencies. BM25, chunking, Gemini API — all built from scratch. Only TypeScript and Vitest as dev deps. Small, fast, auditable.
Hook-first architecture. PostToolUse hooks intercept ALL tool results universally. No need to wrap each MCP server individually. Proxy mode exists as an alternative.
Content-aware routing. Each content type gets a specialized compressor. DOM cleanup preserves clickability. JSON collapse preserves schema. OCR preserves semantic content.
Synthetic relevance queries. We infer intent from tool call metadata (URL navigated, file path read, grep pattern searched). Based on the HyDE principle — approximate queries work surprisingly well for ranking.
Session intent inheritance. browser_snapshot() after browser_navigate(url) inherits the URL's intent. Dramatically improves BM25 ranking for follow-up tool calls.
Fail-safe everywhere. Compressor fails → return original. Compression increases size >10% → return original. Gemini fails → BM25-only. No query signal → preserve original order. The tool never makes things worse.
Context-aware activation (hook mode). Checks transcript file size before compressing. Early in a session (context mostly empty), compression is skipped — full context is valuable. As context fills up, compression activates. Exception: screenshots are always compressed (250k tokens each).
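The activation gate described above reduces to a two-line decision: always compress screenshots, otherwise only once the transcript has grown past the configured byte threshold. A sketch (function name illustrative; the `400_000` default matches `activationBytes` above):

```typescript
// Skip compression while context is still mostly empty, except for
// screenshots, which are always worth compressing.
function shouldCompress(
  transcriptBytes: number,
  isScreenshot: boolean,
  activationBytes = 400_000,
): boolean {
  if (isScreenshot) return true;
  return transcriptBytes >= activationBytes;
}
```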
```bash
npm test        # 56 tests across 9 test files
npm run build   # compile TypeScript
```

First step for any issue:

```bash
compress-on-input check
```

Runs 18 self-diagnostic checks: hook installation, binary in PATH, OCR engine, log directories, compression tests, performance benchmarks. Shows exact fix instructions for every failure.
Hook not firing?
- Run `compress-on-input check` — it checks settings.json automatically
- Restart Claude Code after installing (`/exit` + `claude`)
- Run `compress-on-input --hook --verbose` manually with test input
OCR not working?
- macOS: Should work automatically (Apple Vision framework)
- Linux: Install `tesseract` (`apt install tesseract-ocr`)
- Check: `compress-on-input --hook --verbose` will log OCR errors to stderr
Want to disable for specific tools?
```json
{
  "rules": [
    { "toolName": "my_special_tool", "strategy": "passthrough" },
    { "toolNamePattern": ".*", "strategy": "auto" }
  ]
}
```

Gemini compression not activating?
- Only triggers for text >53k tokens (after head/tail split, middle >50k)
- Needs `GEMINI_API_KEY` env var or `geminiApiKey` in config
- Get a key at Google AI Studio (free tier available)
MIT