Cache tool conversion and IR validation to eliminate repeated work

## Problem

Profiling shows that **88% of conversion time** is spent on two operations that produce identical results across consecutive requests in the same conversation:

| Hotspot | % of conversion time | What it does |
|---------|---------------------|--------------|
| **IR validation** (`_vendor/validate.py`) | **63%** | Recursively validates IRRequest TypedDict — every tool definition, every message, every content part |
| **Schema sanitization** (`base/schema.py`) | **25%** | Recursively strips unsupported JSON Schema keywords from every tool's parameter schema |
| Actual conversion logic | ~5% | Already fast |

This makes Rosetta **~6.3x slower than LiteLLM** on real-world payloads (64-msg: 2.4ms vs 0.4ms; 218-msg: 4.9ms vs 0.8ms).

### Why this is wasteful

In a multi-turn agent conversation (e.g. Claude Code with 41 tools, 218 messages):

- **Tool definitions are identical across all turns.** The same 31-41 tools get validated + sanitized on every single request.
- **Messages are append-only.** Turn N's messages are a strict prefix of turn N+1's. Previously-validated messages get re-validated every turn.

## Sub-issues

- [x] #277 — Phase 1: LRU cache for tool definition conversion and sanitization
- [x] #278
- [x] #280 — Phase 2: Incremental message validation with per-message hash cache
- [ ] #282

## Approach: process-level LRU caching

### Phase 1: Tool list caching (highest impact, smallest change)

Cache at two levels using content-hash of the tools list:

1. **`_convert_tools_from_p()`** — cache `provider_tools → IR tools` conversion result
2. **`_apply_tool_config()` / `ir_tool_definition_to_p()`** — cache `IR tools → provider tools` conversion result (includes `sanitize_schema`)

This skips both validation and sanitization for tools on cache hit. Expected to eliminate 60-70% of total conversion cost for tool-heavy payloads.

### Phase 2: Incremental message validation (second priority)

Per-message hash-based validation cache:

- Hash each message dict individually
- On `validate_ir_request`, only validate messages not seen in the LRU cache
- Growing conversations only pay validation cost for new messages

### Implementation notes

- Use process-level `functools.lru_cache` or a simple `dict` with bounded size
- Hash strategy: `hash(json.dumps(obj, sort_keys=True))` or structural fingerprint — need to benchmark hash cost vs validation cost
- No persistence needed initially — process restart clears cache, which is fine
- Thread safety: converters are used from async handlers but in a single-threaded event loop, so no locking needed
- Cache size: bounded LRU (e.g. 256 entries for tools, 4096 for messages) to prevent unbounded growth

### Measurement

Profile data from `benchmarks/bench_real_payload.py` and `benchmarks/bench_real_litellm_comparison.py`. Key command:

```bash
conda activate llm-rosetta && cd benchmarks && python bench_real_litellm_comparison.py
```

Target: bring the 6.3x gap vs LiteLLM down to <2x on real payloads.

## Non-goals (for now)

- Persistent (disk-backed) cache — only if process-level cache proves effective
- Cross-process shared cache — not needed for single-process gateway
- Disabling validation entirely — cache approach preserves correctness guarantees




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache tool conversion and IR validation to eliminate repeated work #276

Problem

Why this is wasteful

Sub-issues

Approach: process-level LRU caching

Phase 1: Tool list caching (highest impact, smallest change)

Phase 2: Incremental message validation (second priority)

Implementation notes

Measurement

Non-goals (for now)

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Hotspot	% of conversion time	What it does
IR validation (`_vendor/validate.py`)	63%	Recursively validates IRRequest TypedDict — every tool definition, every message, every content part
Schema sanitization (`base/schema.py`)	25%	Recursively strips unsupported JSON Schema keywords from every tool's parameter schema
Actual conversion logic	~5%	Already fast

Cache tool conversion and IR validation to eliminate repeated work #276

Description

Problem

Why this is wasteful

Sub-issues

Approach: process-level LRU caching

Phase 1: Tool list caching (highest impact, smallest change)

Phase 2: Incremental message validation (second priority)

Implementation notes

Measurement

Non-goals (for now)

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions