176 changes: 176 additions & 0 deletions docs/libraries/nemo-evaluator/interceptors/caching.md
@@ -86,3 +86,179 @@ Each cache uses a SHA256 hash of the request data as the lookup key. When a cach
2. **Response received** from model API
3. **Store response** in cache with generated key
4. **Continue processing** with response interceptors
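The lookup flow above can be sketched as follows. The canonical JSON serialization and the exact key derivation shown here are assumptions for illustration, not the interceptor's actual implementation:

```python
import hashlib
import json


def cache_key(request_body: dict) -> str:
    """Derive a deterministic SHA256 key from request data.

    The canonicalization used by the real interceptor is an
    implementation detail; sorting keys is one plausible choice
    that makes identical requests hash identically.
    """
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Identical requests map to the same key; any change produces a new one.
a = cache_key({"messages": [{"role": "user", "content": "What is 2+2?"}], "temperature": 0.0})
b = cache_key({"messages": [{"role": "user", "content": "What is 2+2?"}], "temperature": 0.7})
print(a == b)  # False: changing temperature changes the key
```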

## Cache Export and Import

The caching interceptor supports exporting the entire cache to a single portable binary file (`.cache` format) and importing it later. This is useful for:

- **Sharing caches** between team members or machines
- **Version control** of evaluation caches
- **CI/CD pipelines** where you want deterministic evaluation results
- **Offline evaluation** using pre-generated responses

### Exporting Cache

To export the cache after evaluation, enable the `export_cache` parameter:

```yaml
target:
api_endpoint:
adapter_config:
interceptors:
- name: "caching"
enabled: true
config:
cache_dir: "./evaluation_cache"
save_requests: true
save_responses: true
export_cache: true # Export to cache_export.cache
```

This creates a `cache_export.cache` file in the output directory containing all cached requests, responses, and headers in a single pickled file.

### Importing Cache

To prefill the cache from a previously exported file:

```yaml
target:
api_endpoint:
adapter_config:
interceptors:
- name: "caching"
enabled: true
config:
cache_dir: "./local_cache"
prefill_from_export: "/path/to/cache_export.cache"
reuse_cached_responses: true
```

The cache will be loaded at evaluation start, and all matching requests will use the cached responses.

### Cache Format Comparison

**Local Cache** (`cache_dir`):
- Three separate directories: `requests/`, `responses/`, `headers/`
- Uses disk-backed SQLite databases
- Efficient for ongoing evaluations
- Not easily portable

**Export Cache** (`export_cache`/`prefill_from_export`):
- Single binary `.cache` file (pickle format)
- Portable and shareable
- Version control friendly
- Can be used across machines

## Test Mode

Test mode is a debugging feature that helps identify when requests have changed between evaluation runs. When enabled together with cached responses, the evaluation fails on the first cache miss and shows a diff against the most similar cached request.

### Enabling Test Mode

```yaml
target:
api_endpoint:
adapter_config:
interceptors:
- name: "caching"
enabled: true
config:
cache_dir: "./evaluation_cache"
prefill_from_export: "/path/to/baseline_cache.cache"
reuse_cached_responses: true
test_mode: true # Fail on cache miss
```

### Use Cases

**1. Regression Testing**
```bash
# First run: Create baseline cache
nemo-evaluator run --config eval.yaml --overrides 'target.api_endpoint.adapter_config.interceptors[0].config.export_cache=true'

# Later: Test with baseline cache
nemo-evaluator run --config eval.yaml --overrides 'target.api_endpoint.adapter_config.interceptors[0].config.prefill_from_export=cache_export.cache,target.api_endpoint.adapter_config.interceptors[0].config.test_mode=true'
```

**2. Debugging Request Changes**

When test mode detects a cache miss, it will:
1. Find the most similar cached request using fuzzy string matching (rapidfuzz)
2. Calculate similarity score (0-100%)
3. Generate a unified diff showing the differences
4. Raise `CacheMissInTestModeError` with detailed information
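The matching-and-diff steps above can be sketched as follows. The real interceptor scores similarity with rapidfuzz; this sketch substitutes the standard library's `difflib.SequenceMatcher` to stay dependency-free, and the helper name is hypothetical:

```python
import difflib
import json


def nearest_cached(current: dict, cached: list) -> tuple:
    """Find the most similar cached request and a unified diff against it.

    Returns (similarity score on a 0-100 scale, unified diff text).
    difflib.SequenceMatcher stands in for rapidfuzz here.
    """
    cur = json.dumps(current, indent=1, sort_keys=True)
    best_score, best_text = -1.0, ""
    for req in cached:
        text = json.dumps(req, indent=1, sort_keys=True)
        score = difflib.SequenceMatcher(None, text, cur).ratio() * 100
        if score > best_score:
            best_score, best_text = score, text
    diff = "\n".join(difflib.unified_diff(
        best_text.splitlines(), cur.splitlines(),
        fromfile="cached_request", tofile="current_request", lineterm=""))
    return best_score, diff
```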

Example error output:
```
CacheMissInTestModeError: Cache miss in test mode for request with cache_key=abc123def456...

Most similar cached request (similarity: 94.56%):
--- cached_request
+++ current_request
@@ -1,7 +1,7 @@
{
"messages": [
{
"content": "What is 2+2?",
"role": "user"
}
],
- "temperature": 0.0
+ "temperature": 0.7
}
```

This helps you quickly identify what changed in your request format or parameters.

### Exception Handling

```python
from nemo_evaluator.adapters.interceptors.caching_interceptor import CacheMissInTestModeError

try:
# Run evaluation with test_mode=True
evaluator.evaluate(dataset)
except CacheMissInTestModeError as e:
print(f"Cache miss for request: {e.request_data}")
if e.most_similar_request:
print(f"Most similar (score: {e.similarity_score:.2f}%)")
print(f"Diff:\n{e.diff}")
else:
print("No similar cached requests found")
```

## Best Practices

### Development Workflow

1. **Initial Run**: Create a baseline cache
```yaml
save_requests: true
save_responses: true
export_cache: true
```

2. **Subsequent Runs**: Use cached responses
```yaml
prefill_from_export: "baseline_cache.cache"
reuse_cached_responses: true
```

3. **Testing Changes**: Enable test mode
```yaml
prefill_from_export: "baseline_cache.cache"
reuse_cached_responses: true
test_mode: true
```

### Performance Tips

- Use `max_saved_requests` and `max_saved_responses` to limit cache size for large evaluations
- Export cache only when needed (post-processing can be slow for large caches)
- Use local `cache_dir` for ongoing work, export cache for sharing
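The size-limit tip can be expressed in configuration. The placement of `max_saved_requests`/`max_saved_responses` under `config` mirrors the examples above, and the limit values are placeholders:

```yaml
target:
  api_endpoint:
    adapter_config:
      interceptors:
        - name: "caching"
          enabled: true
          config:
            cache_dir: "./evaluation_cache"
            save_requests: true
            save_responses: true
            max_saved_requests: 1000   # cap on stored requests
            max_saved_responses: 1000  # cap on stored responses
```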

### Security Considerations

- Cache files contain API request and response data - handle them with appropriate access controls
- Binary pickle format: only load cache files from trusted sources
- Consider encrypting cache exports if they contain sensitive data
10 changes: 9 additions & 1 deletion docs/libraries/nemo-evaluator/interceptors/index.md
@@ -127,11 +127,18 @@ Collects statistics from API responses for metrics collection and analysis.

::::

## Process Post-Evaluation Results
## Evaluation Lifecycle Hooks

::::{grid} 1 2 2 2
:gutter: 1 1 1 2

:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Pre-Evaluation Hooks
:link: pre-evaluation-hooks
:link-type: doc

Prepare the environment before evaluations start - download datasets, install packages, prefill caches.
:::

:::{grid-item-card} {octicon}`report;1.5em;sd-mr-1` Post-Evaluation Hooks
:link: post-evaluation-hooks
:link-type: doc
@@ -155,5 +162,6 @@ Progress Tracking <progress-tracking>
Raising on Client Errors <raise-client-error>
Reasoning <reasoning>
Response Statistics <response-stats>
Pre-Evaluation Hooks <pre-evaluation-hooks>
Post-Evaluation Hooks <post-evaluation-hooks>
:::