176 changes: 176 additions & 0 deletions docs/libraries/nemo-evaluator/interceptors/caching.md
@@ -86,3 +86,179 @@ Each cache uses a SHA256 hash of the request data as the lookup key. When a cach
2. **Response received** from model API
3. **Store response** in cache with generated key
4. **Continue processing** with response interceptors
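The lookup flow above can be sketched as follows. The canonical JSON serialization and the exact key derivation shown here are assumptions for illustration, not the interceptor's actual implementation:

```python
import hashlib
import json


def cache_key(request_body: dict) -> str:
    """Derive a deterministic SHA256 key from request data.

    The canonicalization used by the real interceptor is an
    implementation detail; sorting keys is one plausible choice
    that makes identical requests hash identically.
    """
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Identical requests map to the same key; any change produces a new one.
a = cache_key({"messages": [{"role": "user", "content": "What is 2+2?"}], "temperature": 0.0})
b = cache_key({"messages": [{"role": "user", "content": "What is 2+2?"}], "temperature": 0.7})
print(a == b)  # False: changing temperature changes the key
```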

## Cache Export and Import

The caching interceptor supports exporting the entire cache to a single portable binary file (`.cache` format) and importing it later. This is useful for:

- **Sharing caches** between team members or machines
- **Version control** of evaluation caches
- **CI/CD pipelines** where you want deterministic evaluation results
- **Offline evaluation** using pre-generated responses

### Exporting Cache

To export the cache after evaluation, enable the `export_cache` parameter:

```yaml
target:
api_endpoint:
adapter_config:
interceptors:
- name: "caching"
enabled: true
config:
cache_dir: "./evaluation_cache"
save_requests: true
save_responses: true
export_cache: true # Export to cache_export.cache
```

This creates a `cache_export.cache` file in the output directory containing all cached requests, responses, and headers in a single pickled file.

### Importing Cache

To prefill the cache from a previously exported file:

```yaml
target:
api_endpoint:
adapter_config:
interceptors:
- name: "caching"
enabled: true
config:
cache_dir: "./local_cache"
prefill_from_export: "/path/to/cache_export.cache"
reuse_cached_responses: true
```

The cache will be loaded at evaluation start, and all matching requests will use the cached responses.

### Cache Format Comparison

**Local Cache** (`cache_dir`):
- Three separate directories: `requests/`, `responses/`, `headers/`
- Uses disk-backed SQLite databases
- Efficient for ongoing evaluations
- Not easily portable

**Export Cache** (`export_cache`/`prefill_from_export`):
- Single binary `.cache` file (pickle format)
- Portable and shareable
- Version control friendly
- Can be used across machines

## Test Mode

Test mode is a debugging feature that helps identify when requests have changed between evaluation runs. When enabled together with cached responses, the evaluation fails on the first cache miss and shows a diff against the most similar cached request.

### Enabling Test Mode

```yaml
target:
api_endpoint:
adapter_config:
interceptors:
- name: "caching"
enabled: true
config:
cache_dir: "./evaluation_cache"
prefill_from_export: "/path/to/baseline_cache.cache"
reuse_cached_responses: true
test_mode: true # Fail on cache miss
```

### Use Cases

**1. Regression Testing**
```bash
# First run: Create baseline cache
nemo-evaluator run --config eval.yaml --overrides 'target.api_endpoint.adapter_config.interceptors[0].config.export_cache=true'

# Later: Test with baseline cache
nemo-evaluator run --config eval.yaml --overrides 'target.api_endpoint.adapter_config.interceptors[0].config.prefill_from_export=cache_export.cache,target.api_endpoint.adapter_config.interceptors[0].config.test_mode=true'
```

**2. Debugging Request Changes**

When test mode detects a cache miss, it will:
1. Find the most similar cached request using fuzzy string matching (rapidfuzz)
2. Calculate similarity score (0-100%)
3. Generate a unified diff showing the differences
4. Raise `CacheMissInTestModeError` with detailed information
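The matching-and-diff steps above can be sketched as follows. The real interceptor scores similarity with rapidfuzz; this sketch substitutes the standard library's `difflib.SequenceMatcher` to stay dependency-free, and the helper name is hypothetical:

```python
import difflib
import json


def nearest_cached(current: dict, cached: list) -> tuple:
    """Find the most similar cached request and a unified diff against it.

    Returns (similarity score on a 0-100 scale, unified diff text).
    difflib.SequenceMatcher stands in for rapidfuzz here.
    """
    cur = json.dumps(current, indent=1, sort_keys=True)
    best_score, best_text = -1.0, ""
    for req in cached:
        text = json.dumps(req, indent=1, sort_keys=True)
        score = difflib.SequenceMatcher(None, text, cur).ratio() * 100
        if score > best_score:
            best_score, best_text = score, text
    diff = "\n".join(difflib.unified_diff(
        best_text.splitlines(), cur.splitlines(),
        fromfile="cached_request", tofile="current_request", lineterm=""))
    return best_score, diff
```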

Example error output:
```
CacheMissInTestModeError: Cache miss in test mode for request with cache_key=abc123def456...

Most similar cached request (similarity: 94.56%):
--- cached_request
+++ current_request
@@ -1,7 +1,7 @@
{
"messages": [
{
"content": "What is 2+2?",
"role": "user"
}
],
- "temperature": 0.0
+ "temperature": 0.7
}
```

This helps you quickly identify what changed in your request format or parameters.

### Exception Handling

```python
from nemo_evaluator.adapters.interceptors.caching_interceptor import CacheMissInTestModeError

try:
# Run evaluation with test_mode=True
evaluator.evaluate(dataset)
except CacheMissInTestModeError as e:
print(f"Cache miss for request: {e.request_data}")
if e.most_similar_request:
print(f"Most similar (score: {e.similarity_score:.2f}%)")
print(f"Diff:\n{e.diff}")
else:
print("No similar cached requests found")
```

## Best Practices

### Development Workflow

1. **Initial Run**: Create a baseline cache
```yaml
save_requests: true
save_responses: true
export_cache: true
```

2. **Subsequent Runs**: Use cached responses
```yaml
prefill_from_export: "baseline_cache.cache"
reuse_cached_responses: true
```

3. **Testing Changes**: Enable test mode
```yaml
prefill_from_export: "baseline_cache.cache"
reuse_cached_responses: true
test_mode: true
```

### Performance Tips

- Use `max_saved_requests` and `max_saved_responses` to limit cache size for large evaluations
- Export cache only when needed (post-processing can be slow for large caches)
- Use local `cache_dir` for ongoing work, export cache for sharing
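The size-limit tip can be expressed in configuration. The placement of `max_saved_requests`/`max_saved_responses` under `config` mirrors the examples above, and the limit values are placeholders:

```yaml
target:
  api_endpoint:
    adapter_config:
      interceptors:
        - name: "caching"
          enabled: true
          config:
            cache_dir: "./evaluation_cache"
            save_requests: true
            save_responses: true
            max_saved_requests: 1000   # cap on stored requests
            max_saved_responses: 1000  # cap on stored responses
```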

### Security Considerations

- Cache files contain API request and response data - handle them with appropriate access controls
- Binary pickle format: only load cache files from trusted sources
- Consider encrypting cache exports if they contain sensitive data
10 changes: 9 additions & 1 deletion docs/libraries/nemo-evaluator/interceptors/index.md
@@ -127,11 +127,18 @@ Collects statistics from API responses for metrics collection and analysis.

::::

## Process Post-Evaluation Results
## Evaluation Lifecycle Hooks

::::{grid} 1 2 2 2
:gutter: 1 1 1 2

:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Pre-Evaluation Hooks
:link: pre-evaluation-hooks
:link-type: doc

Prepare the environment before evaluations start - download datasets, install packages, prefill caches.
:::

:::{grid-item-card} {octicon}`report;1.5em;sd-mr-1` Post-Evaluation Hooks
:link: post-evaluation-hooks
:link-type: doc
@@ -155,5 +162,6 @@ Progress Tracking <progress-tracking>
Raising on Client Errors <raise-client-error>
Reasoning <reasoning>
Response Statistics <response-stats>
Pre-Evaluation Hooks <pre-evaluation-hooks>
Post-Evaluation Hooks <post-evaluation-hooks>
:::