diff --git a/README.md b/README.md
index 81aea73e..a48e0e61 100644
--- a/README.md
+++ b/README.md
@@ -150,6 +150,7 @@ OpenHarness is an open-source Python implementation designed for **researchers,
Start here:
Quick Start ·
Provider Compatibility ·
+ LLM Providers ·
Showcase ·
Contributing ·
Changelog
@@ -242,8 +243,55 @@ oh -p "Fix the bug" --output-format stream-json
## 🔌 Provider Compatibility
+OpenHarness supports a wide variety of LLM providers through its extensible provider registry system. The system automatically detects providers based on API keys, base URLs, and model names.
+
+**📖 [Complete Provider Guide](README_PROVIDERS.md)** | **🚀 [Interactive Demo](scripts/demo_providers.py)**
+
OpenHarness supports three API formats: **Anthropic** (default), **OpenAI-compatible** (`--api-format openai`), and **GitHub Copilot** (`--api-format copilot`). The OpenAI format covers a wide range of providers.
+### Quick Provider Setup
+
+```bash
+# OpenRouter (100+ models)
+export OPENROUTER_API_KEY="sk-or-v1-..."
+oh --model anthropic/claude-3-haiku
+
+# DeepSeek
+export DEEPSEEK_API_KEY="your-key"
+oh --model deepseek-chat
+
+# Groq (fast inference)
+export GROQ_API_KEY="gsk_..."
+oh --model llama3-70b-8192
+
+# Ollama (local)
+oh --base-url http://localhost:11434/v1 --model llama2
+```
+
+### Adding New Providers
+
+To add support for a new LLM provider:
+
+1. Edit `src/openharness/api/registry.py`
+2. Add a `ProviderSpec` to the `PROVIDERS` tuple
+3. Test with `python scripts/demo_providers.py`
+
+Example:
+```python
+ProviderSpec(
+    name="myprovider",
+    keywords=("myprovider", "myai"),
+    env_key="MYPROVIDER_API_KEY",
+    display_name="MyProvider",
+    backend_type="openai_compat",
+    default_base_url="https://api.myprovider.com/v1",
+    detect_by_key_prefix="mp_",  # optional
+    detect_by_base_keyword="myprovider",  # optional
+),
+```
+
### Anthropic Format (default)
| Provider profile | Detection signal | Notes |
diff --git a/README_PROVIDERS.md b/README_PROVIDERS.md
new file mode 100644
index 00000000..e22b2f55
--- /dev/null
+++ b/README_PROVIDERS.md
@@ -0,0 +1,103 @@
+# Adding LLM Providers to OpenHarness
+
+OpenHarness supports a wide variety of LLM providers through its extensible provider registry system. This guide shows you how to add new providers and configure them.
+
+## Quick Start
+
+### For Users: Using Existing Providers
+
+OpenHarness already supports many popular providers. Here's how to use them:
+
+```bash
+# OpenRouter (access to 100+ models)
+export OPENROUTER_API_KEY="sk-or-v1-..."
+oh --model anthropic/claude-3-haiku "Hello world"
+
+# DeepSeek
+export DEEPSEEK_API_KEY="your-key"
+oh --model deepseek-chat "Code review this function"
+
+# Groq (fast inference)
+export GROQ_API_KEY="gsk_..."
+oh --model llama3-70b-8192 "Analyze this code"
+
+# Ollama (local models)
+oh --base-url http://localhost:11434/v1 --model llama2 "Local AI chat"
+```
+
+### For Developers: Adding New Providers
+
+To add a new LLM provider:
+
+1. **Edit the registry** (`src/openharness/api/registry.py`)
+2. **Add your provider spec** to the `PROVIDERS` tuple
+3. **Test the configuration**
+
+Example provider spec:
+```python
+ProviderSpec(
+    name="myprovider",
+    keywords=("myprovider", "myai"),
+    env_key="MYPROVIDER_API_KEY",
+    display_name="MyProvider",
+    backend_type="openai_compat",  # or "anthropic"
+    default_base_url="https://api.myprovider.com/v1",
+    detect_by_key_prefix="mp_",  # optional
+    detect_by_base_keyword="myprovider",  # optional
+    is_gateway=False,
+    is_local=False,
+    is_oauth=False,
+),
+```
+
+## Demo
+
+Run the interactive demo to see how providers work:
+
+```bash
+python scripts/demo_providers.py
+```
+
+This shows:
+- How provider detection works
+- How to add new providers
+- Configuration examples
+- CLI usage patterns
+
+## Supported Providers
+
+OpenHarness currently supports:
+
+- **Anthropic** (Claude models)
+- **OpenAI** (GPT models)
+- **OpenRouter** (100+ models via gateway)
+- **DeepSeek**
+- **Groq** (fast inference)
+- **GitHub Copilot** (OAuth)
+- **Ollama** (local models)
+- And many more...
+
+See `docs/LLM_PROVIDERS.md` for the complete list and detailed configuration instructions.
+
+## Key Concepts
+
+### Provider Detection Priority
+1. API key prefix (e.g., `sk-or-` → OpenRouter)
+2. Base URL keywords (e.g., `deepseek.com` → DeepSeek)
+3. Model name keywords (e.g., `claude` → Anthropic)
+
+### Backend Types
+- `anthropic`: Native Anthropic SDK (best for Claude)
+- `openai_compat`: OpenAI SDK (works with most providers)
+- `copilot`: GitHub Copilot OAuth
+
+### Configuration Methods
+- Environment variables: `export PROVIDER_API_KEY="..."`
+- Command line: `oh --model model-name --base-url https://...`
+- Settings file: `~/.openharness/settings.json`
+
+## Need Help?
+
+- Check the [full documentation](docs/LLM_PROVIDERS.md)
+- Run the demo: `python scripts/demo_providers.py`
+- Test detection: `oh --model your-model-name --dry-run`
\ No newline at end of file
diff --git a/docs/LLM_PROVIDERS.md b/docs/LLM_PROVIDERS.md
new file mode 100644
index 00000000..e9eb0375
--- /dev/null
+++ b/docs/LLM_PROVIDERS.md
@@ -0,0 +1,274 @@
+# Adding LLM Providers to OpenHarness
+
+OpenHarness supports a wide variety of LLM providers through its extensible provider registry system. This guide shows how to add new providers and configure them for use.
+
+## How Provider Detection Works
+
+OpenHarness automatically detects providers using a priority system:
+
+1. **API Key Prefix**: Special key prefixes (e.g., `sk-or-` for OpenRouter)
+2. **Base URL Keywords**: Substrings in the API base URL
+3. **Model Name Keywords**: Keywords in model names
+
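+As a rough sketch (a simplified illustration, not the actual implementation in `src/openharness/api/registry.py`), the priority system amounts to exhausting one signal across all providers before falling back to the next:
+
+```python
+def detect_provider(providers, model="", api_key="", base_url=""):
+    """Return the first provider matching the highest-priority signal."""
+    # 1. API key prefix beats everything else.
+    for spec in providers:
+        if api_key and spec.detect_by_key_prefix and api_key.startswith(spec.detect_by_key_prefix):
+            return spec
+    # 2. Then base URL keywords.
+    for spec in providers:
+        if base_url and spec.detect_by_base_keyword and spec.detect_by_base_keyword in base_url.lower():
+            return spec
+    # 3. Finally, keywords in the model name.
+    for spec in providers:
+        if model and any(kw in model.lower() for kw in spec.keywords):
+            return spec
+    return None
+```
+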
+## Adding a New Provider
+
+### Step 1: Add to the Provider Registry
+
+Edit `src/openharness/api/registry.py` and add your provider to the `PROVIDERS` tuple:
+
+```python
+ProviderSpec(
+    name="your_provider",
+    keywords=("keyword1", "keyword2"),      # Model name keywords
+    env_key="YOUR_PROVIDER_API_KEY",        # Environment variable name
+    display_name="Your Provider",           # Human-readable name
+    backend_type="openai_compat",           # "anthropic" | "openai_compat" | "copilot"
+    default_base_url="https://api.yourprovider.com/v1",
+    detect_by_key_prefix="",                # API key prefix (optional)
+    detect_by_base_keyword="yourprovider",  # Base URL keyword (optional)
+    is_gateway=False,                       # True if routes to multiple models
+    is_local=False,                         # True for local deployments
+    is_oauth=False,                         # True for OAuth providers
+),
+```
+
+### Step 2: Configuration
+
+Users can configure the provider in several ways:
+
+#### Environment Variables
+```bash
+export YOUR_PROVIDER_API_KEY="your-api-key-here"
+```
+
+#### Command Line
+```bash
+# Auto-detection by model name
+oh --model your-model-name
+
+# Explicit base URL
+oh --base-url https://api.yourprovider.com/v1
+
+# Explicit API format (if needed)
+oh --api-format openai
+```
+
+#### Settings File
+```json
+{
+ "api_key": "your-api-key-here",
+ "base_url": "https://api.yourprovider.com/v1",
+ "model": "your-model-name"
+}
+```
+
+## Examples
+
+### OpenRouter
+
+OpenRouter is already configured in the registry:
+
+```python
+ProviderSpec(
+    name="openrouter",
+    keywords=("openrouter",),
+    env_key="OPENROUTER_API_KEY",
+    display_name="OpenRouter",
+    backend_type="openai_compat",
+    default_base_url="https://openrouter.ai/api/v1",
+    detect_by_key_prefix="sk-or-",
+    detect_by_base_keyword="openrouter",
+    is_gateway=True,
+    is_local=False,
+    is_oauth=False,
+),
+```
+
+Usage:
+```bash
+export OPENROUTER_API_KEY="sk-or-..."
+oh --model openai/gpt-4o-mini
+```
+
+### Adding a Custom Provider
+
+Let's add support for a hypothetical provider called "ExampleAI":
+
+1. **Add to registry**:
+```python
+ProviderSpec(
+    name="exampleai",
+    keywords=("example", "exampleai"),
+    env_key="EXAMPLEAI_API_KEY",
+    display_name="ExampleAI",
+    backend_type="openai_compat",
+    default_base_url="https://api.exampleai.com/v1",
+    detect_by_key_prefix="exa_",
+    detect_by_base_keyword="exampleai",
+    is_gateway=False,
+    is_local=False,
+    is_oauth=False,
+),
+```
+
+2. **Usage**:
+```bash
+export EXAMPLEAI_API_KEY="exa_your_key_here"
+oh --model example/gpt-4
+
+# Or with explicit base URL
+oh --base-url https://api.exampleai.com/v1 --model gpt-4
+```
+
+### Popular Providers
+
+Here are some popular providers and their configurations:
+
+#### Anthropic (Native)
+- **Backend**: `anthropic`
+- **Models**: `claude-3-5-sonnet-20241022`, `claude-3-haiku-20240307`
+- **Key**: `ANTHROPIC_API_KEY`
+
+#### OpenAI
+- **Backend**: `openai_compat`
+- **Models**: `gpt-4o`, `gpt-4-turbo`
+- **Key**: `OPENAI_API_KEY`
+- **Base URL**: `https://api.openai.com/v1`
+
+#### DeepSeek
+```python
+ProviderSpec(
+    name="deepseek",
+    keywords=("deepseek",),
+    env_key="DEEPSEEK_API_KEY",
+    display_name="DeepSeek",
+    backend_type="openai_compat",
+    default_base_url="https://api.deepseek.com/v1",
+    detect_by_key_prefix="",
+    detect_by_base_keyword="deepseek",
+    is_gateway=False,
+    is_local=False,
+    is_oauth=False,
+),
+```
+
+Usage:
+```bash
+export DEEPSEEK_API_KEY="your-key"
+oh --model deepseek-chat
+```
+
+#### Groq
+```python
+ProviderSpec(
+    name="groq",
+    keywords=("groq",),
+    env_key="GROQ_API_KEY",
+    display_name="Groq",
+    backend_type="openai_compat",
+    default_base_url="https://api.groq.com/openai/v1",
+    detect_by_key_prefix="gsk_",
+    detect_by_base_keyword="groq",
+    is_gateway=False,
+    is_local=False,
+    is_oauth=False,
+),
+```
+
+Usage:
+```bash
+export GROQ_API_KEY="gsk_..."
+oh --model llama3-70b-8192
+```
+
+#### Ollama (Local)
+```python
+ProviderSpec(
+    name="ollama",
+    keywords=("ollama",),
+    env_key="",
+    display_name="Ollama",
+    backend_type="openai_compat",
+    default_base_url="http://localhost:11434/v1",
+    detect_by_key_prefix="",
+    detect_by_base_keyword="localhost:11434",
+    is_gateway=False,
+    is_local=True,
+    is_oauth=False,
+),
+```
+
+Usage:
+```bash
+# Start Ollama server locally
+ollama serve
+
+# Use with OpenHarness
+oh --base-url http://localhost:11434/v1 --model llama2
+```
+
+## Backend Types
+
+### Anthropic Backend
+- Uses the official Anthropic Python SDK
+- Best for Claude models
+- Supports advanced features like tool calling
+
+### OpenAI Compatible Backend
+- Uses the OpenAI Python SDK
+- Works with any OpenAI-compatible API
+- Most providers use this backend
+
+### Copilot Backend
+- Special OAuth flow for GitHub Copilot
+- Requires `api_format=copilot`
+
+## Detection Priority
+
+The system checks for providers in this order:
+
+1. **API Key Prefix**: `sk-or-` → OpenRouter, `gsk_` → Groq
+2. **Base URL**: `openrouter.ai` → OpenRouter, `deepseek.com` → DeepSeek
+3. **Model Keywords**: `claude` → Anthropic, `gpt` → OpenAI
+
+## Testing Your Provider
+
+1. **Add to registry**
+2. **Set environment variable**
+3. **Test detection**:
+   ```bash
+   oh --model your-model-name --dry-run
+   ```
+4. **Test actual usage**:
+   ```bash
+   oh --model your-model-name "Hello world"
+   ```
+
+## Troubleshooting
+
+### Provider Not Detected
+- Check that keywords match your model names
+- Verify API key prefix or base URL keywords
+- Use explicit `--base-url` and `--api-format openai`
+
+### Authentication Errors
+- Verify API key is set correctly
+- Check API key format (some providers have specific prefixes)
+- Ensure the API key has necessary permissions
+
+### Connection Issues
+- Verify base URL is correct
+- Check network connectivity
+- Some providers require specific regions or endpoints
+
+## Contributing
+
+When adding a new provider:
+
+1. Test with multiple models
+2. Verify API compatibility
+3. Add appropriate keywords for detection
+4. Update this documentation
+5. Consider adding tests
+
+The provider registry in `src/openharness/api/registry.py` is the single source of truth for all provider configurations.
\ No newline at end of file
diff --git a/scripts/demo_providers.py b/scripts/demo_providers.py
new file mode 100644
index 00000000..a948ebb8
--- /dev/null
+++ b/scripts/demo_providers.py
@@ -0,0 +1,286 @@
+#!/usr/bin/env python3
+"""
+Demo script for adding and testing LLM providers in OpenHarness.
+
+This script demonstrates the provider registry structure and how to add new providers.
+It runs without requiring OpenHarness dependencies to be installed.
+
+Usage:
+ python scripts/demo_providers.py
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class ProviderSpec:
+    """One LLM provider's metadata."""
+    name: str
+    keywords: tuple[str, ...]
+    env_key: str
+    display_name: str = ""
+    backend_type: str = "openai_compat"
+    default_base_url: str = ""
+    detect_by_key_prefix: str = ""
+    detect_by_base_keyword: str = ""
+    is_gateway: bool = False
+    is_local: bool = False
+    is_oauth: bool = False
+
+    @property
+    def label(self) -> str:
+        return self.display_name or self.name.title()
+
+
+# Sample providers (subset from the actual registry)
+SAMPLE_PROVIDERS = (
+    ProviderSpec(
+        name="anthropic",
+        keywords=("anthropic", "claude"),
+        env_key="ANTHROPIC_API_KEY",
+        display_name="Anthropic",
+        backend_type="anthropic",
+    ),
+    ProviderSpec(
+        name="openai",
+        keywords=("openai", "gpt", "o1", "o3", "o4"),
+        env_key="OPENAI_API_KEY",
+        display_name="OpenAI",
+        backend_type="openai_compat",
+    ),
+    ProviderSpec(
+        name="openrouter",
+        keywords=("openrouter",),
+        env_key="OPENROUTER_API_KEY",
+        display_name="OpenRouter",
+        backend_type="openai_compat",
+        default_base_url="https://openrouter.ai/api/v1",
+        detect_by_key_prefix="sk-or-",
+        detect_by_base_keyword="openrouter",
+        is_gateway=True,
+    ),
+    ProviderSpec(
+        name="deepseek",
+        keywords=("deepseek",),
+        env_key="DEEPSEEK_API_KEY",
+        display_name="DeepSeek",
+        backend_type="openai_compat",
+        default_base_url="https://api.deepseek.com/v1",
+        detect_by_base_keyword="deepseek",
+    ),
+    ProviderSpec(
+        name="groq",
+        keywords=("groq",),
+        env_key="GROQ_API_KEY",
+        display_name="Groq",
+        backend_type="openai_compat",
+        default_base_url="https://api.groq.com/openai/v1",
+        detect_by_key_prefix="gsk_",
+        detect_by_base_keyword="groq",
+    ),
+    ProviderSpec(
+        name="ollama",
+        keywords=("ollama",),
+        env_key="",
+        display_name="Ollama",
+        backend_type="openai_compat",
+        default_base_url="http://localhost:11434/v1",
+        detect_by_base_keyword="localhost:11434",
+        is_local=True,
+    ),
+)
+
+
+def demo_provider_detection():
+    """Demonstrate how provider detection works."""
+    print("🔍 Provider Detection Demo")
+    print("=" * 50)
+
+    def detect_provider(model: str, api_key: str | None = None, base_url: str | None = None) -> ProviderSpec | None:
+        """Simplified detection logic."""
+        # 1. API key prefix
+        if api_key:
+            for spec in SAMPLE_PROVIDERS:
+                if spec.detect_by_key_prefix and api_key.startswith(spec.detect_by_key_prefix):
+                    return spec
+
+        # 2. Base URL keyword
+        if base_url:
+            base_lower = base_url.lower()
+            for spec in SAMPLE_PROVIDERS:
+                if spec.detect_by_base_keyword and spec.detect_by_base_keyword in base_lower:
+                    return spec
+
+        # 3. Model keyword
+        if model:
+            model_lower = model.lower()
+            for spec in SAMPLE_PROVIDERS:
+                if any(kw in model_lower for kw in spec.keywords):
+                    return spec
+        return None
+
+    test_cases = [
+        # (model, api_key, base_url, expected_provider)
+        ("claude-3-5-sonnet-20241022", None, None, "anthropic"),
+        ("gpt-4o", None, None, "openai"),
+        ("deepseek-chat", None, None, "deepseek"),
+        ("openai/gpt-4o-mini", "sk-or-v1-123", None, "openrouter"),
+        ("llama3-70b-8192", "gsk_123", None, "groq"),
+        ("custom-model", None, "https://api.deepseek.com/v1", "deepseek"),
+        ("ollama-model", None, "http://localhost:11434/v1", "ollama"),
+    ]
+
+    for model, api_key, base_url, expected in test_cases:
+        detected = detect_provider(model, api_key, base_url)
+        provider_name = detected.name if detected else "unknown"
+        status = "✅" if provider_name == expected else "❌"
+        print(f"{status} {model} → {provider_name} (expected: {expected})")
+
+
+def demo_adding_provider():
+    """Demonstrate adding a new provider."""
+    print("\n🆕 Adding a New Provider Demo")
+    print("=" * 50)
+
+    # Example: Adding a fictional provider "ExampleAI"
+    new_provider = ProviderSpec(
+        name="exampleai",
+        keywords=("example", "exampleai"),
+        env_key="EXAMPLEAI_API_KEY",
+        display_name="ExampleAI",
+        backend_type="openai_compat",
+        default_base_url="https://api.exampleai.com/v1",
+        detect_by_key_prefix="exa_",
+        detect_by_base_keyword="exampleai",
+        is_gateway=False,
+        is_local=False,
+        is_oauth=False,
+    )
+
+    print("New provider spec:")
+    print(f"  Name: {new_provider.name}")
+    print(f"  Display: {new_provider.display_name}")
+    print(f"  Backend: {new_provider.backend_type}")
+    print(f"  Base URL: {new_provider.default_base_url}")
+    print(f"  Keywords: {new_provider.keywords}")
+    print(f"  Key Prefix: {new_provider.detect_by_key_prefix}")
+
+    # Test detection with the new provider
+    print("\nTesting detection with new provider:")
+
+    def test_detection(model, api_key=None, base_url=None):
+        """Test detection with the new provider included, honoring priority order."""
+        all_providers = list(SAMPLE_PROVIDERS) + [new_provider]
+
+        # Exhaust each detection signal across all providers before falling
+        # back to the next, lower-priority signal.
+        if api_key:
+            for spec in all_providers:
+                if spec.detect_by_key_prefix and api_key.startswith(spec.detect_by_key_prefix):
+                    return spec
+        if base_url:
+            for spec in all_providers:
+                if spec.detect_by_base_keyword and spec.detect_by_base_keyword in base_url.lower():
+                    return spec
+        if model:
+            model_lower = model.lower()
+            for spec in all_providers:
+                if any(kw in model_lower for kw in spec.keywords):
+                    return spec
+        return None
+
+    test_cases = [
+        ("exampleai-chat", None, None),
+        ("custom-model", "exa_123", None),
+        ("any-model", None, "https://api.exampleai.com/v1"),
+    ]
+
+    for model, api_key, base_url in test_cases:
+        detected = test_detection(model, api_key, base_url)
+        result = detected.name if detected else "not detected"
+        print(f"  {model} → {result}")
+
+
+def demo_provider_configuration():
+    """Show different ways to configure providers."""
+    print("\n⚙️ Provider Configuration Examples")
+    print("=" * 50)
+
+    providers = [
+        ("Anthropic", "ANTHROPIC_API_KEY", "claude-3-5-sonnet-20241022", None),
+        ("OpenAI", "OPENAI_API_KEY", "gpt-4o", "https://api.openai.com/v1"),
+        ("OpenRouter", "OPENROUTER_API_KEY", "openai/gpt-4o-mini", "https://openrouter.ai/api/v1"),
+        ("DeepSeek", "DEEPSEEK_API_KEY", "deepseek-chat", "https://api.deepseek.com/v1"),
+        ("Groq", "GROQ_API_KEY", "llama3-70b-8192", "https://api.groq.com/openai/v1"),
+        ("Ollama", None, "llama2", "http://localhost:11434/v1"),
+    ]
+
+    for name, env_var, model, base_url in providers:
+        print(f"\n{name}:")
+        if env_var:
+            print(f"  export {env_var}='your-key-here'")
+        print(f"  oh --model {model}")
+        if base_url:
+            print(f"  # Base URL: {base_url}")
+        else:
+            print("  # Uses default base URL from registry")
+
+
+def demo_registry_inspection():
+    """Show what's currently in the provider registry."""
+    print("\n📋 Sample Provider Registry")
+    print("=" * 50)
+
+    print(f"Total providers: {len(SAMPLE_PROVIDERS)}")
+
+    categories = {
+        "Gateways": [p for p in SAMPLE_PROVIDERS if p.is_gateway],
+        "Cloud Providers": [p for p in SAMPLE_PROVIDERS if not p.is_gateway and not p.is_local and not p.is_oauth],
+        "Local Deployments": [p for p in SAMPLE_PROVIDERS if p.is_local],
+        "OAuth Providers": [p for p in SAMPLE_PROVIDERS if p.is_oauth],
+    }
+
+    for category, providers in categories.items():
+        if providers:
+            print(f"\n{category} ({len(providers)}):")
+            for provider in providers:
+                keywords = ", ".join(provider.keywords)
+                print(f"  - {provider.display_name} ({provider.name}): {keywords}")
+
+
+def demo_cli_usage():
+    """Show example CLI commands for different providers."""
+    print("\n💻 CLI Usage Examples")
+    print("=" * 50)
+
+    examples = [
+        ("Anthropic Claude", "oh --model claude-3-5-sonnet-20241022 'Hello world'"),
+        ("OpenAI GPT-4", "oh --model gpt-4o 'Write a function'"),
+        ("OpenRouter (any model)", "export OPENROUTER_API_KEY='sk-or-...'\noh --model anthropic/claude-3-haiku 'Quick task'"),
+        ("DeepSeek", "export DEEPSEEK_API_KEY='...'\noh --model deepseek-chat 'Code review'"),
+        ("Groq (fast inference)", "export GROQ_API_KEY='gsk_...'\noh --model llama3-70b-8192 'Analyze this'"),
+        ("Ollama (local)", "oh --base-url http://localhost:11434/v1 --model llama2 'Local AI chat'"),
+        ("Custom provider", "export CUSTOM_API_KEY='...'\noh --base-url https://api.custom.com/v1 --model gpt-4 'Use custom API'"),
+    ]
+
+    for description, command in examples:
+        print(f"\n{description}:")
+        print(f"  {command}")
+
+
+def main():
+    """Run all demos."""
+    print("🚀 OpenHarness LLM Provider Demo")
+    print("=" * 60)
+
+    demo_registry_inspection()
+    demo_provider_detection()
+    demo_adding_provider()
+    demo_provider_configuration()
+    demo_cli_usage()
+
+    print("\n" + "=" * 60)
+    print("✨ Demo complete! Check docs/LLM_PROVIDERS.md for more details.")
+
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/src/openharness/cli.py b/src/openharness/cli.py
index d7e21cc5..a625e123 100644
--- a/src/openharness/cli.py
+++ b/src/openharness/cli.py
@@ -4,6 +4,7 @@
import json
import sys
+import time
from pathlib import Path
from typing import Optional
@@ -29,11 +30,13 @@
plugin_app = typer.Typer(name="plugin", help="Manage plugins")
auth_app = typer.Typer(name="auth", help="Manage authentication")
cron_app = typer.Typer(name="cron", help="Manage cron scheduler and jobs")
+evidence_app = typer.Typer(name="evidence", help="Manage run evidence archives")
app.add_typer(mcp_app)
app.add_typer(plugin_app)
app.add_typer(auth_app)
app.add_typer(cron_app)
+app.add_typer(evidence_app)
# ---- mcp subcommands ----
@@ -246,6 +249,102 @@ def cron_logs_cmd(
print(line)
+# ---- evidence subcommands ----
+
+@evidence_app.command("list")
+def evidence_list() -> None:
+    """List all runs with evidence."""
+    from openharness.evidence import EvidenceStore
+
+    store = EvidenceStore()
+    runs = store.list_runs()
+    if not runs:
+        print("No runs with evidence found.")
+        return
+
+    print(f"Found {len(runs)} runs:")
+    for run_id in runs:
+        summary = store.get_run_summary(run_id)
+        evidence_count = summary["total_records"]
+        print(f"  {run_id} ({evidence_count} records)")
+
+
+@evidence_app.command("summary")
+def evidence_summary(
+    run_id: str = typer.Argument(..., help="Run ID to summarize"),
+) -> None:
+    """Show detailed summary of evidence for a run."""
+    from openharness.evidence import EvidenceStore
+
+    store = EvidenceStore()
+    summary = store.get_run_summary(run_id)
+
+    print(f"Run: {run_id}")
+    print(f"Total Records: {summary['total_records']}")
+
+    if summary['time_range']['start'] and summary['time_range']['end']:
+        duration = summary['time_range']['end'] - summary['time_range']['start']
+        print(f"Duration: {duration:.2f} seconds")
+        print(f"Time Range: {time.ctime(summary['time_range']['start'])} - {time.ctime(summary['time_range']['end'])}")
+
+    print("\nEvidence Counts:")
+    for evidence_type, count in summary['evidence_counts'].items():
+        print(f"  {evidence_type}: {count}")
+
+
+@evidence_app.command("export")
+def evidence_export(
+    run_id: str = typer.Argument(..., help="Run ID to export"),
+    output: str | None = typer.Option(None, "--output", "-o", help="Output file path"),
+    format: str = typer.Option("json", "--format", "-f", help="Export format (json, archive)"),
+) -> None:
+    """Export evidence for a run."""
+    from pathlib import Path
+    from openharness.evidence import EvidenceArchiver
+
+    archiver = EvidenceArchiver()
+    output_path = Path(output) if output else None
+
+    if format == "json":
+        result_path = archiver.export_run_to_json(run_id, output_path)
+        print(f"Exported to: {result_path}")
+    elif format == "archive":
+        result_path = archiver.create_run_archive(run_id, output_path)
+        print(f"Archived to: {result_path}")
+    else:
+        print(f"Unsupported format: {format}", file=sys.stderr)
+        raise typer.Exit(1)
+
+
+@evidence_app.command("report")
+def evidence_report(
+    run_id: str = typer.Argument(..., help="Run ID to report on"),
+    output: str | None = typer.Option(None, "--output", "-o", help="Output file path"),
+) -> None:
+    """Generate a human-readable report for a run."""
+    from pathlib import Path
+    from openharness.evidence import EvidenceArchiver
+
+    archiver = EvidenceArchiver()
+    output_path = Path(output) if output else None
+
+    result_path = archiver.create_run_report(run_id, output_path)
+    print(f"Report generated: {result_path}")
+
+
+@evidence_app.command("cleanup")
+def evidence_cleanup(
+    days: int = typer.Option(30, "--days", "-d", help="Remove evidence older than this many days"),
+) -> None:
+    """Clean up old evidence archives and run data."""
+    from openharness.evidence import EvidenceArchiver
+
+    archiver = EvidenceArchiver()
+    results = archiver.cleanup_archives(days)
+
+    print(f"Cleaned up {results['removed_runs']} old runs and {results['removed_archives']} old archives")
+
+
# ---- auth subcommands ----
# Mapping from provider name to human-readable label for interactive prompts.
diff --git a/src/openharness/evidence/README.md b/src/openharness/evidence/README.md
new file mode 100644
index 00000000..3fbaa35d
--- /dev/null
+++ b/src/openharness/evidence/README.md
@@ -0,0 +1,157 @@
+# Run-Level Evidence Layer
+
+The run-level evidence layer provides structured archiving for agent runs in OpenHarness. It captures comprehensive evidence of agent execution, including conversations, tasks, performance metrics, and errors.
+
+## Overview
+
+The evidence layer consists of several components:
+
+- **Evidence Types**: Data models for different types of evidence records
+- **Evidence Store**: Storage and retrieval system using JSON Lines format
+- **Evidence Collector**: Collection utilities for capturing evidence during runs
+- **Evidence Archiver**: Archiving, export, and reporting utilities
+- **CLI Commands**: Command-line interface for managing evidence
+
+## Evidence Types
+
+The system captures the following types of evidence:
+
+- `run_start` / `run_end`: Run lifecycle events
+- `task_start` / `task_progress` / `task_end`: Task execution evidence
+- `conversation_message`: Chat messages and tool calls
+- `hook_execution`: Hook execution results
+- `state_change`: Application state transitions
+- `performance_metric`: Performance measurements
+- `error`: Errors and exceptions
+
+## Usage
+
+### Basic Collection
+
+```python
+from openharness.evidence import EvidenceCollector
+
+collector = EvidenceCollector(run_id="my-run-123")
+
+# Record run start
+collector.record_run_start(
+    session_id="session-456",
+    cwd="/workspace",
+    command_line="oh --model gpt-4"
+)
+
+# Record task execution
+collector.record_task_start(task_record)
+
+# Record conversation
+collector.record_conversation_message(message)
+
+# Record run end
+collector.record_run_end()
+```
+
+### Context Manager
+
+```python
+from openharness.evidence import EvidenceCollector
+
+collector = EvidenceCollector()
+
+async with collector.collect_run_evidence(
+    session_id="session-456",
+    cwd="/workspace"
+) as collector:
+    # Run your agent logic here
+    # Evidence is automatically collected
+    pass
+```
+
+### CLI Commands
+
+```bash
+# List all runs with evidence
+oh evidence list
+
+# Show summary of a run
+oh evidence summary <run-id>
+
+# Export evidence to JSON
+oh evidence export <run-id> --format json
+
+# Create compressed archive
+oh evidence export <run-id> --format archive
+
+# Generate human-readable report
+oh evidence report <run-id>
+
+# Clean up old evidence
+oh evidence cleanup --days 30
+```
+
+## Storage Format
+
+Evidence is stored in JSON Lines format under `~/.openharness/evidence/<run-id>/`:
+
+```
+evidence/
+├── run-123/
+│   ├── run_start.jsonl
+│   ├── task_start.jsonl
+│   ├── conversation_message.jsonl
+│   └── run_end.jsonl
+└── run-456/
+    └── ...
+```
+
+Each line contains a complete evidence record:
+
+```json
+{
+  "id": "uuid",
+  "timestamp": 1234567890.123,
+  "type": "run_start",
+  "run_id": "run-123",
+  "agent_id": "agent-1",
+  "session_id": "session-456",
+  "cwd": "/workspace",
+  "command_line": "oh --model gpt-4"
+}
+```
+
+## Integration Points
+
+The evidence layer integrates with existing OpenHarness components:
+
+- **Task Manager**: Automatically records task lifecycle events
+- **Query Engine**: Captures conversation history and tool usage
+- **Hook System**: Records hook execution results
+- **Swarm Coordinator**: Tracks multi-agent interactions
+- **Error Handling**: Captures exceptions and failures
+
+## Configuration
+
+Evidence collection can be configured through:
+
+- Environment variables
+- Configuration files
+- Programmatic settings
+
+The evidence directory location can be customized by setting the `EvidenceStore` base directory.
+
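+For example, to point the store at a non-default location (a sketch; the keyword argument name is an assumption, so check the `EvidenceStore` constructor for the actual parameter):
+
+```python
+from pathlib import Path
+from openharness.evidence import EvidenceStore
+
+# Store evidence outside the default ~/.openharness/evidence location.
+store = EvidenceStore(base_dir=Path("/var/log/openharness/evidence"))
+```
+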
+## Performance Considerations
+
+- Evidence is written asynchronously to minimize impact on agent performance
+- Large evidence collections can be archived and cleaned up automatically
+- JSON Lines format allows for efficient streaming and partial reads
+- Compression is used for long-term storage
+
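+As a rough sketch of the streaming point (assuming the per-type `.jsonl` layout shown above), a run's evidence can be consumed lazily, one record at a time, without loading the whole run into memory:
+
+```python
+import json
+from pathlib import Path
+
+def iter_records(run_dir):
+    """Yield evidence records from a run directory, one JSON line at a time."""
+    for path in sorted(Path(run_dir).glob("*.jsonl")):
+        with open(path, encoding="utf-8") as f:
+            for line in f:
+                if line.strip():
+                    yield json.loads(line)
+```
+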
+## Security
+
+Evidence may contain sensitive information such as:
+
+- API keys (redacted in storage)
+- File paths and contents
+- Conversation history
+- Error messages
+
+Consider access controls and encryption for production deployments.
\ No newline at end of file
diff --git a/src/openharness/evidence/__init__.py b/src/openharness/evidence/__init__.py
new file mode 100644
index 00000000..cc01406e
--- /dev/null
+++ b/src/openharness/evidence/__init__.py
@@ -0,0 +1,33 @@
+"""Run-level evidence layer for structured archiving of agent runs."""
+
+from __future__ import annotations
+
+from openharness.evidence.archiver import EvidenceArchiver
+from openharness.evidence.collector import EvidenceCollector
+from openharness.evidence.store import EvidenceStore
+from openharness.evidence.types import (
+    EvidenceRecord,
+    EvidenceType,
+    RunEvidence,
+    TaskEvidence,
+    ConversationEvidence,
+    HookEvidence,
+    StateEvidence,
+    PerformanceEvidence,
+    ErrorEvidence,
+)
+
+__all__ = [
+    "EvidenceArchiver",
+    "EvidenceCollector",
+    "EvidenceStore",
+    "EvidenceRecord",
+    "EvidenceType",
+    "RunEvidence",
+    "TaskEvidence",
+    "ConversationEvidence",
+    "HookEvidence",
+    "StateEvidence",
+    "PerformanceEvidence",
+    "ErrorEvidence",
+]
\ No newline at end of file
diff --git a/src/openharness/evidence/archiver.py b/src/openharness/evidence/archiver.py
new file mode 100644
index 00000000..b6f312e6
--- /dev/null
+++ b/src/openharness/evidence/archiver.py
@@ -0,0 +1,175 @@
+"""Evidence archiving and management utilities."""
+
+from __future__ import annotations
+
+import json
+import time
+from pathlib import Path
+from typing import Any
+from uuid import uuid4
+
+from openharness.evidence.store import EvidenceStore
+
+
+class EvidenceArchiver:
+ """Utilities for archiving and managing evidence collections."""
+
+ def __init__(self, store: EvidenceStore | None = None) -> None:
+ self.store = store or EvidenceStore()
+
+ def create_run_archive(
+ self,
+ run_id: str,
+ archive_path: Path | None = None,
+ include_metadata: bool = True,
+ ) -> Path:
+ """Create a compressed archive of all evidence for a run."""
+ if archive_path is None:
+ timestamp = int(time.time())
+ archive_path = self.store.base_dir / f"{run_id}_{timestamp}.tar.gz"
+
+ self.store.archive_run(run_id, archive_path)
+ return archive_path
+
+ def export_run_to_json(
+ self,
+ run_id: str,
+ output_path: Path | None = None,
+ pretty: bool = True,
+ ) -> Path:
+ """Export all evidence for a run to a single JSON file."""
+ if output_path is None:
+ output_path = self.store.base_dir / f"{run_id}_export.json"
+
+ evidence_list = list(self.store.get_evidence(run_id))
+ evidence_data = [evidence.__dict__ for evidence in evidence_list]
+
+ with open(output_path, "w", encoding="utf-8") as f:
+ json.dump(
+ {
+ "run_id": run_id,
+ "export_timestamp": time.time(),
+ "evidence_count": len(evidence_data),
+ "evidence": evidence_data,
+ },
+ f,
+ indent=2 if pretty else None,
+ ensure_ascii=False,
+ )
+
+ return output_path
+
+ def import_run_from_json(self, json_path: Path, new_run_id: str | None = None) -> str:
+ """Import evidence from a JSON export file."""
+ with open(json_path, "r", encoding="utf-8") as f:
+ data = json.load(f)
+
+ run_id = new_run_id or data["run_id"] or str(uuid4())
+
+        from openharness.evidence.types import EvidenceRecord
+
+        # Rebuild each exported dict as a generic EvidenceRecord
+        for evidence_dict in data["evidence"]:
+
+ evidence = EvidenceRecord()
+ for key, value in evidence_dict.items():
+ if hasattr(evidence, key):
+ setattr(evidence, key, value)
+
+ # Override run_id if specified
+ if new_run_id:
+ evidence.run_id = new_run_id
+
+ self.store.store_evidence(evidence)
+
+ return run_id
+
+ def create_run_report(
+ self,
+ run_id: str,
+ report_path: Path | None = None,
+ include_details: bool = True,
+ ) -> Path:
+ """Create a human-readable report of a run's evidence."""
+ if report_path is None:
+ report_path = self.store.base_dir / f"{run_id}_report.md"
+
+ summary = self.store.get_run_summary(run_id)
+ evidence_list = list(self.store.get_evidence(run_id))
+
+ with open(report_path, "w", encoding="utf-8") as f:
+ f.write(f"# Run Evidence Report: {run_id}\n\n")
+
+ f.write("## Summary\n\n")
+ f.write(f"- **Total Records**: {summary['total_records']}\n")
+ if summary['time_range']['start'] and summary['time_range']['end']:
+ duration = summary['time_range']['end'] - summary['time_range']['start']
+ f.write(f"- **Duration**: {duration:.2f} seconds\n")
+ f.write(f"- **Time Range**: {time.ctime(summary['time_range']['start'])} - {time.ctime(summary['time_range']['end'])}\n")
+
+ f.write("\n## Evidence Counts\n\n")
+ for evidence_type, count in summary['evidence_counts'].items():
+ f.write(f"- **{evidence_type}**: {count}\n")
+
+ if include_details:
+ f.write("\n## Detailed Evidence\n\n")
+
+ # Group by type
+ by_type = {}
+ for evidence in evidence_list:
+ by_type.setdefault(evidence.type, []).append(evidence)
+
+ for evidence_type, records in by_type.items():
+ f.write(f"### {evidence_type.title()}\n\n")
+
+ for record in sorted(records, key=lambda r: r.timestamp):
+ f.write(f"**{time.ctime(record.timestamp)}**\n\n")
+
+ # Show relevant fields based on type
+ if hasattr(record, 'description') and record.description:
+ f.write(f"- Description: {record.description}\n")
+ if hasattr(record, 'status') and record.status:
+ f.write(f"- Status: {record.status}\n")
+ if hasattr(record, 'error_message') and record.error_message:
+ f.write(f"- Error: {record.error_message}\n")
+ if hasattr(record, 'content') and record.content:
+                        content_preview = (record.content[:200] + "...") if len(record.content) > 200 else record.content
+ f.write(f"- Content: {content_preview}\n")
+
+ f.write("\n")
+
+ return report_path
+
+ def cleanup_archives(self, max_age_days: int = 30) -> dict[str, int]:
+ """Clean up old evidence archives and runs."""
+ results = {
+ "removed_runs": self.store.cleanup_old_runs(max_age_days),
+ "removed_archives": 0,
+ }
+
+        # Also clean up archive files (*.tar.gz in the base directory)
+        # that are older than the cutoff
+ cutoff_time = time.time() - (max_age_days * 24 * 60 * 60)
+
+ for archive_file in self.store.base_dir.glob("*.tar.gz"):
+ if archive_file.stat().st_mtime < cutoff_time:
+ archive_file.unlink()
+ results["removed_archives"] += 1
+
+ return results
+
+ def list_archives(self) -> list[dict[str, Any]]:
+ """List all available evidence archives."""
+ archives = []
+
+ for archive_file in self.store.base_dir.glob("*.tar.gz"):
+ stat = archive_file.stat()
+ archives.append({
+ "path": archive_file,
+ "name": archive_file.name,
+ "size": stat.st_size,
+ "created": stat.st_ctime,
+ "modified": stat.st_mtime,
+ })
+
+ return sorted(archives, key=lambda x: x["created"], reverse=True)
\ No newline at end of file
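The archive/list cycle in `EvidenceArchiver` reduces to `tarfile` plus `Path.glob`. A minimal standalone sketch of that pattern, using only the stdlib (the helper names here are illustrative, not part of the package):

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Any


def archive_run_dir(run_dir: Path, archive_path: Path) -> None:
    # Pack the whole run directory, keyed by its name, as archive_run does
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(run_dir, arcname=run_dir.name)


def list_archives(base: Path) -> list[dict[str, Any]]:
    # Newest-first listing of *.tar.gz files, mirroring list_archives
    archives = []
    for f in base.glob("*.tar.gz"):
        st = f.stat()
        archives.append({"name": f.name, "size": st.st_size, "modified": st.st_mtime})
    return sorted(archives, key=lambda a: a["modified"], reverse=True)


with tempfile.TemporaryDirectory() as d:
    base = Path(d)
    run_dir = base / "run-1"
    run_dir.mkdir()
    (run_dir / "run_start.jsonl").write_text('{"type": "run_start"}\n')

    archive_run_dir(run_dir, base / "run-1.tar.gz")
    assert list_archives(base)[0]["name"] == "run-1.tar.gz"

    # The archive preserves the run-id prefix on member paths
    with tarfile.open(base / "run-1.tar.gz") as tar:
        assert "run-1/run_start.jsonl" in tar.getnames()
```

Because `arcname` is the run id, extracting an archive recreates the same `<run_id>/<type>.jsonl` layout the store reads from.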
diff --git a/src/openharness/evidence/collector.py b/src/openharness/evidence/collector.py
new file mode 100644
index 00000000..37ad79a5
--- /dev/null
+++ b/src/openharness/evidence/collector.py
@@ -0,0 +1,306 @@
+"""Evidence collection system for capturing run-level data."""
+
+from __future__ import annotations
+
+import time
+import traceback
+from contextlib import asynccontextmanager
+from typing import Any, AsyncIterator
+from uuid import uuid4
+
+from openharness.engine.messages import ConversationMessage
+from openharness.evidence.store import EvidenceStore
+from openharness.evidence.types import (
+ ConversationEvidence,
+ ErrorEvidence,
+ HookEvidence,
+ PerformanceEvidence,
+ RunEvidence,
+ StateEvidence,
+ TaskEvidence,
+)
+from openharness.hooks.types import AggregatedHookResult
+from openharness.tasks.types import TaskRecord
+
+
+class EvidenceCollector:
+ """Collects and stores evidence from agent runs."""
+
+ def __init__(self, run_id: str | None = None, store: EvidenceStore | None = None) -> None:
+ self.run_id = run_id or str(uuid4())
+ self.store = store or EvidenceStore()
+ self.agent_id = ""
+ self._start_time = time.time()
+
+ def set_agent_id(self, agent_id: str) -> None:
+ """Set the current agent ID for evidence records."""
+ self.agent_id = agent_id
+
+ def record_run_start(
+ self,
+ session_id: str = "",
+ cwd: str = "",
+ command_line: str = "",
+ config: dict[str, Any] | None = None,
+ environment: dict[str, str] | None = None,
+ ) -> None:
+ """Record the start of a run."""
+ evidence = RunEvidence(
+ type="run_start",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ session_id=session_id,
+ cwd=cwd,
+ command_line=command_line,
+ config=config or {},
+ environment=environment or {},
+ timestamp=self._start_time,
+ )
+ self.store.store_evidence(evidence)
+
+ def record_run_end(self, final_status: str = "completed") -> None:
+ """Record the end of a run."""
+ evidence = RunEvidence(
+ type="run_end",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ metadata={"final_status": final_status, "duration": time.time() - self._start_time},
+ )
+ self.store.store_evidence(evidence)
+
+ def record_task_start(self, task: TaskRecord) -> None:
+ """Record the start of a task."""
+ evidence = TaskEvidence(
+ type="task_start",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ task_id=task.id,
+ task_type=task.type,
+ description=task.description,
+ status=task.status,
+ command=task.command,
+ cwd=task.cwd,
+ output_file=str(task.output_file),
+ metadata={"created_at": task.created_at, "started_at": task.started_at},
+ )
+ self.store.store_evidence(evidence)
+
+ def record_task_progress(self, task_id: str, progress_data: dict[str, Any]) -> None:
+ """Record progress on a task."""
+ evidence = TaskEvidence(
+ type="task_progress",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ task_id=task_id,
+ metadata=progress_data,
+ )
+ self.store.store_evidence(evidence)
+
+ def record_task_end(self, task: TaskRecord) -> None:
+ """Record the end of a task."""
+ duration = 0.0
+ if task.started_at and task.ended_at:
+ duration = task.ended_at - task.started_at
+
+ evidence = TaskEvidence(
+ type="task_end",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ task_id=task.id,
+ status=task.status,
+ return_code=task.return_code,
+ duration=duration,
+ metadata={
+ "ended_at": task.ended_at,
+ "return_code": task.return_code,
+ "duration": duration,
+ },
+ )
+ self.store.store_evidence(evidence)
+
+ def record_conversation_message(
+ self,
+ message: ConversationMessage,
+ token_count: int = 0,
+ model: str = "",
+ ) -> None:
+ """Record a conversation message."""
+ evidence = ConversationEvidence(
+ type="conversation_message",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ message_type=message.message_type,
+ content=message.content,
+ role=getattr(message, "role", ""),
+ tool_calls=getattr(message, "tool_calls", []),
+ tool_results=getattr(message, "tool_results", []),
+ token_count=token_count,
+ model=model,
+ metadata={"message_id": getattr(message, "id", "")},
+ )
+ self.store.store_evidence(evidence)
+
+ def record_tool_call(
+ self,
+ tool_name: str,
+ arguments: dict[str, Any],
+ tool_call_id: str = "",
+ ) -> None:
+ """Record a tool call."""
+ evidence = ConversationEvidence(
+ type="tool_call",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ metadata={
+ "tool_name": tool_name,
+ "arguments": arguments,
+ "tool_call_id": tool_call_id,
+ },
+ )
+ self.store.store_evidence(evidence)
+
+ def record_tool_result(
+ self,
+ tool_call_id: str,
+ result: Any,
+ success: bool = True,
+ error_message: str = "",
+ ) -> None:
+ """Record a tool result."""
+ evidence = ConversationEvidence(
+ type="tool_result",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ metadata={
+ "tool_call_id": tool_call_id,
+ "result": str(result),
+ "success": success,
+ "error_message": error_message,
+ },
+ )
+ self.store.store_evidence(evidence)
+
+ def record_hook_execution(
+ self,
+ event: str,
+ result: AggregatedHookResult,
+ duration: float = 0.0,
+ ) -> None:
+ """Record hook execution results."""
+ for hook_result in result.results:
+ evidence = HookEvidence(
+ type="hook_execution",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ event=event,
+ hook_type=hook_result.hook_type,
+ success=hook_result.success,
+ output=hook_result.output,
+ blocked=hook_result.blocked,
+ reason=hook_result.reason,
+ duration=duration,
+ metadata=hook_result.metadata,
+ )
+ self.store.store_evidence(evidence)
+
+ def record_state_change(
+ self,
+ state_type: str,
+ previous_state: dict[str, Any],
+ new_state: dict[str, Any],
+ change_reason: str = "",
+ ) -> None:
+ """Record a state change."""
+ evidence = StateEvidence(
+ type="state_change",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ state_type=state_type,
+ previous_state=previous_state,
+ new_state=new_state,
+ change_reason=change_reason,
+ )
+ self.store.store_evidence(evidence)
+
+ def record_performance_metric(
+ self,
+ metric_name: str,
+ value: float,
+ unit: str = "",
+ category: str = "",
+ context: dict[str, Any] | None = None,
+ ) -> None:
+ """Record a performance metric."""
+ evidence = PerformanceEvidence(
+ type="performance_metric",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ metric_name=metric_name,
+ value=value,
+ unit=unit,
+ category=category,
+ context=context or {},
+ )
+ self.store.store_evidence(evidence)
+
+ def record_error(
+ self,
+ error_type: str,
+ error_message: str,
+ context: dict[str, Any] | None = None,
+ exc: Exception | None = None,
+ recoverable: bool = False,
+ ) -> None:
+ """Record an error or exception."""
+ tb_str = ""
+ if exc:
+ tb_str = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
+
+ evidence = ErrorEvidence(
+ type="error",
+ run_id=self.run_id,
+ agent_id=self.agent_id,
+ error_type=error_type,
+ error_message=error_message,
+ traceback=tb_str,
+ context=context or {},
+ recoverable=recoverable,
+ )
+ self.store.store_evidence(evidence)
+
+ @asynccontextmanager
+ async def collect_run_evidence(
+ self,
+ session_id: str = "",
+ cwd: str = "",
+ command_line: str = "",
+ config: dict[str, Any] | None = None,
+ environment: dict[str, str] | None = None,
+ ) -> AsyncIterator[EvidenceCollector]:
+ """Context manager for collecting evidence for an entire run."""
+ try:
+ self.record_run_start(
+ session_id=session_id,
+ cwd=cwd,
+ command_line=command_line,
+ config=config,
+ environment=environment,
+ )
+ yield self
+ except Exception as e:
+ self.record_error(
+ "run_execution_error",
+ str(e),
+ context={"phase": "run_execution"},
+ exc=e,
+ )
+ raise
+ finally:
+ self.record_run_end()
+
+ def get_run_summary(self) -> dict[str, Any]:
+ """Get a summary of the current run's evidence."""
+ return self.store.get_run_summary(self.run_id)
\ No newline at end of file
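`collect_run_evidence` follows a standard start/error/end lifecycle: the error record is written only on failure, while the end record is written unconditionally. Stripped of the store, the pattern can be sketched like this (event strings are illustrative):

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []


@asynccontextmanager
async def collect(run_id: str):
    events.append(f"run_start:{run_id}")
    try:
        yield run_id
    except Exception as e:
        # Failure is recorded, then re-raised to the caller
        events.append(f"error:{e}")
        raise
    finally:
        # run_end is recorded even when the body raises
        events.append(f"run_end:{run_id}")


async def main() -> None:
    async with collect("run-1"):
        pass
    try:
        async with collect("run-2"):
            raise ValueError("boom")
    except ValueError:
        pass


asyncio.run(main())
assert events == [
    "run_start:run-1", "run_end:run-1",
    "run_start:run-2", "error:boom", "run_end:run-2",
]
```

The re-raise in the `except` branch matters: evidence collection observes failures but never swallows them, so callers still see the original exception.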
diff --git a/src/openharness/evidence/store.py b/src/openharness/evidence/store.py
new file mode 100644
index 00000000..8b800b9d
--- /dev/null
+++ b/src/openharness/evidence/store.py
@@ -0,0 +1,179 @@
+"""Evidence storage and retrieval system."""
+
+from __future__ import annotations
+
+import json
+import time
+from pathlib import Path
+from typing import Any, Iterator
+
+from openharness.evidence.types import EvidenceRecord
+
+
+class EvidenceStore:
+ """Structured storage for run-level evidence."""
+
+ def __init__(self, base_dir: Path | None = None) -> None:
+ if base_dir is None:
+ # Lazy import to avoid dependency issues during testing
+ try:
+ from openharness.config.paths import get_data_dir
+ self.base_dir = get_data_dir() / "evidence"
+ except ImportError:
+ # Fallback for testing without full environment
+ self.base_dir = Path.home() / ".openharness" / "evidence"
+ else:
+ self.base_dir = base_dir
+ self.base_dir.mkdir(parents=True, exist_ok=True)
+
+ def _get_run_dir(self, run_id: str) -> Path:
+ """Get the directory for a specific run."""
+ return self.base_dir / run_id
+
+ def _get_evidence_file(self, run_id: str, evidence_type: str) -> Path:
+ """Get the file path for evidence of a specific type."""
+ run_dir = self._get_run_dir(run_id)
+ run_dir.mkdir(parents=True, exist_ok=True)
+ return run_dir / f"{evidence_type}.jsonl"
+
+ def store_evidence(self, evidence: EvidenceRecord) -> None:
+ """Store an evidence record."""
+ if not evidence.timestamp:
+ evidence.timestamp = time.time()
+
+ file_path = self._get_evidence_file(evidence.run_id, evidence.type)
+ record_data = {
+ "id": evidence.id,
+ "timestamp": evidence.timestamp,
+ "type": evidence.type,
+ "run_id": evidence.run_id,
+ "agent_id": evidence.agent_id,
+ "metadata": evidence.metadata,
+ **{
+ k: v for k, v in evidence.__dict__.items()
+ if k not in {"id", "timestamp", "type", "run_id", "agent_id", "metadata"}
+ and v is not None and v != "" and v != [] and v != {}
+ }
+ }
+
+ with open(file_path, "a", encoding="utf-8") as f:
+ json.dump(record_data, f, ensure_ascii=False)
+ f.write("\n")
+
+ def get_evidence(
+ self,
+ run_id: str,
+ evidence_type: str | None = None,
+ start_time: float | None = None,
+ end_time: float | None = None,
+ ) -> Iterator[EvidenceRecord]:
+ """Retrieve evidence records for a run."""
+        if evidence_type:
+            # Build the path directly so a read never creates directories
+            files = [self._get_run_dir(run_id) / f"{evidence_type}.jsonl"]
+ else:
+ run_dir = self._get_run_dir(run_id)
+ if not run_dir.exists():
+ return
+ files = list(run_dir.glob("*.jsonl"))
+
+ for file_path in files:
+ if not file_path.exists():
+ continue
+
+ with open(file_path, "r", encoding="utf-8") as f:
+ for line in f:
+ if not line.strip():
+ continue
+
+ try:
+ data = json.loads(line)
+                        if start_time is not None and data["timestamp"] < start_time:
+                            continue
+                        if end_time is not None and data["timestamp"] > end_time:
+                            continue
+
+ # Create the appropriate evidence record type
+ evidence = EvidenceRecord(
+ id=data["id"],
+ timestamp=data["timestamp"],
+ type=data["type"],
+ run_id=data["run_id"],
+ agent_id=data.get("agent_id", ""),
+ metadata=data.get("metadata", {}),
+ )
+
+ # Add type-specific fields
+ for k, v in data.items():
+ if k not in {"id", "timestamp", "type", "run_id", "agent_id", "metadata"}:
+ setattr(evidence, k, v)
+
+ yield evidence
+ except (json.JSONDecodeError, KeyError):
+ continue
+
+ def list_runs(self) -> list[str]:
+ """List all run IDs that have evidence."""
+ if not self.base_dir.exists():
+ return []
+
+ return [d.name for d in self.base_dir.iterdir() if d.is_dir()]
+
+ def get_run_summary(self, run_id: str) -> dict[str, Any]:
+ """Get a summary of evidence for a run."""
+ summary = {
+ "run_id": run_id,
+ "evidence_counts": {},
+ "time_range": {"start": None, "end": None},
+ "total_records": 0,
+ }
+
+ for evidence in self.get_evidence(run_id):
+ summary["total_records"] += 1
+
+ # Count by type
+ summary["evidence_counts"][evidence.type] = (
+ summary["evidence_counts"].get(evidence.type, 0) + 1
+ )
+
+ # Track time range
+ if summary["time_range"]["start"] is None or evidence.timestamp < summary["time_range"]["start"]:
+ summary["time_range"]["start"] = evidence.timestamp
+ if summary["time_range"]["end"] is None or evidence.timestamp > summary["time_range"]["end"]:
+ summary["time_range"]["end"] = evidence.timestamp
+
+ return summary
+
+ def archive_run(self, run_id: str, archive_path: Path) -> None:
+ """Archive all evidence for a run to a compressed file."""
+ import tarfile
+
+ run_dir = self._get_run_dir(run_id)
+ if not run_dir.exists():
+ raise FileNotFoundError(f"No evidence found for run {run_id}")
+
+ with tarfile.open(archive_path, "w:gz") as tar:
+ tar.add(run_dir, arcname=run_id)
+
+ def cleanup_old_runs(self, max_age_days: int) -> int:
+ """Remove evidence for runs older than the specified age."""
+ import shutil
+
+ cutoff_time = time.time() - (max_age_days * 24 * 60 * 60)
+ removed_count = 0
+
+ for run_dir in self.base_dir.iterdir():
+ if not run_dir.is_dir():
+ continue
+
+ # Check if any evidence file is older than cutoff
+ should_remove = True
+ for evidence_file in run_dir.glob("*.jsonl"):
+ if evidence_file.stat().st_mtime > cutoff_time:
+ should_remove = False
+ break
+
+ if should_remove:
+ shutil.rmtree(run_dir)
+ removed_count += 1
+
+ return removed_count
\ No newline at end of file
diff --git a/src/openharness/evidence/types.py b/src/openharness/evidence/types.py
new file mode 100644
index 00000000..0a89481b
--- /dev/null
+++ b/src/openharness/evidence/types.py
@@ -0,0 +1,121 @@
+"""Evidence data models for run-level archiving."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any, Literal
+from uuid import uuid4
+
+
+EvidenceType = Literal[
+ "run_start",
+ "run_end",
+ "task_start",
+ "task_progress",
+ "task_end",
+ "conversation_message",
+ "tool_call",
+ "tool_result",
+ "hook_execution",
+ "state_change",
+ "performance_metric",
+ "error",
+]
+
+
+@dataclass
+class EvidenceRecord:
+ """Base class for all evidence records."""
+
+ id: str = field(default_factory=lambda: str(uuid4()))
+ timestamp: float = 0.0
+ type: EvidenceType = "run_start"
+ run_id: str = ""
+ agent_id: str = ""
+ metadata: dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class RunEvidence(EvidenceRecord):
+ """Evidence for run lifecycle events."""
+
+ session_id: str = ""
+ cwd: str = ""
+ command_line: str = ""
+ config: dict[str, Any] = field(default_factory=dict)
+ environment: dict[str, str] = field(default_factory=dict)
+
+
+@dataclass
+class TaskEvidence(EvidenceRecord):
+ """Evidence for task execution."""
+
+ task_id: str = ""
+ task_type: str = ""
+ description: str = ""
+ status: str = ""
+ command: str = ""
+ cwd: str = ""
+ output_file: str = ""
+ return_code: int | None = None
+ duration: float = 0.0
+ error_message: str = ""
+
+
+@dataclass
+class ConversationEvidence(EvidenceRecord):
+ """Evidence for conversation messages."""
+
+ message_type: str = "" # "user", "assistant", "system", "tool"
+ content: str = ""
+ role: str = ""
+ tool_calls: list[dict[str, Any]] = field(default_factory=list)
+ tool_results: list[dict[str, Any]] = field(default_factory=list)
+ token_count: int = 0
+ model: str = ""
+
+
+@dataclass
+class HookEvidence(EvidenceRecord):
+ """Evidence for hook executions."""
+
+ event: str = ""
+ hook_type: str = ""
+ success: bool = True
+ output: str = ""
+ blocked: bool = False
+ reason: str = ""
+ duration: float = 0.0
+
+
+@dataclass
+class StateEvidence(EvidenceRecord):
+ """Evidence for application state changes."""
+
+ state_type: str = "" # "app_state", "task_state", "swarm_state"
+ previous_state: dict[str, Any] = field(default_factory=dict)
+ new_state: dict[str, Any] = field(default_factory=dict)
+ change_reason: str = ""
+
+
+@dataclass
+class PerformanceEvidence(EvidenceRecord):
+ """Evidence for performance metrics."""
+
+ metric_name: str = ""
+ value: float = 0.0
+ unit: str = ""
+ category: str = "" # "cost", "latency", "throughput", "resource"
+ context: dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class ErrorEvidence(EvidenceRecord):
+ """Evidence for errors and exceptions."""
+
+ error_type: str = ""
+ error_message: str = ""
+ traceback: str = ""
+ context: dict[str, Any] = field(default_factory=dict)
+ recoverable: bool = False
\ No newline at end of file
diff --git a/src/openharness/platforms.py b/src/openharness/platforms.py
index bfd66ad9..ccebf609 100644
--- a/src/openharness/platforms.py
+++ b/src/openharness/platforms.py
@@ -36,7 +36,7 @@ def detect_platform(
if system == "darwin":
return "macos"
- if system == "windows":
+ if system in ("windows", "win32"):
return "windows"
if system == "linux":
if "microsoft" in kernel_release or env_map.get("WSL_DISTRO_NAME") or env_map.get("WSL_INTEROP"):
diff --git a/src/openharness/swarm/lockfile.py b/src/openharness/swarm/lockfile.py
index 335696d9..0480eafe 100644
--- a/src/openharness/swarm/lockfile.py
+++ b/src/openharness/swarm/lockfile.py
@@ -40,7 +40,10 @@ def exclusive_file_lock(
@contextmanager
def _exclusive_posix_lock(lock_path: Path) -> Iterator[None]:
- import fcntl
+ try:
+ import fcntl
+ except ImportError as e:
+ raise SwarmLockUnavailableError(f"fcntl not available: {e}") from e
lock_path.parent.mkdir(parents=True, exist_ok=True)
lock_path.touch(exist_ok=True)
@@ -54,7 +57,10 @@ def _exclusive_posix_lock(lock_path: Path) -> Iterator[None]:
@contextmanager
def _exclusive_windows_lock(lock_path: Path) -> Iterator[None]:
- import msvcrt
+ try:
+ import msvcrt
+ except ImportError as e:
+ raise SwarmLockUnavailableError(f"msvcrt not available: {e}") from e
lock_path.parent.mkdir(parents=True, exist_ok=True)
with lock_path.open("a+b") as lock_file:
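The guarded imports above turn a missing platform module into a typed, catchable error instead of a bare `ImportError` at lock time. The pattern in isolation (the exception name mirrors the diff; the generic module lookup is illustrative):

```python
import importlib
from types import ModuleType


class SwarmLockUnavailableError(RuntimeError):
    """Raised when no file-locking backend exists on this platform."""


def load_lock_module(name: str) -> ModuleType:
    try:
        return importlib.import_module(name)
    except ImportError as e:
        # Chain the original error so the root cause stays visible
        raise SwarmLockUnavailableError(f"{name} not available: {e}") from e


# One of fcntl (POSIX) or msvcrt (Windows) exists on any supported platform
try:
    mod = load_lock_module("fcntl")
except SwarmLockUnavailableError:
    mod = load_lock_module("msvcrt")
assert mod is not None

# A module that exists nowhere surfaces as the typed error
raised = False
try:
    load_lock_module("definitely_missing_module")
except SwarmLockUnavailableError as e:
    raised = "not available" in str(e)
assert raised
```

Callers can then catch `SwarmLockUnavailableError` and fall back (or fail with a clear message) without special-casing `ImportError` from deep inside the lock helpers.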
diff --git a/tests/test_evidence.py b/tests/test_evidence.py
new file mode 100644
index 00000000..5253e63d
--- /dev/null
+++ b/tests/test_evidence.py
@@ -0,0 +1,124 @@
+"""Tests for the evidence layer."""
+
+from __future__ import annotations
+
+import tempfile
+from pathlib import Path
+
+from openharness.evidence import EvidenceCollector, EvidenceStore, EvidenceArchiver
+from openharness.evidence.types import RunEvidence, TaskEvidence
+
+
+def test_evidence_store():
+ """Test basic evidence storage and retrieval."""
+ with tempfile.TemporaryDirectory() as temp_dir:
+ store = EvidenceStore(Path(temp_dir))
+
+ # Create and store evidence
+ evidence = RunEvidence(
+ type="run_start",
+ run_id="test-run-123",
+ agent_id="test-agent",
+ session_id="test-session",
+ cwd="/tmp",
+ command_line="test command",
+ )
+ store.store_evidence(evidence)
+
+ # Retrieve evidence
+ records = list(store.get_evidence("test-run-123"))
+ assert len(records) == 1
+ assert records[0].run_id == "test-run-123"
+ assert records[0].type == "run_start"
+
+
+def test_evidence_collector():
+ """Test evidence collection."""
+ with tempfile.TemporaryDirectory() as temp_dir:
+ store = EvidenceStore(Path(temp_dir))
+ collector = EvidenceCollector("test-run-456", store)
+
+ # Record run start
+ collector.record_run_start(
+ session_id="test-session",
+ cwd="/tmp",
+ command_line="test command",
+ )
+
+        # Record a task via a lightweight stub exposing the attributes
+        # record_task_start reads (id, type, created_at, started_at, ...);
+        # TaskEvidence lacks created_at/started_at and would raise here
+        from types import SimpleNamespace
+
+        collector.record_task_start(
+            SimpleNamespace(
+                id="task-123",
+                type="local_agent",
+                description="Test task",
+                status="running",
+                command="echo hello",
+                cwd="/tmp",
+                output_file=Path("/tmp/task.log"),
+                created_at=0.0,
+                started_at=0.0,
+            )
+        )
+
+ # Check evidence was stored
+ records = list(store.get_evidence("test-run-456"))
+ assert len(records) == 2
+
+ run_records = [r for r in records if r.type == "run_start"]
+ task_records = [r for r in records if r.type == "task_start"]
+
+ assert len(run_records) == 1
+ assert len(task_records) == 1
+ assert task_records[0].task_id == "task-123"
+
+
+def test_evidence_archiver():
+ """Test evidence archiving."""
+ with tempfile.TemporaryDirectory() as temp_dir:
+ temp_path = Path(temp_dir)
+ store = EvidenceStore(temp_path)
+ archiver = EvidenceArchiver(store)
+
+ # Create some evidence
+ evidence = RunEvidence(
+ type="run_start",
+ run_id="archive-test-run",
+ agent_id="test-agent",
+ )
+ store.store_evidence(evidence)
+
+ # Export to JSON
+ json_path = archiver.export_run_to_json("archive-test-run")
+ assert json_path.exists()
+
+ # Create archive
+ archive_path = archiver.create_run_archive("archive-test-run")
+ assert archive_path.exists()
+
+ # Create report
+ report_path = archiver.create_run_report("archive-test-run")
+ assert report_path.exists()
+ assert "Run Evidence Report" in report_path.read_text()
+
+
+def test_run_summary():
+ """Test run summary generation."""
+ with tempfile.TemporaryDirectory() as temp_dir:
+ store = EvidenceStore(Path(temp_dir))
+
+ # Create multiple evidence records
+ records = [
+ RunEvidence(type="run_start", run_id="summary-test", agent_id="agent1"),
+ TaskEvidence(type="task_start", run_id="summary-test", agent_id="agent1", task_id="task1"),
+ TaskEvidence(type="task_end", run_id="summary-test", agent_id="agent1", task_id="task1"),
+ RunEvidence(type="run_end", run_id="summary-test", agent_id="agent1"),
+ ]
+
+ for record in records:
+ store.store_evidence(record)
+
+ summary = store.get_run_summary("summary-test")
+ assert summary["run_id"] == "summary-test"
+ assert summary["total_records"] == 4
+ assert summary["evidence_counts"]["run_start"] == 1
+ assert summary["evidence_counts"]["run_end"] == 1
+ assert summary["evidence_counts"]["task_start"] == 1
+ assert summary["evidence_counts"]["task_end"] == 1
\ No newline at end of file