projnanda · mariagorskikh · May 26, 2026 · May 26, 2026 · May 26, 2026 · May 26, 2026
diff --git a/README.md b/README.md
@@ -317,6 +317,37 @@ Two things to know that aren't obvious:
 
 ---
 
+## Hackathon
+
+NEST runs a month-long hackathon where engineers (and agents) submit
+plugins, scenarios, and platform improvements as PRs against a
+`hackathon/*` branch. Every submission is scored by an automated judge
+panel along six dimensions (correctness, test rigor, API fit, docs,
+novelty, persona fidelity), each on a 1-5 scale.
+
+### Scoreboard
+
+> Live scoreboard: [`docs/hackathon/scores.json`](docs/hackathon/scores.json) — machine-readable scores for every open hackathon PR. A marketplace UI on top of this file is coming.
+
+The judge panel lives in [`scripts/judge/`](scripts/judge/):
+
+- [`scripts/judge/rubric.md`](scripts/judge/rubric.md) — the rubric prompt (versioned).
+- [`scripts/judge/judge_pr.py`](scripts/judge/judge_pr.py) — score one PR with N parallel judges via the Anthropic API, with prompt caching on the rubric.
+- [`scripts/judge/run_all.py`](scripts/judge/run_all.py) — CLI that scores every open `hackathon/*` PR and writes the scoreboard JSON. Idempotent on HEAD SHA.
+
+Re-run the full scoreboard with three live Opus judges per PR:
+
+```bash
+export ANTHROPIC_API_KEY=...
+uv run python -m scripts.judge.run_all --output docs/hackathon/scores.json
+```
+
+Without an API key the CLI falls back to deterministic mock judges so the
+schema is exercised even in CI. See the in-repo PR description for the
+full cost model and how to reproduce.
+
+---
+
 ## Contributing
 
 See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, coding

diff --git a/docs/hackathon/scores.json b/docs/hackathon/scores.json
diff --git a/pyproject.toml b/pyproject.toml
@@ -24,6 +24,17 @@ dependencies = [
     "nest-plugins-reference",
 ]
 
+[project.optional-dependencies]
+# Dependencies for the hackathon judge panel under scripts/judge/.
+# Install with: uv sync --extra judge
+# or install a single provider directly: uv pip install "anthropic>=0.30"
+# / "openai>=1.0". The judge_pr module imports providers lazily, so a
+# missing SDK only fails at the point a live judge is invoked.
+judge = [
+    "anthropic>=0.30",
+    "openai>=1.0",
+]
+
 [project.urls]
 Homepage = "https://github.com/mariagorskikh/nest"
 Repository = "https://github.com/mariagorskikh/nest"
@@ -87,5 +98,8 @@ extraPaths = [
 
 [tool.pytest.ini_options]
 asyncio_mode = "auto"
-testpaths = ["packages"]
-addopts = ["--import-mode=importlib"]
+testpaths = ["packages", "scripts"]
+addopts = ["--import-mode=importlib", "-m", "not live"]
+markers = [
+    "live: end-to-end tests that hit live external services (skipped by default)",
+]
diff --git a/scripts/conftest.py b/scripts/conftest.py
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: Apache-2.0
+"""Pytest configuration for ``scripts/`` tests.
+
+Makes the project root importable so tests can ``from scripts.judge import ...``.
+
+Example::
+
+    # No-op for the user; pytest picks this up automatically.
+"""
+
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+_ROOT = Path(__file__).resolve().parent.parent
+if str(_ROOT) not in sys.path:
+    sys.path.insert(0, str(_ROOT))
diff --git a/scripts/judge/README.md b/scripts/judge/README.md
@@ -0,0 +1,76 @@
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
+# NEST Hackathon Judge Panel
+
+`scripts/judge/` runs an N-judge LLM panel against every open
+`hackathon/*` PR and writes a deterministic scoreboard to
+`docs/hackathon/scores.json`.
+
+The panel is provider-pluggable. Two live providers are supported today,
+plus a deterministic mock for CI smoke runs.
+
+## Providers
+
+| `--provider` | Default model      | API-key env var      | SDK package    |
+| ------------ | ------------------ | -------------------- | -------------- |
+| `anthropic`  | `claude-opus-4-7`  | `ANTHROPIC_API_KEY`  | `anthropic`    |
+| `openai`     | `gpt-5.5`          | `OPENAI_API_KEY`     | `openai`       |
+
+- `anthropic` is the default. The rubric is sent as a
+  `cache_control: ephemeral` system block so rubric tokens are billed
+  once per 5-min window across N judges.
+- `openai` uses `openai.AsyncOpenAI` against `chat.completions` with
+  `response_format={"type": "json_object"}` so the JSON contract is
+  enforced server-side. OpenAI's caching is implicit per the docs — we
+  don't try to be clever.
+- Both providers share the same rubric, the same six dimensions, the
+  same JSON output schema, and the same median-low aggregation. The
+  `scores.json` shape is identical regardless of provider.
+
+If the selected provider's API key is unset, the CLI falls back to a
+deterministic `MockJudgeClient` so the scoreboard shape is exercised
+end-to-end without spending budget. Use `--mock` to force that path
+regardless of env.
+
+## Install
+
+```bash
+uv sync --extra judge      # pulls in both anthropic and openai SDKs
+# or one provider at a time:
+uv pip install "anthropic>=0.30"
+uv pip install "openai>=1.0"
+```
+
+## Usage
+
+```bash
+# Default: Anthropic with claude-opus-4-7
+ANTHROPIC_API_KEY=sk-ant-... \
+  uv run python -m scripts.judge.run_all --output docs/hackathon/scores.json
+
+# OpenAI with the default gpt-5.5
+OPENAI_API_KEY=sk-... \
+  uv run python -m scripts.judge.run_all --provider openai
+
+# OpenAI, pinning a specific model
+OPENAI_API_KEY=sk-... \
+  uv run python -m scripts.judge.run_all --provider openai --model gpt-5.5-pro
+
+# Force mock judges (no API keys required)
+uv run python -m scripts.judge.run_all --mock
+
+# Subset of PRs
+uv run python -m scripts.judge.run_all --pr 2 --pr 3
+```
+
+The scoreboard is idempotent: re-running only re-scores PRs whose HEAD
+SHA changed since the last write. Pass `--force` to re-score every PR.
+
+## Tests
+
+```bash
+uv run pytest scripts/judge/tests/ -v
+```
+
+Live API-touching tests live behind `@pytest.mark.live` and are skipped
+unless you pass `-m live` and set the relevant API key.
diff --git a/scripts/judge/__init__.py b/scripts/judge/__init__.py
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: Apache-2.0
+"""NEST hackathon judge panel.
+
+Modules:
+    judge_pr  -- score a single PR with N independent judges
+    run_all   -- batch-score every open hackathon PR into a scoreboard JSON
+"""
+
+from __future__ import annotations
diff --git a/scripts/judge/fixtures/hackathon-prs-2026-05-26.json b/scripts/judge/fixtures/hackathon-prs-2026-05-26.json