diff --git a/README.md b/README.md
index 9f45800..d65b4fd 100644
--- a/README.md
+++ b/README.md
@@ -136,7 +136,7 @@ deck.pdf + repo URL + rubric source
 | `apps/web` | `glasshat-web` | Next.js 16: landing + `/judge` (batch rank · tie-break · gate-2 override · lock) + `/participate` (plan gate · live SSE monitor · evidence · audit callouts · 3D self-correction) |
 | `infra/` | — | Dockerfiles, compose, Cloud Run deploy |
 
-**Config-flip backends** (env): `LLM_BACKEND` (`mock`\|`vertex`\|`gemini-enterprise`), `MONITOR_BACKEND` (`phoenix-local`\|`phoenix-cloud`\|`arize`), `CONSULTANT_BACKEND` (`table`\|`phoenix-mcp`\|`anchor`), `DOCSTORE_BACKEND` (`memory`\|`sqlite`\|`firestore`), `BLOB_BACKEND` (`local-fs`\|`gcs`), `AGENT_RUNTIME` (`python`\|`adk`). The `mock`/`memory`/`local-fs`/`noop` backends are complete, deterministic implementations — the whole engine runs and is tested with **zero credentials**.
+**Config-flip backends** (env): `LLM_BACKEND` (`mock`|`vertex`|`gemini-enterprise`), `MONITOR_BACKEND` (`phoenix-local`|`phoenix-cloud`|`arize`), `CONSULTANT_BACKEND` (`table`|`phoenix-mcp`|`anchor`), `DOCSTORE_BACKEND` (`memory`|`sqlite`|`firestore`), `BLOB_BACKEND` (`local-fs`|`gcs`), `AGENT_RUNTIME` (`python`|`adk`). The `mock`/`memory`/`local-fs`/`noop` backends are complete, deterministic implementations — the whole engine runs and is tested with **zero credentials**.
 
 ## Reproduce
 
diff --git a/pitch/brief.json b/pitch/brief.json
index 9dd920a..4cfd996 100644
--- a/pitch/brief.json
+++ b/pitch/brief.json
@@ -36,7 +36,7 @@
 "English-only on every artifact; no marketing superlatives.",
 "Hero 'Trace it. Trust it.' is sacred — verbatim at F4 primary, F11 echo, and close.",
 "Honesty rails: never 'un-gameable'; never '503 anchors'; hit@13 0.6154 is a binary Winner-label metric, NOT a rank curve (audit Delta = 0 on this golden set).",
-"State which path is which: the public Cloud Run demo runs the deterministic spike-D prior with SCORING_MODE=legacy (AGENT_RUNTIME=python, parity-identical); the genuine live Arize AX results are the credentialed Agent Engine run.",
+"State which path is which: the public Cloud Run demo runs the live Phoenix-MCP calibration loop (reads + writes the dataset over MCP per request) on the SCORING_MODE=legacy python path; the full nested trace tree + the hit@13 experiment are the credentialed Agent Engine run.",
 "Model is Gemini 3.1 Flash-Lite via Vertex AI — do not claim a different model.",
 "F4 hero holds >= 5s; every timestamp rolls up to 180s; SLIDE_DURATION sums to 180s of frame budget.",
 "Modifier-key guard 'if(e.metaKey||e.ctrlKey||e.altKey)return;' present; recording mode hides all chrome."
@@ -55,7 +55,7 @@
 "model": "Gemini 3.1 Flash-Lite via Vertex AI, orchestrated with Google ADK 2.0",
 "license": "Apache-2.0",
 "repo": "github.com/Two-Weeks-Team/glasshat",
-"demo_path_caveat": "public Cloud Run demo = deterministic spike-D prior, SCORING_MODE=legacy, AGENT_RUNTIME=python (parity-identical RunRecord + SSE, gated default); genuine ADK Workflow + live AX results = credentialed Agent Engine run"
+"demo_path_caveat": "public Cloud Run demo = live Phoenix-MCP calibration loop (read+write per request), SCORING_MODE=legacy python path; full nested trace tree + hit@13 experiment = credentialed Agent Engine run (spike-D = the dataset seed/fallback)"
 },
 "_palette_override": {
 "note": "Schema color_palette enum lacks our custom dark scheme; oklch-warm-gold selected as nearest valid value, but the deck uses the locked dark-oklch cinematic palette below (purple->cyan->green, bright-cyan hero accent, NOT gold).",
diff --git a/pitch/frames-spec.json b/pitch/frames-spec.json
index 80c710f..2ffbfc1 100644
--- a/pitch/frames-spec.json
+++ b/pitch/frames-spec.json
@@ -350,7 +350,7 @@
 { "left": "Project A #2", "right": "Project B #2" }
 ],
 "chip": "hit@13 0.6154 · Arize AX experiment",
-"honest_chip": "binary Winner-label · not a rank curve · demo runs spike-D prior"
+"honest_chip": "binary Winner-label · not a rank curve · audit Δ=0 on this set · live Phoenix-MCP loop"
 },
 "animation_timeline": [
 { "selector": "terminal-lines", "anim": "right-rows", "delay": 0.2 },
@@ -409,7 +409,7 @@
 "Apache-2.0"
 ],
 "call": "github.com/Two-Weeks-Team/glasshat",
-"honest_note": "hit@13 = binary Winner-label, not a rank curve · public demo runs the spike-D prior (SCORING_MODE=legacy); live AX results = credentialed Agent Engine run."
+"honest_note": "hit@13 = binary Winner-label, not a rank curve · the public demo runs the live Phoenix-MCP calibration loop (SCORING_MODE=legacy python path); the full trace tree + the hit@13 experiment = credentialed Agent Engine run."
 }
 }
 ],
diff --git a/pitch/scenario.md b/pitch/scenario.md
index b438803..9abc70b 100644
--- a/pitch/scenario.md
+++ b/pitch/scenario.md
@@ -9,7 +9,7 @@
 - Never say "un-gameable." The audit *raises the bar and makes the score observable* — that is the claim.
 - Never say "503 anchors." The calibration prior is the held-out spike-D prior; the golden labels are binary Winner badges.
 - `hit@13 = 0.6154` is a **binary Winner-label hit rate, not a rank curve.** On this golden set the audit did **not** reorder the top-13 (Δ = 0).
-- Say which path is which: the **public Cloud Run demo** runs the deterministic **spike-D prior** with `SCORING_MODE=legacy` (`AGENT_RUNTIME=python`, parity-identical RunRecord + SSE). The **genuine live Arize AX results** are the **credentialed Agent Engine** run.
+- Say which path is which: the **public Cloud Run demo** runs the **live Phoenix-MCP calibration loop** (reads + writes the `glasshat-calibration` dataset over MCP per request) on the `SCORING_MODE=legacy` python path. The **full nested trace tree + the hit@13 experiment** are the **credentialed Agent Engine** run. (spike-D = the dataset's seed/fallback.)
 - Model is **Gemini 3.1 Flash-Lite via Vertex AI**. Orchestrated with **Google ADK 2.0**.
 
 ---
@@ -168,7 +168,7 @@
 - ✓ Apache-2.0
 
 **Call:** `github.com/Two-Weeks-Team/glasshat`
-**Honest note (on screen):** *hit@13 = binary Winner-label, not a rank curve · public demo runs the spike-D prior (`SCORING_MODE=legacy`); live AX results = credentialed Agent Engine run.*
+**Honest note (on screen):** *hit@13 = binary Winner-label, not a rank curve · the public demo runs the live Phoenix-MCP calibration loop (`SCORING_MODE=legacy` python path); the full trace tree + the hit@13 experiment = credentialed Agent Engine run.*
 
 **VO:** *(silent — hold 12s while the recorder cuts)*