Two-Weeks-Team · ComBba · Jun 6, 2026 · Jun 6, 2026
diff --git a/README.md b/README.md
@@ -136,7 +136,7 @@ deck.pdf + repo URL + rubric source
 | `apps/web` | `glasshat-web` | Next.js 16: landing + `/judge` (batch rank · tie-break · gate-2 override · lock) + `/participate` (plan gate · live SSE monitor · evidence · audit callouts · 3D self-correction) |
 | `infra/` | — | Dockerfiles, compose, Cloud Run deploy |
 
-**Config-flip backends** (env): `LLM_BACKEND` (`mock`\|`vertex`\|`gemini-enterprise`), `MONITOR_BACKEND` (`phoenix-local`\|`phoenix-cloud`\|`arize`), `CONSULTANT_BACKEND` (`table`\|`phoenix-mcp`\|`anchor`), `DOCSTORE_BACKEND` (`memory`\|`sqlite`\|`firestore`), `BLOB_BACKEND` (`local-fs`\|`gcs`), `AGENT_RUNTIME` (`python`\|`adk`). The `mock`/`memory`/`local-fs`/`noop` backends are complete, deterministic implementations — the whole engine runs and is tested with **zero credentials**.
+**Config-flip backends** (env): `LLM_BACKEND` (`mock`|`vertex`|`gemini-enterprise`), `MONITOR_BACKEND` (`phoenix-local`|`phoenix-cloud`|`arize`), `CONSULTANT_BACKEND` (`table`|`phoenix-mcp`|`anchor`), `DOCSTORE_BACKEND` (`memory`|`sqlite`|`firestore`), `BLOB_BACKEND` (`local-fs`|`gcs`), `AGENT_RUNTIME` (`python`|`adk`). The `mock`/`memory`/`local-fs`/`noop` backends are complete, deterministic implementations — the whole engine runs and is tested with **zero credentials**.
 
 ## Reproduce
 

diff --git a/pitch/brief.json b/pitch/brief.json
@@ -36,7 +36,7 @@
     "English-only on every artifact; no marketing superlatives.",
     "Hero 'Trace it. Trust it.' is sacred — verbatim at F4 primary, F11 echo, and close.",
     "Honesty rails: never 'un-gameable'; never '503 anchors'; hit@13 0.6154 is a binary Winner-label metric, NOT a rank curve (audit Delta = 0 on this golden set).",
-    "State which path is which: the public Cloud Run demo runs the deterministic spike-D prior with SCORING_MODE=legacy (AGENT_RUNTIME=python, parity-identical); the genuine live Arize AX results are the credentialed Agent Engine run.",
+    "State which path is which: the public Cloud Run demo runs the live Phoenix-MCP calibration loop (reads + writes the dataset over MCP per request) on the SCORING_MODE=legacy python path; the full nested trace tree + the hit@13 experiment are the credentialed Agent Engine run.",
     "Model is Gemini 3.1 Flash-Lite via Vertex AI — do not claim a different model.",
     "F4 hero holds >= 5s; every timestamp rolls up to 180s; SLIDE_DURATION sums to 180s of frame budget.",
     "Modifier-key guard 'if(e.metaKey||e.ctrlKey||e.altKey)return;' present; recording mode hides all chrome."
@@ -55,7 +55,7 @@
     "model": "Gemini 3.1 Flash-Lite via Vertex AI, orchestrated with Google ADK 2.0",
     "license": "Apache-2.0",
     "repo": "github.com/Two-Weeks-Team/glasshat",
-    "demo_path_caveat": "public Cloud Run demo = deterministic spike-D prior, SCORING_MODE=legacy, AGENT_RUNTIME=python (parity-identical RunRecord + SSE, gated default); genuine ADK Workflow + live AX results = credentialed Agent Engine run"
+    "demo_path_caveat": "public Cloud Run demo = live Phoenix-MCP calibration loop (read+write per request), SCORING_MODE=legacy python path; full nested trace tree + hit@13 experiment = credentialed Agent Engine run (spike-D = the dataset seed/fallback)"
   },
   "_palette_override": {
     "note": "Schema color_palette enum lacks our custom dark scheme; oklch-warm-gold selected as nearest valid value, but the deck uses the locked dark-oklch cinematic palette below (purple->cyan->green, bright-cyan hero accent, NOT gold).",

diff --git a/pitch/frames-spec.json b/pitch/frames-spec.json
@@ -350,7 +350,7 @@
           { "left": "Project A  #2", "right": "Project B  #2" }
         ],
         "chip": "hit@13 0.6154 · Arize AX experiment",
-        "honest_chip": "binary Winner-label · not a rank curve · demo runs spike-D prior"
+        "honest_chip": "binary Winner-label · not a rank curve · audit Δ=0 on this set · live Phoenix-MCP loop"
       },
       "animation_timeline": [
         { "selector": "terminal-lines", "anim": "right-rows", "delay": 0.2 },
@@ -409,7 +409,7 @@
           "Apache-2.0"
         ],
         "call": "github.com/Two-Weeks-Team/glasshat",
-        "honest_note": "hit@13 = binary Winner-label, not a rank curve · public demo runs the spike-D prior (SCORING_MODE=legacy); live AX results = credentialed Agent Engine run."
+        "honest_note": "hit@13 = binary Winner-label, not a rank curve · the public demo runs the live Phoenix-MCP calibration loop (SCORING_MODE=legacy python path); the full trace tree + the hit@13 experiment = credentialed Agent Engine run."
       }
     }
   ],

diff --git a/pitch/scenario.md b/pitch/scenario.md
@@ -9,7 +9,7 @@
 - Never say "un-gameable." The audit *raises the bar and makes the score observable* — that is the claim.
 - Never say "503 anchors." The calibration prior is the held-out spike-D prior; the golden labels are binary Winner badges.
 - `hit@13 = 0.6154` is a **binary Winner-label hit rate, not a rank curve.** On this golden set the audit did **not** reorder the top-13 (Δ = 0).
-- Say which path is which: the **public Cloud Run demo** runs the deterministic **spike-D prior** with `SCORING_MODE=legacy` (`AGENT_RUNTIME=python`, parity-identical RunRecord + SSE). The **genuine live Arize AX results** are the **credentialed Agent Engine** run.
+- Say which path is which: the **public Cloud Run demo** runs the **live Phoenix-MCP calibration loop** (reads + writes the `glasshat-calibration` dataset over MCP per request) on the `SCORING_MODE=legacy` python path. The **full nested trace tree + the hit@13 experiment** are the **credentialed Agent Engine** run. (spike-D = the dataset's seed/fallback.)
 - Model is **Gemini 3.1 Flash-Lite via Vertex AI**. Orchestrated with **Google ADK 2.0**.
 
 ---
@@ -168,7 +168,7 @@
 - ✓ Apache-2.0
 
 **Call:** `github.com/Two-Weeks-Team/glasshat`
-**Honest note (on screen):** *hit@13 = binary Winner-label, not a rank curve · public demo runs the spike-D prior (`SCORING_MODE=legacy`); live AX results = credentialed Agent Engine run.*
+**Honest note (on screen):** *hit@13 = binary Winner-label, not a rank curve · the public demo runs the live Phoenix-MCP calibration loop (`SCORING_MODE=legacy` python path); the full trace tree + the hit@13 experiment = credentialed Agent Engine run.*
 
 **VO:** *(silent — hold 12s while the recorder cuts)*