Merged
2 changes: 1 addition & 1 deletion README.md
@@ -136,7 +136,7 @@ deck.pdf + repo URL + rubric source
| `apps/web` | `glasshat-web` | Next.js 16: landing + `/judge` (batch rank · tie-break · gate-2 override · lock) + `/participate` (plan gate · live SSE monitor · evidence · audit callouts · 3D self-correction) |
| `infra/` | — | Dockerfiles, compose, Cloud Run deploy |

**Config-flip backends** (env): `LLM_BACKEND` (`mock`\|`vertex`\|`gemini-enterprise`), `MONITOR_BACKEND` (`phoenix-local`\|`phoenix-cloud`\|`arize`), `CONSULTANT_BACKEND` (`table`\|`phoenix-mcp`\|`anchor`), `DOCSTORE_BACKEND` (`memory`\|`sqlite`\|`firestore`), `BLOB_BACKEND` (`local-fs`\|`gcs`), `AGENT_RUNTIME` (`python`\|`adk`). The `mock`/`memory`/`local-fs`/`noop` backends are complete, deterministic implementations — the whole engine runs and is tested with **zero credentials**.
**Config-flip backends** (env): `LLM_BACKEND` (`mock`|`vertex`|`gemini-enterprise`), `MONITOR_BACKEND` (`phoenix-local`|`phoenix-cloud`|`arize`), `CONSULTANT_BACKEND` (`table`|`phoenix-mcp`|`anchor`), `DOCSTORE_BACKEND` (`memory`|`sqlite`|`firestore`), `BLOB_BACKEND` (`local-fs`|`gcs`), `AGENT_RUNTIME` (`python`|`adk`). The `mock`/`memory`/`local-fs`/`noop` backends are complete, deterministic implementations — the whole engine runs and is tested with **zero credentials**.
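
A minimal sketch of how the config-flip selection described above could be validated at startup. The variable names and allowed values are copied from the paragraph; the `resolve_backends` helper and the fallback to the first listed option are assumptions made for illustration, not the project's actual loader. The `noop` backend is left out because the paragraph does not say which variable it belongs to.

```python
import os

# Allowed values copied from the README paragraph. The resolver below is a
# hypothetical illustration, not code from the repository.
ALLOWED_BACKENDS = {
    "LLM_BACKEND": ("mock", "vertex", "gemini-enterprise"),
    "MONITOR_BACKEND": ("phoenix-local", "phoenix-cloud", "arize"),
    "CONSULTANT_BACKEND": ("table", "phoenix-mcp", "anchor"),
    "DOCSTORE_BACKEND": ("memory", "sqlite", "firestore"),
    "BLOB_BACKEND": ("local-fs", "gcs"),
    "AGENT_RUNTIME": ("python", "adk"),
}


def resolve_backends(env=None):
    """Return {variable: chosen value}, rejecting values the README does not list.

    Assumption: an unset variable falls back to the first listed option. The
    README does not state the project's real defaults.
    """
    env = os.environ if env is None else env
    chosen = {}
    for name, options in ALLOWED_BACKENDS.items():
        value = env.get(name, options[0])
        if value not in options:
            raise ValueError(f"{name}={value!r} is not one of {options}")
        chosen[name] = value
    return chosen
```

Calling `resolve_backends({"BLOB_BACKEND": "gcs"})` would flip only the blob store and leave every other variable on its assumed fallback, which is the "config flip" idea in miniature.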

## Reproduce

4 changes: 2 additions & 2 deletions pitch/brief.json
@@ -36,7 +36,7 @@
"English-only on every artifact; no marketing superlatives.",
"Hero 'Trace it. Trust it.' is sacred — verbatim at F4 primary, F11 echo, and close.",
"Honesty rails: never 'un-gameable'; never '503 anchors'; hit@13 0.6154 is a binary Winner-label metric, NOT a rank curve (audit Delta = 0 on this golden set).",
"State which path is which: the public Cloud Run demo runs the deterministic spike-D prior with SCORING_MODE=legacy (AGENT_RUNTIME=python, parity-identical); the genuine live Arize AX results are the credentialed Agent Engine run.",
"State which path is which: the public Cloud Run demo runs the live Phoenix-MCP calibration loop (reads + writes the dataset over MCP per request) on the SCORING_MODE=legacy python path; the full nested trace tree + the hit@13 experiment are the credentialed Agent Engine run.",
"Model is Gemini 3.1 Flash-Lite via Vertex AI — do not claim a different model.",
"F4 hero holds >= 5s; every timestamp rolls up to 180s; SLIDE_DURATION sums to 180s of frame budget.",
"Modifier-key guard 'if(e.metaKey||e.ctrlKey||e.altKey)return;' present; recording mode hides all chrome."
@@ -55,7 +55,7 @@
"model": "Gemini 3.1 Flash-Lite via Vertex AI, orchestrated with Google ADK 2.0",
"license": "Apache-2.0",
"repo": "github.com/Two-Weeks-Team/glasshat",
"demo_path_caveat": "public Cloud Run demo = deterministic spike-D prior, SCORING_MODE=legacy, AGENT_RUNTIME=python (parity-identical RunRecord + SSE, gated default); genuine ADK Workflow + live AX results = credentialed Agent Engine run"
"demo_path_caveat": "public Cloud Run demo = live Phoenix-MCP calibration loop (read+write per request), SCORING_MODE=legacy python path; full nested trace tree + hit@13 experiment = credentialed Agent Engine run (spike-D = the dataset seed/fallback)"
},
"_palette_override": {
"note": "Schema color_palette enum lacks our custom dark scheme; oklch-warm-gold selected as nearest valid value, but the deck uses the locked dark-oklch cinematic palette below (purple->cyan->green, bright-cyan hero accent, NOT gold).",
4 changes: 2 additions & 2 deletions pitch/frames-spec.json
@@ -350,7 +350,7 @@
{ "left": "Project A #2", "right": "Project B #2" }
],
"chip": "hit@13 0.6154 · Arize AX experiment",
"honest_chip": "binary Winner-label · not a rank curve · demo runs spike-D prior"
"honest_chip": "binary Winner-label · not a rank curve · audit Δ=0 on this set · live Phoenix-MCP loop"
},
"animation_timeline": [
{ "selector": "terminal-lines", "anim": "right-rows", "delay": 0.2 },
@@ -409,7 +409,7 @@
"Apache-2.0"
],
"call": "github.com/Two-Weeks-Team/glasshat",
"honest_note": "hit@13 = binary Winner-label, not a rank curve · public demo runs the spike-D prior (SCORING_MODE=legacy); live AX results = credentialed Agent Engine run."
"honest_note": "hit@13 = binary Winner-label, not a rank curve · the public demo runs the live Phoenix-MCP calibration loop (SCORING_MODE=legacy python path); the full trace tree + the hit@13 experiment = credentialed Agent Engine run."
}
}
],
4 changes: 2 additions & 2 deletions pitch/scenario.md
@@ -9,7 +9,7 @@
- Never say "un-gameable." The audit *raises the bar and makes the score observable* — that is the claim.
- Never say "503 anchors." The calibration prior is the held-out spike-D prior; the golden labels are binary Winner badges.
- `hit@13 = 0.6154` is a **binary Winner-label hit rate, not a rank curve.** On this golden set the audit did **not** reorder the top-13 (Δ = 0).
- Say which path is which: the **public Cloud Run demo** runs the deterministic **spike-D prior** with `SCORING_MODE=legacy` (`AGENT_RUNTIME=python`, parity-identical RunRecord + SSE). The **genuine live Arize AX results** are the **credentialed Agent Engine** run.
- Say which path is which: the **public Cloud Run demo** runs the **live Phoenix-MCP calibration loop** (reads + writes the `glasshat-calibration` dataset over MCP per request) on the `SCORING_MODE=legacy` python path. The **full nested trace tree + the hit@13 experiment** are the **credentialed Agent Engine** run. (spike-D = the dataset's seed/fallback.)
- Model is **Gemini 3.1 Flash-Lite via Vertex AI**. Orchestrated with **Google ADK 2.0**.

---
@@ -168,7 +168,7 @@
- ✓ Apache-2.0

**Call:** `github.com/Two-Weeks-Team/glasshat`
**Honest note (on screen):** *hit@13 = binary Winner-label, not a rank curve · public demo runs the spike-D prior (`SCORING_MODE=legacy`); live AX results = credentialed Agent Engine run.*
**Honest note (on screen):** *hit@13 = binary Winner-label, not a rank curve · the public demo runs the live Phoenix-MCP calibration loop (`SCORING_MODE=legacy` python path); the full trace tree + the hit@13 experiment = credentialed Agent Engine run.*

**VO:** *(silent — hold 12s while the recorder cuts)*
