docs: reflect the now-live Phoenix-MCP loop in compliance + evidence docs #86
Caught while auditing whether everything was updated: the judge-facing compliance doc still understated the Phoenix-MCP path.

- rapid-agent-compliance.md: status Wired -> Live; section 3 reframed so the DEPLOYED path is the live PhoenixMcpConsultant (read + write-back over MCP per request, Cloud-SQL-backed Phoenix) and the TableConsultant is the no-endpoint fallback.
- evidence/rapid-agent-visual-proof-2026-05-24.md: added a dated Superseded banner (kept the historical amber/wired content intact rather than rewriting the record).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review failed: the pull request was closed or merged during review.

Overview

This pull request updates the evidence and compliance documents to reflect the current state of the live Phoenix MCP audit loop. It marks the historical amber/wired status description with a superseded note and states explicitly that the deployed audit performs a live per-request MCP round-trip.

Changes: Phoenix MCP live audit documentation

Estimated code review effort: 1 (Trivial), ~8 minutes

Pre-merge checks: 5 passed
Code Review
This pull request updates the documentation to reflect that the deployed audit now runs a live Phoenix-MCP loop (with read and write-back capabilities against a Cloud-SQL-backed Phoenix on Cloud Run) instead of using a static table prior. The feedback points out that the documentation still references @latest for the Phoenix MCP package instead of the pinned version @4.0.13 used in the code, and contains outdated line numbers for the referenced functions.
| 2 | **Code-owned agent runtime** (rules name "Agent Builder"; the **Arize track** requires a code-owned runtime — *Gemini CLI / Agent Platform SDK / **Google ADK** / Agent Runtime / **Cloud Run***, and states **"Visual Agent Builder alone is insufficient. Direct code instrumentation is required."**) | **Google ADK** orchestrator, OpenInference-instrumented, deployed on **Cloud Run**. No visual Agent Builder app — that path is *explicitly disallowed* for this track. See §2. | `services/pipeline-orchestrator/src/glasshat/pipeline/adk_runtime.py` (`instrument_adk`, `run_via_adk`); engine `…/pipeline/engine.py`; deploy `infra/deploy.sh` | §2 below + `claudedocs/hackathon-source-2026-05-21/03-arize-resources.md` (the rule, quoted) | ✅ Resolved |
| 3 | **Arize partner integration** (OpenInference tracing → Arize/Phoenix) | OpenInference auto-instrumentation → **Arize AX** at `otlp.arize.com`; **one span per agent** (`RubricSynthesizer · BluePlanner · SixHatPanel · Audit · BMADScorer · ReportAssembler`) + per-hat `hat_assess`, all carrying `glasshat.*` attributes | `packages/shared/src/glasshat/shared/tracing.py` → `ArizeTracer` (registers via `arize.otel`, line 68); span sites `…/pipeline/engine.py:115–149` | `uv run python scripts/real_arize_ax_e2e.py`; live run `2b2e29c2` (final 56.93, 4 self-corrections) | ✅ Live |
| 4 | **Partner MCP server** (Phoenix MCP — required by the track) | ADK **`MCPToolset` over stdio** → `npx @arizeai/phoenix-mcp@latest`. The audit's calibration consultant calls the Phoenix MCP **`get-dataset-examples`** tool, parses per-anchor score deltas, and feeds them into the self-correction. See §3. | `…/pipeline/adk_runtime.py` → `build_phoenix_mcp_toolset` (l.31), `PhoenixMcpConsultant.consult` (l.53–96, tool `get-dataset-examples` l.82) | `uv run python scripts/real_e2e.py` (real ADK → Phoenix MCP stdio → pipeline) | ✅ Wired — exercised by e2e (see §3 on deployed vs. live-trace path) |
| 4 | **Partner MCP server** (Phoenix MCP — required by the track) | ADK **`MCPToolset` over stdio** → `npx @arizeai/phoenix-mcp@latest`. The audit's calibration consultant calls the Phoenix MCP **`get-dataset-examples`** tool, parses per-anchor score deltas, and feeds them into the self-correction. See §3. | `…/pipeline/adk_runtime.py` → `build_phoenix_mcp_toolset` (l.31), `PhoenixMcpConsultant.consult` (l.53–96, tool `get-dataset-examples` l.82) | `uv run python scripts/real_e2e.py` (real ADK → Phoenix MCP stdio → pipeline) | ✅ **Live** — the deployed audit does a per-request MCP round-trip (read `get-dataset-examples` + write-back `add-dataset-examples`) against a Cloud-SQL-backed Phoenix on Cloud Run (`PHOENIX_COLLECTOR_ENDPOINT` set). See §3. |
There are two inconsistencies in this row compared to the actual implementation in `services/pipeline-orchestrator/src/glasshat/pipeline/adk_runtime.py`:
- Pinned Version: The documentation mentions `npx @arizeai/phoenix-mcp@latest`, but `adk_runtime.py` pins the package to `@4.0.13` (`_PHOENIX_MCP_PACKAGE = "@arizeai/phoenix-mcp@4.0.13"`) for supply-chain hardening.
- Outdated Line Numbers: The referenced line numbers for `build_phoenix_mcp_toolset` (l.31), `PhoenixMcpConsultant.consult` (l.53–96), and `get-dataset-examples` (l.82) are outdated. They should be updated to l.102, l.158, and l.147 respectively to match the current codebase.
Here is the suggested replacement:
| 4 | **Partner MCP server** (Phoenix MCP — required by the track) | ADK **`MCPToolset` over stdio** → `npx @arizeai/phoenix-mcp@4.0.13`. The audit's calibration consultant calls the Phoenix MCP **`get-dataset-examples`** tool, parses per-anchor score deltas, and feeds them into the self-correction. See §3. | `…/pipeline/adk_runtime.py` → `build_phoenix_mcp_toolset` (l.102), `PhoenixMcpConsultant.consult` (l.158, tool `get-dataset-examples` l.147) | `uv run python scripts/real_e2e.py` (real ADK → Phoenix MCP stdio → pipeline) | ✅ **Live** — the deployed audit does a per-request MCP round-trip (read `get-dataset-examples` + write-back `add-dataset-examples`) against a Cloud-SQL-backed Phoenix on Cloud Run (`PHOENIX_COLLECTOR_ENDPOINT` set). See §3. |
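The version mismatch flagged in this review is the kind of drift a small check could catch before merge. The following is a minimal sketch, not something present in the repository: `find_version_drift` and `PINNED_PACKAGE` are hypothetical names, and the pinned string is copied from the value the review quotes for `_PHOENIX_MCP_PACKAGE`.

```python
from __future__ import annotations

import re

# Assumption: mirrors the value the review quotes from adk_runtime.py.
PINNED_PACKAGE = "@arizeai/phoenix-mcp@4.0.13"

# Matches any versioned reference to the package, e.g. "...@latest" or
# "...@4.0.13", without swallowing a trailing sentence period.
_PACKAGE_REF = re.compile(
    r"@arizeai/phoenix-mcp@[A-Za-z0-9]+(?:[.\-][A-Za-z0-9]+)*"
)


def find_version_drift(
    doc_text: str, pinned: str = PINNED_PACKAGE
) -> list[tuple[int, str]]:
    """Return (line_number, reference) pairs where the doc disagrees with the pin."""
    drift = []
    for lineno, line in enumerate(doc_text.splitlines(), start=1):
        for ref in _PACKAGE_REF.findall(line):
            if ref != pinned:
                drift.append((lineno, ref))
    return drift
```

Run against the text of a doc such as `rapid-agent-compliance.md`, an empty result would mean every reference matches the pin; any returned pair points at a line still naming a different tag such as `@latest`.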
Follow-up audit (prompted by 'did you fix everything?'): the primary surfaces (web chips, README table, Devpost) were flipped to live in #84, but two judge-facing docs still understated/contradicted it.
Wired -> Live; section 3 reframed so the DEPLOYED path is the live PhoenixMcpConsultant (per-request read + write-back over MCP, Cloud-SQL-backed Phoenix) and TableConsultant is the no-endpoint fallback. Docs-only.
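The deployed-versus-fallback split described here can be illustrated with a short sketch. It assumes, based only on the wording in this PR, that the choice hinges on whether `PHOENIX_COLLECTOR_ENDPOINT` is set. The `select_consultant` function is hypothetical, returns just the class name rather than an instance, and treats a blank value as unset, which is an assumption of this sketch and not confirmed behaviour.

```python
import os
from typing import Mapping, Optional


def select_consultant(env: Optional[Mapping[str, str]] = None) -> str:
    """Name the consultant to use, based on whether a Phoenix endpoint is configured.

    Hypothetical helper: the real selection logic in the pipeline may differ.
    """
    env = os.environ if env is None else env
    # Assumption: a blank or whitespace-only value counts as "no endpoint".
    endpoint = env.get("PHOENIX_COLLECTOR_ENDPOINT", "").strip()
    return "PhoenixMcpConsultant" if endpoint else "TableConsultant"
```

Passing the environment as a mapping keeps the decision easy to test without touching the process environment.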