docs: reflect the now-live Phoenix-MCP loop in compliance + evidence docs by ComBba · Pull Request #86 · Two-Weeks-Team/glasshat

ComBba · 2026-06-06T08:07:43Z

Follow-up audit (prompted by 'did you fix everything?'): the primary surfaces (web chips, README table, Devpost) were flipped to live in #84, but two judge-facing docs still understated/contradicted it.

rapid-agent-compliance.md: Phoenix-MCP status Wired -> Live; section 3 reframed so the DEPLOYED path is the live PhoenixMcpConsultant (per-request read + write-back over MCP, Cloud-SQL-backed Phoenix) and TableConsultant is the no-endpoint fallback.
evidence/rapid-agent-visual-proof-2026-05-24.md: dated Superseded banner (kept the historical amber/wired content intact — it was accurate on that date).

Docs-only.

Summary by CodeRabbit

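Release notes

Documentation
- Updated the documentation to reflect the latest status information.
- Added the audit status and system state changes to the documentation.
- Revised the technical requirements and system flow diagram with the latest deployment information.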

Caught while auditing whether everything was updated: the judge-facing compliance doc still understated the Phoenix-MCP path.

- rapid-agent-compliance.md: status Wired -> Live; section 3 reframed so the DEPLOYED path is the live PhoenixMcpConsultant (read + write-back over MCP per request, Cloud-SQL-backed Phoenix) and the TableConsultant is the no-endpoint fallback.
- evidence/rapid-agent-visual-proof-2026-05-24.md: added a dated Superseded banner (kept the historical amber/wired content intact rather than rewriting the record).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-06T08:07:54Z

Caution

Review failed

Pull request was closed or merged during review

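Overview

This pull request updates the evidence and compliance documents to reflect the current state of the live Phoenix MCP audit loop. It marks the earlier amber/wired status description with a superseded note and states explicitly that the deployed audit performs a live per-request MCP round-trip.

Changes

Phoenix MCP live audit documentation

| Layer / File | Summary |
| --- | --- |
| Past evidence status note `docs/evidence/rapid-agent-visual-proof-2026-05-24.md` | Adds a 2026-06-06 superseded note at the top of the document. The note states that the earlier description of the Phoenix MCP chip status (amber/wired/table prior) has changed, that the currently deployed audit runs the live Phoenix-MCP loop, and that all five chips are green/live. |
| Live audit compliance description `docs/rapid-agent-compliance.md` | Changes the status of requirement `#4` (Partner MCP server) from "Wired" to "Live" and states that the deployed audit performs a live MCP round-trip per request against the Cloud-SQL-backed Phoenix MCP. Also updates the §3 architecture diagram to describe the branching logic between the "DEPLOYED" path, `PhoenixMcpConsultant` (MCP stdio interaction and the `add-dataset-examples` write-back loop), and the "FALLBACK" path, `TableConsultant` (used when no Phoenix endpoint is configured). |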
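The DEPLOYED/FALLBACK branching described in the table can be sketched roughly as follows. This is a minimal illustration, not the code in `adk_runtime.py`: the class names and the `PHOENIX_COLLECTOR_ENDPOINT` variable come from this PR, while `select_consultant`, the `describe` method, and the constructor signatures are hypothetical stand-ins.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Protocol


class Consultant(Protocol):
    """Hypothetical minimal interface shared by both paths."""

    def describe(self) -> str: ...


@dataclass
class PhoenixMcpConsultant:
    """Stand-in for the live path (the real class lives in adk_runtime.py)."""

    endpoint: str

    def describe(self) -> str:
        return f"live Phoenix MCP round-trip via {self.endpoint}"


@dataclass
class TableConsultant:
    """Stand-in for the fallback that uses a static table prior."""

    def describe(self) -> str:
        return "static table prior (no Phoenix endpoint configured)"


def select_consultant(env: Mapping[str, str] = os.environ) -> Consultant:
    """Pick the live consultant when an endpoint is set, else the fallback."""
    endpoint = env.get("PHOENIX_COLLECTOR_ENDPOINT", "").strip()
    if endpoint:
        return PhoenixMcpConsultant(endpoint=endpoint)
    return TableConsultant()
```

Passing the environment in as a mapping keeps the branch easy to exercise in a test without mutating `os.environ`; whether the real runtime is structured this way is an assumption.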

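Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~8 minutes

Related pull requests

Two-Weeks-Team/glasshat#39: The main PR that updates the compliance description of the live Phoenix MCP audit loop and its evidence results. This PR builds directly on the Phoenix MCP agent-loop contract that PR #39 added in the same files.

🐰 The Phoenix chip sparkles, shining green,
The dance of the audit loop running live,
The amber of the past steps aside,
And the docs hold the latest state!
✨ Onward, together with truthful evidence!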

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately reflects the core of the change. It states that the 'Phoenix-MCP loop' now runs live and names the affected documents ('compliance + evidence docs'). |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

gemini-code-assist

Code Review

This pull request updates the documentation to reflect that the deployed audit now runs a live Phoenix-MCP loop (with read and write-back capabilities against a Cloud-SQL-backed Phoenix on Cloud Run) instead of using a static table prior. The feedback points out that the documentation still references @latest for the Phoenix MCP package instead of the pinned version @4.0.13 used in the code, and contains outdated line numbers for the referenced functions.

gemini-code-assist · 2026-06-06T08:08:50Z

 | 2 | **Code-owned agent runtime** (rules name "Agent Builder"; the **Arize track** requires a code-owned runtime — *Gemini CLI / Agent Platform SDK / **Google ADK** / Agent Runtime / **Cloud Run***, and states **"Visual Agent Builder alone is insufficient. Direct code instrumentation is required."**) | **Google ADK** orchestrator, OpenInference-instrumented, deployed on **Cloud Run**. No visual Agent Builder app — that path is *explicitly disallowed* for this track. See §2. | `services/pipeline-orchestrator/src/glasshat/pipeline/adk_runtime.py` (`instrument_adk`, `run_via_adk`); engine `…/pipeline/engine.py`; deploy `infra/deploy.sh` | §2 below + `claudedocs/hackathon-source-2026-05-21/03-arize-resources.md` (the rule, quoted) | ✅ Resolved |
 | 3 | **Arize partner integration** (OpenInference tracing → Arize/Phoenix) | OpenInference auto-instrumentation → **Arize AX** at `otlp.arize.com`; **one span per agent** (`RubricSynthesizer · BluePlanner · SixHatPanel · Audit · BMADScorer · ReportAssembler`) + per-hat `hat_assess`, all carrying `glasshat.*` attributes | `packages/shared/src/glasshat/shared/tracing.py` → `ArizeTracer` (registers via `arize.otel`, line 68); span sites `…/pipeline/engine.py:115–149` | `uv run python scripts/real_arize_ax_e2e.py`; live run `2b2e29c2` (final 56.93, 4 self-corrections) | ✅ Live |
-| 4 | **Partner MCP server** (Phoenix MCP — required by the track) | ADK **`MCPToolset` over stdio** → `npx @arizeai/phoenix-mcp@latest`. The audit's calibration consultant calls the Phoenix MCP **`get-dataset-examples`** tool, parses per-anchor score deltas, and feeds them into the self-correction. See §3. | `…/pipeline/adk_runtime.py` → `build_phoenix_mcp_toolset` (l.31), `PhoenixMcpConsultant.consult` (l.53–96, tool `get-dataset-examples` l.82) | `uv run python scripts/real_e2e.py` (real ADK → Phoenix MCP stdio → pipeline) | ✅ Wired — exercised by e2e (see §3 on deployed vs. live-trace path) |
+| 4 | **Partner MCP server** (Phoenix MCP — required by the track) | ADK **`MCPToolset` over stdio** → `npx @arizeai/phoenix-mcp@latest`. The audit's calibration consultant calls the Phoenix MCP **`get-dataset-examples`** tool, parses per-anchor score deltas, and feeds them into the self-correction. See §3. | `…/pipeline/adk_runtime.py` → `build_phoenix_mcp_toolset` (l.31), `PhoenixMcpConsultant.consult` (l.53–96, tool `get-dataset-examples` l.82) | `uv run python scripts/real_e2e.py` (real ADK → Phoenix MCP stdio → pipeline) | ✅ **Live** — the deployed audit does a per-request MCP round-trip (read `get-dataset-examples` + write-back `add-dataset-examples`) against a Cloud-SQL-backed Phoenix on Cloud Run (`PHOENIX_COLLECTOR_ENDPOINT` set). See §3. |


There are two inconsistencies in this row compared to the actual implementation in `services/pipeline-orchestrator/src/glasshat/pipeline/adk_runtime.py`:

- **Pinned version:** The documentation mentions `npx @arizeai/phoenix-mcp@latest`, but `adk_runtime.py` pins the package to `@4.0.13` (`_PHOENIX_MCP_PACKAGE = "@arizeai/phoenix-mcp@4.0.13"`) for supply-chain hardening.
- **Outdated line numbers:** The referenced line numbers for `build_phoenix_mcp_toolset` (l.31), `PhoenixMcpConsultant.consult` (l.53–96), and `get-dataset-examples` (l.82) are outdated. They should be updated to l.102, l.158, and l.147 respectively to match the current codebase.
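Line references like these drift whenever the file changes, so it may help to derive them rather than maintain them by hand. The sketch below uses Python's standard `ast` module to map definitions to their starting lines. The helper name `definition_lines` and the sample source are made up for illustration; the sample is not the content of `adk_runtime.py`.

```python
import ast


def definition_lines(source: str) -> dict[str, int]:
    """Map each class/function name (methods qualified by class) to its 1-based line."""
    tree = ast.parse(source)
    found: dict[str, int] = {}

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = f"{prefix}{child.name}"
                found[name] = child.lineno
                visit(child, f"{name}.")  # descend so methods get a qualified name
            else:
                visit(child, prefix)

    visit(tree, "")
    return found


# Illustrative sample only; the real module is much longer.
SAMPLE = '''\
def build_phoenix_mcp_toolset():
    pass


class PhoenixMcpConsultant:
    async def consult(self):
        pass
'''

lines = definition_lines(SAMPLE)
```

Run against the real file's text, the resulting mapping could be compared with the `l.NNN` references in the compliance table before merging.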

Here is the suggested replacement:

| 4 | **Partner MCP server** (Phoenix MCP — required by the track) | ADK **`MCPToolset` over stdio** → `npx @arizeai/phoenix-mcp@4.0.13`. The audit's calibration consultant calls the Phoenix MCP **`get-dataset-examples`** tool, parses per-anchor score deltas, and feeds them into the self-correction. See §3. | `…/pipeline/adk_runtime.py` → `build_phoenix_mcp_toolset` (l.102), `PhoenixMcpConsultant.consult` (l.158, tool `get-dataset-examples` l.147) | `uv run python scripts/real_e2e.py` (real ADK → Phoenix MCP stdio → pipeline) | ✅ **Live** — the deployed audit does a per-request MCP round-trip (read `get-dataset-examples` + write-back `add-dataset-examples`) against a Cloud-SQL-backed Phoenix on Cloud Run (`PHOENIX_COLLECTOR_ENDPOINT` set). See §3. |
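A mismatch like `@latest` versus `@4.0.13` could also be caught mechanically. The following is a minimal sketch of such a check, assuming the pinned spec is available as a string (the review quotes it as `_PHOENIX_MCP_PACKAGE`); the function name `mismatched_versions` and the regular expression are illustrative, not part of the repository.

```python
import re

# Matches npm specs such as "@arizeai/phoenix-mcp@4.0.13" or "...@latest".
# The version group stops before trailing punctuation such as "." or "`".
_SPEC = re.compile(r"@arizeai/phoenix-mcp@([0-9A-Za-z]+(?:[.\-][0-9A-Za-z]+)*)")


def mismatched_versions(doc_text: str, pinned_spec: str) -> list[str]:
    """Return every version tag in doc_text that differs from the pinned one."""
    pinned = _SPEC.fullmatch(pinned_spec)
    if pinned is None:
        raise ValueError(f"not a phoenix-mcp package spec: {pinned_spec!r}")
    expected = pinned.group(1)
    return [version for version in _SPEC.findall(doc_text) if version != expected]


# Example input taken from the table row under review.
doc_row = "ADK MCPToolset over stdio -> `npx @arizeai/phoenix-mcp@latest`."
stale = mismatched_versions(doc_row, "@arizeai/phoenix-mcp@4.0.13")
```

A non-empty result indicates that the documentation names a version other than the one the code pins.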

gemini-code-assist Bot reviewed Jun 6, 2026

ComBba merged commit b69b211 into main Jun 6, 2026
4 of 5 checks passed

ComBba deleted the docs/phoenix-mcp-live-compliance branch June 6, 2026 08:12

ComBba merged 1 commit into main from docs/phoenix-mcp-live-compliance
