[audit-workflows] Daily Audit — 2026-06-01: 96.2% success (14-day high); both failures = one safe-output target=* class #36352
/q remove estimated cost from report in this agentic workflow
💥 WHOOSH! 🦸 The Smoke Test Agent ZOOMS in! 🌪️ KA-POW! All systems checked, gadgets tested, MCP servers BAM'd into shape! 💪 The Claude engine is running NOMINAL across the multiverse! 🚀 The Smoke Test Agent was here! ✨🦾 THWIP!
Warning: Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
Smoke beast was here. Tiny sparks. Repo still stand. 🔥
Warning: Firewall blocked 5 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
Smoke beast was here. Tiny sparks. Repo still stand. 🔥
Warning: Firewall blocked 5 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
💥 KA-POW! 🦸 The Smoke Test Agent BURST through the firewall, WHOOSH! 🌪️ All systems checked, all gizmos GLEAMING! ⚡ The Claude engine roars to life... VROOOM! 🚀 "This repo is SAFE for another day!" 🛡️✨ THWIP! Until next time, citizens! 💨
Warning: Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #36398.
Overview
Audit of the last 24h of agentic workflow runs (window ending 2026-06-01 ~22:15 UTC): 53 completed runs, 51 successes, 2 failures, for a 96.2% success rate, the best in the 14-day trend. Cost was flat and low ($15.62 claude-measured), and three previously-active failure classes (token-budget 429, threat-detection missing prompt, PR-branch-deleted race) all went quiet. The headline: both of today's failures belong to the same class, the recurring safe-output target="*" partial-failure intolerance, which is now the dominant and only repeating failure mode.
Summary
1 Only the claude engine reports EstimatedCost; copilot/codex/gemini/pi/antigravity report 0, so total cost is claude-biased.
Critical Issues: both failures are one class ⚠️
Both red runs were the safe-output-partial-failure-intolerance class: one invalid target="*" item (with no resolvable issue/PR number) red-fails the entire safe_outputs job even though sibling items succeeded.
- add_comment item had target="*" with no item_number on a schedule event → Message 3 (add_comment) failed → whole job red despite create_issue + add_labels succeeding. This is a recurrence on the same workflow (also failed 05-30).
- create_pull_request_review_comment items had target="*" with no pull_request_number → Messages 8,9 failed → whole job red despite other smoke outputs succeeding. New workflow in this class. (The agent itself reported PARTIAL: all functional smoke tests passed.)
This class now spans 6 workflows since 05-26. Root defect is unchanged: Process Safe Outputs fails the whole job on any failed item instead of skip-with-warning when ≥1 item succeeded.
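The skip-with-warning behavior described above can be sketched roughly as follows. This is a minimal illustration and not the actual Process Safe Outputs code: `ItemResult` and `job_outcome` are hypothetical names, and treating an all-failed batch as red (and an empty batch as green) is an assumption about the intended policy.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    """Outcome of one safe-output message (hypothetical structure)."""
    index: int       # 1-based message number, as in "Message 3 (add_comment) failed"
    kind: str        # e.g. "add_comment", "create_issue"
    ok: bool
    error: str = ""


def job_outcome(results):
    """Return (status, warnings) under a skip-with-warning policy.

    Assumed policy: the job is red only when every item failed; if at
    least one item succeeded, failed items are downgraded to warnings.
    """
    failed = [r for r in results if not r.ok]
    warnings = [f"Message {r.index} ({r.kind}) skipped: {r.error}" for r in failed]
    if results and len(failed) == len(results):
        return "failure", warnings
    return "success", warnings


# Shape of the first failure above: two sibling items succeed, one fails.
results = [
    ItemResult(1, "create_issue", True),
    ItemResult(2, "add_labels", True),
    ItemResult(3, "add_comment", False, 'target="*" with no item_number'),
]
status, warnings = job_outcome(results)
print(status, warnings)
```

Under this assumed policy the run stays green and the skipped message surfaces as a warning rather than as a job failure.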
Good news — three classes went quiet 🟢
Trend charts (last ~14 days)
Workflow Health
Success rate climbed to 96.2%, the highest in the tracked window, recovering well above the high-80s/low-90s band and far from the 05-23 dip (41.6%). Failure count dropped to 2, both from a single class rather than scattered systemic regressions.
Token Volume & Cost
Raw token volume (38.1M) and claude-measured cost ($15.62) are both at the low end of the window, pulling the 7-day moving average down after the 05-31 spike ($31.63, driven by the Go Logger $8.30 outlier). No heavy-tail cost anomaly this window.
Capability & network details
Missing tools (2 — both benign smoke probes)
- mcpscripts-gh: Smoke Claude (test #2 probe; self-corrected with github_pr_query).
- web-fetch MCP: Smoke Codex (probe; unavailable by design).
Neither is a real capability gap; both are intentional smoke-test probes. 0 MCP failures, 0 missing-data signals.
Firewall (19.3% blocked, up from 16.7%)
The uptick is driven by new-engine smoke tests added this window, not workflow regressions. Hotspots: Smoke Antigravity 10/12 (83%), Smoke Copilot 118/325 (36%), Linter Miner 63/243 (26%). Blocks are predominantly Google telemetry (content-autofill, accounts.google, www.google), Playwright azureedge, localhost, and (unknown), all by-design noise. No firewall block caused either failure.
New-engine coverage 🆕
Three engines joined smoke coverage this window — Smoke Antigravity, Smoke Gemini, Smoke Pi — all passed.
Drift watch
PR Code Quality Reviewer turn count varied 3 → 22 (avg 13.8) across 3 successful runs — a 7× spread suggesting task-shape/prompt instability. No failures; monitoring.
Recommendations
- target="*" at the MCP emit boundary: reject an add_comment / create_pull_request_review_comment whose triggering context can't resolve to a concrete number (e.g. schedule events) so the agent self-corrects in-loop. Prompt-only guardrails have now failed across 6 workflows.
- cp with || true; verify the prompt exists before invoking the detection agent.
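The emit-boundary check in the first recommendation could look something like the sketch below. It is an illustration only: the item is modeled as a plain dict, `validate_target`, `NUMBER_FIELD`, and `context_number` are hypothetical names, and only the field names `item_number` and `pull_request_number` come from the failures reported above.

```python
from typing import Optional

# Assumed mapping from item type to the field that carries the concrete number.
NUMBER_FIELD = {
    "add_comment": "item_number",
    "create_pull_request_review_comment": "pull_request_number",
}


def validate_target(item: dict, context_number: Optional[int]) -> Optional[str]:
    """Return an error message for the agent, or None if the item is acceptable.

    context_number is the issue/PR number of the triggering event, or None
    when the event has none (e.g. a schedule event).
    """
    field = NUMBER_FIELD.get(item.get("type"))
    if field is None or item.get("target") != "*":
        return None  # not covered by this check
    if item.get(field) is not None:
        return None  # an explicit number was supplied
    if context_number is not None:
        return None  # the triggering context resolves to a concrete number
    return (
        f'{item["type"]}: target="*" requires {field} '
        "because the triggering event has no issue/PR number"
    )


# A schedule event has no number, so this item is rejected at emit time.
error = validate_target({"type": "add_comment", "target": "*"}, None)
print(error)
```

Returning the message to the agent at emit time, rather than failing later in the safe_outputs job, is what would give it the chance to self-correct in-loop.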