[audit-workflows] Daily Agentic Workflow Audit — 2026-06-02 (84.2% success, 9 failures / 3 classes) #36398
Closed
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #36500.
Daily Agentic Workflow Audit — 2026-06-02
Audited 57 completed runs from the last 24 hours (plus 6 in-progress, including this audit). Success dropped to 84.2% (48/57) — down sharply from yesterday's 96.2% peak. The drop is driven by three concurrent failure classes, not one: a re-escalated token-budget 429, the persistent safe-output partial-failure defect, and a new experimental-SDK auth failure. Notably, 4 of the 9 failures are a single experimental dev-iteration workflow (Smoke Copilot SDK); excluding it, effective success is ~90.6% (48/53).
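The two headline percentages follow directly from the run counts above. A minimal sketch of that arithmetic, where the helper name `success_rate` is illustrative and not part of the audit tooling:

```python
def success_rate(successes: int, completed: int) -> float:
    """Return the success rate as a percentage, rounded to one decimal place."""
    if completed <= 0:
        raise ValueError("completed must be positive")
    return round(100 * successes / completed, 1)


completed = 57      # completed runs in the 24-hour window
failures = 9        # failed runs across all three failure classes
sdk_failures = 4    # failures from the experimental Smoke Copilot SDK workflow
successes = completed - failures  # 48

overall = success_rate(successes, completed)                       # 48/57
excluding_sdk = success_rate(successes, completed - sdk_failures)  # 48/53
print(overall, excluding_sdk)  # → 84.2 90.6
```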
Summary
Critical Issues
🔴 1. token-budget-429 RE-ESCALATED (HIGH, #35661): 2 workflows over the 25M cap. After one quiet window, the effective-token cap returned and hit two workflows:
- 26796384191: CAPIError: 429 Maximum effective tokens exceeded (25,666,412 / 25,000,000), retried 5× (~83–92s each), agent exit 1. Recurrence on the same workflow that 429'd on 05-31.
- 26788943047: reached 14/16 smoke checks, then 429 ... (25,499,262 / 25,000,000). New workflow in this class.

🔴 2. safe-output partial-failure-intolerance PERSISTS (HIGH): 3 workflows. In every case the agent succeeded (exit 0) but one failed safe-output item red-failed the whole safe_outputs job:
- 26796998912: update_issue with target="triggering" on a schedule event (not in issue context). Recurrence.
- 26788943036: set_issue_field → No issue number available.
- 26794554270: update_pull_request #36353 → ERR_API ... branch from base failed (transient API-error variant).

🟡 3. NEW: copilot-sdk session-auth (dev-iteration, WATCH). Smoke Copilot SDK failed 4× at agent T0 (26792092514, 26793757607, 26795908399, 26797319253) on experimental branch copilot/fix-copilot-sdk-integration, with "Error: Session was not created with authentication info or custom provider" from the copilot-sdk sdk-driver on "sending prompt". Four distinct SHAs over ~3h = active development iteration, not a production regression. The headless server logs "No COPILOT_CONNECTION_TOKEN was set", pointing at the missing auth path.

Trend Charts
Success rate fell from the 96.2% peak to 84.2% — a single-day dip, not a sustained slide. The 15-day band sits mostly in the high-80s/low-90s; today's drop is fully explained by the three failure clusters above, and the 4-run experimental SDK workflow accounts for nearly half of it.
Daily tokens (32.7M) came in below the 7-day moving average, continuing the post-05-31 cooldown. Despite lower aggregate usage, two individual workflows still punched through the 25M effective-token per-run cap — the budget problem is concentrated in a few heavy aggregators, not broad token growth.
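To see how far over the per-run cap a failing run went, the figures can be pulled out of the 429 message. This is a minimal sketch that assumes the message format quoted in this audit; the regular expression and the name `parse_cap_overage` are illustrative, and other CAPIError variants may be worded differently.

```python
import re

# Matches the message text quoted in this audit. Assumption: other error
# variants may use different wording and would need their own pattern.
_CAP_RE = re.compile(r"Maximum effective tokens exceeded \(([\d,]+) / ([\d,]+)\)")


def parse_cap_overage(message: str):
    """Return (used, cap, percent_over), or None if the message does not match."""
    match = _CAP_RE.search(message)
    if match is None:
        return None
    used = int(match.group(1).replace(",", ""))
    cap = int(match.group(2).replace(",", ""))
    return used, cap, round(100 * (used - cap) / cap, 2)


msg = "CAPIError: 429 Maximum effective tokens exceeded (25,666,412 / 25,000,000)"
print(parse_cap_overage(msg))  # → (25666412, 25000000, 2.67)
```

An overage of only a few percent fits the observation that the problem sits in a handful of heavy runs near the cap rather than in broad token growth.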
What went right this window
- ... SUCCESS]: the safe-output target=* class did not recur here (it failed 05-30 and 06-01). Improvement.
- ... create_pull_request_review_comment target=* failures did not repeat.

Cost & token detail
... EstimatedCost=0, so totals are Claude-biased.

Firewall (17.6%, no failure-causing blocks)
Block rate 17.6% (698/3936), down from 19.3%. All hotspots are by-design smoke probes: Smoke Copilot 56/136 (41%), Smoke Codex 38/114 (33%), Smoke Gemini 25/78 (32%), Smoke Claude 29/97 (30%) — Google telemetry, unknown-SNI, Playwright azureedge, localhost. No firewall block caused any of the 9 failures. No action.
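The per-workflow percentages can be rechecked from the blocked/total counts listed above. A small sketch, in which the dictionary layout and the helper `block_rate` are illustrative rather than taken from the audit tooling:

```python
# (blocked, total) request counts per smoke workflow, as reported above.
smoke_blocks = {
    "Smoke Copilot": (56, 136),
    "Smoke Codex": (38, 114),
    "Smoke Gemini": (25, 78),
    "Smoke Claude": (29, 97),
}


def block_rate(blocked: int, total: int) -> float:
    """Percentage of requests blocked; 0.0 when there were no requests."""
    return 100 * blocked / total if total else 0.0


# Round to whole percentages, matching the precision used in the report.
rates = {name: round(block_rate(b, t)) for name, (b, t) in smoke_blocks.items()}
hotspots = sorted(rates, key=rates.get, reverse=True)
print(rates)
```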
Recommendations
- ... isMaxEffectiveTokensExceededError signature instead of retrying 5×: a hard cap can't be recovered by --continue, and the retries burned ~7–8 min per failed run today.
- ... target="*" / target="triggering" at the MCP emit boundary on schedule/non-issue/non-PR events so the agent self-corrects in-loop. This would have turned all 3 of today's safe-output failures green-with-warning.
- ... COPILOT_CONNECTION_TOKEN or wire the custom provider), so dispatch-iteration noise stops polluting failure metrics.
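The recommendation about the hard-cap signature could be sketched as a retry loop that stops immediately when that error is seen. Everything below is hypothetical: `run_with_retries`, `HardCapError` and the substring check stand in for the real harness, and the actual logic behind `isMaxEffectiveTokensExceededError` is not shown in this audit.

```python
import time
from typing import Callable

# Text quoted from today's 429 errors.
CAP_SIGNATURE = "Maximum effective tokens exceeded"


class HardCapError(RuntimeError):
    """Raised when a failure cannot be recovered by retrying."""


def is_max_effective_tokens_exceeded(message: str) -> bool:
    # Assumption: a substring match is enough to recognise the hard cap.
    return CAP_SIGNATURE in message


def run_with_retries(attempt: Callable[[], str], max_retries: int = 5,
                     delay_s: float = 0.0) -> str:
    """Call attempt(); retry other RuntimeErrors, but stop at once on the hard cap."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    last_error = None
    for _ in range(1 + max_retries):
        try:
            return attempt()
        except RuntimeError as err:
            if is_max_effective_tokens_exceeded(str(err)):
                # Retrying cannot lower the token count, so fail fast.
                raise HardCapError(str(err)) from err
            last_error = err
            time.sleep(delay_s)
    raise last_error
```

With five retries at roughly 83–92s each, stopping on the first hard-cap error would avoid the ~7–8 minutes of retry time noted above.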