[audit-workflows] Daily Agentic Workflow Audit — 2026-06-02 (84.2% success, 9 failures / 3 classes) #36398

2026-06-02T04:29:08Z

github-actions[bot]
Bot Jun 2, 2026

Daily Agentic Workflow Audit — 2026-06-02

Audited 57 completed runs from the last 24 hours (plus 6 in-progress, including this audit). Success dropped to 84.2% (48/57) — down sharply from yesterday's 96.2% peak. The drop is driven by three concurrent failure classes, not one: a re-escalated token-budget 429, the persistent safe-output partial-failure defect, and a new experimental-SDK auth failure. Notably, 4 of the 9 failures are a single experimental dev-iteration workflow (Smoke Copilot SDK); excluding it, effective success is ~90.6% (48/53).
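The two success figures above follow directly from the run counts. A minimal Python sketch of the arithmetic, using only numbers quoted in this report (the function and variable names are illustrative, not part of the audit tooling):

```python
def success_rate(successes: int, completed: int) -> float:
    """Return the success rate as a percentage, rounded to one decimal."""
    return round(100 * successes / completed, 1)

completed = 57      # completed runs in the 24-hour window
failures = 9        # failures across the three classes
sdk_failures = 4    # Smoke Copilot SDK (experimental) failures
successes = completed - failures

overall = success_rate(successes, completed)
# Drop the 4 experimental runs from the denominator; successes are unchanged.
effective = success_rate(successes, completed - sdk_failures)
print(overall, effective)
```

Excluding the experimental workflow changes only the denominator, because all four of its runs failed.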

Summary

Metric	Value	vs 06-01
Completed runs	57	+4
Success rate	84.2% (48/57)	▼ from 96.2%
Failures	9 (3 classes)	▲ from 2
Total tokens	32.7M	≈
Effective tokens	243.6M	▼
Cost (claude-measured)	$15.05	≈
Turns / Action-min	712 / 428	≈
Firewall blocked	17.6% (698/3936)	▼ from 19.3%
Missing tools / MCP failures	0 / 0	≈
Engines	copilot 42, claude 10, codex 5, antigravity 2, gemini 2, pi 2	—

Critical Issues

🔴 1. token-budget-429 RE-ESCALATED (HIGH, #35661) — 2 workflows over the 25M cap. After one quiet window, the effective-token cap returned and hit two workflows:

Daily Firewall Logs Collector 26796384191 — CAPIError: 429 Maximum effective tokens exceeded (25,666,412 / 25,000,000), retried 5× (~83–92s each), agent exit 1. Recurrence on the same workflow that 429'd on 05-31.
Smoke Copilot 26788943047 — reached 14/16 smoke checks, then 429 ... (25,499,262 / 25,000,000). New workflow in this class.
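Both errors above carry the used and allowed token counts, so a hard-cap 429 can be told apart from a transient one before any retry is attempted. The sketch below assumes the message format quoted in this report; the function names and the retry policy are hypothetical and are not the harness's actual check.

```python
import re

# Pattern modelled on the message quoted above; the exact wording emitted by
# the API in other cases is an assumption.
_CAP_RE = re.compile(
    r"Maximum effective tokens exceeded \(([\d,]+)\s*/\s*([\d,]+)\)"
)

def parse_token_cap_error(message: str):
    """Return (used, cap) if the message is a hard token-cap 429, else None."""
    match = _CAP_RE.search(message)
    if match is None:
        return None
    used, cap = (int(group.replace(",", "")) for group in match.groups())
    return used, cap

def should_retry(message: str) -> bool:
    """Hypothetical policy: never retry once the hard cap is exceeded."""
    return parse_token_cap_error(message) is None

msg = "CAPIError: 429 Maximum effective tokens exceeded (25,666,412 / 25,000,000)"
used, cap = parse_token_cap_error(msg)
print(used - cap, should_retry(msg))
```

Retrying cannot lower the accumulated count, which is why this policy treats the signature as terminal.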

🔴 2. safe-output partial-failure-intolerance PERSISTS (HIGH) — 3 workflows. In every case the agent succeeded (exit 0) but one failed safe-output item red-failed the whole safe_outputs job:

LintMonster 26796998912 — update_issue with target="triggering" on a schedule event (not in issue context). Recurrence.
Smoke Codex 26788943036 — set_issue_field → No issue number available.
PR Sous Chef 26794554270 — update_pull_request #36353 → ERR_API ... branch from base failed (transient API-error variant).
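The first two failures above share a cause: the item named a target that the triggering event could not resolve. A check of roughly this shape could reject such items before they are emitted. This is a sketch only. The event set, the function name, and the rule that target="*" requires an explicit number are assumptions, not the behaviour of the real safe-output handlers.

```python
# Assumed set of trigger events that carry an issue or pull request; the
# mapping used by the real handlers may differ.
EVENTS_WITH_ITEM = {
    "issues",
    "issue_comment",
    "pull_request",
    "pull_request_review_comment",
}

def validate_target(event_name: str, target: str, explicit_number=None):
    """Return an error string if the target cannot be resolved, else None."""
    if target == "triggering" and event_name not in EVENTS_WITH_ITEM:
        return f'target="triggering" needs an issue/PR event, got "{event_name}"'
    if target == "*" and explicit_number is None:
        return 'target="*" needs an explicit item number'
    return None

# The LintMonster case: update_issue with target="triggering" on a schedule.
print(validate_target("schedule", "triggering"))
```

Returning the error to the agent at emit time, rather than failing later in the job, would let it pick a valid target within the same run.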

🟡 3. NEW: copilot-sdk session-auth (dev-iteration, WATCH). Smoke Copilot SDK failed 4× at agent T0 (26792092514, 26793757607, 26795908399, 26797319253) on the experimental branch copilot/fix-copilot-sdk-integration. The copilot-sdk sdk-driver raised "Error: Session was not created with authentication info or custom provider" at the "sending prompt" step. Four distinct SHAs over ~3h indicate active development iteration, not a production regression. The headless server logs "No COPILOT_CONNECTION_TOKEN was set", pointing at the missing auth path.
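A preflight check along these lines would surface the missing auth path before the agent starts, instead of at the first prompt. It is a sketch under assumptions: only the COPILOT_CONNECTION_TOKEN variable name comes from the logs above, while the function name and the custom_provider parameter are hypothetical.

```python
import os

def has_session_auth(env=None, custom_provider=None) -> bool:
    """True if a connection token is set or a custom provider is supplied."""
    env = os.environ if env is None else env
    token = env.get("COPILOT_CONNECTION_TOKEN", "")
    # An empty or whitespace-only token is treated as missing.
    return bool(token.strip()) or custom_provider is not None

print(has_session_auth(env={}))
print(has_session_auth(env={"COPILOT_CONNECTION_TOKEN": "example"}))
```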

Trend Charts

Success rate fell from the 96.2% peak to 84.2% — a single-day dip, not a sustained slide. The 15-day band sits mostly in the high-80s/low-90s; today's drop is fully explained by the three failure clusters above, and the 4-run experimental SDK workflow accounts for nearly half of it.

Daily tokens (32.7M) came in below the 7-day moving average, continuing the post-05-31 cooldown. Despite lower aggregate usage, two individual workflows still punched through the 25M effective-token per-run cap — the budget problem is concentrated in a few heavy aggregators, not broad token growth.

What went right this window

Contribution Check [SUCCESS] — the safe-output target=* class did not recur here (it failed 05-30 and 06-01). Improvement.
Smoke Claude 2/2 SUCCESS — its 06-01 create_pull_request_review_comment target=* failures did not repeat.
threat-detection-missing-prompt and pr-branch-deleted-race: 2 consecutive quiet windows each.
New engines antigravity / gemini / pi: 2/2 SUCCESS each. Smoke CI 7/7. Changeset Generator (codex) 2/2.
[aw] Failure Investigator (6h): clean, no timeout (2nd clean window).
0 missing-tools, 0 MCP failures, 0 missing-data across all runs.

Cost & token detail

Cost top (new watch): Semantic Function Refactoring $6.35 / 14.3M eff-tok / 43 turns (SUCCESS) — well above 06-01's flat $2.98 top and the single largest claude run this window. Then Daily Documentation Healer $2.00, Smoke Claude $1.55, Failure Investigator $1.48.
Top effective-tokens are the two 429 failures: Daily Firewall Logs 25.67M and Smoke Copilot 25.50M (both over cap), then Semantic Function Refactoring 14.3M, Chaos PR Bundle Fuzzer 11.5M.
Cost is claude-measured only; copilot/codex/gemini/pi/antigravity report EstimatedCost=0, so totals are claude-biased.

Firewall (17.6%, no failure-causing blocks)

Block rate 17.6% (698/3936), down from 19.3%. All hotspots are by-design smoke probes: Smoke Copilot 56/136 (41%), Smoke Codex 38/114 (33%), Smoke Gemini 25/78 (32%), Smoke Claude 29/97 (30%) — Google telemetry, unknown-SNI, Playwright azureedge, localhost. No firewall block caused any of the 9 failures. No action.

Recommendations

(High, #35661: [aw-failures] Token-budget exhaustion (25M effective-tokens cap) recurring across 6+ scheduled workflows, 2026-05-29 02:00–07:32 UTC) Fix token-budget-429 on heavy aggregators. Scope/chunk Daily Firewall Logs Collector so it stays under 25M effective tokens, and make the harness fail-fast on the isMaxEffectiveTokensExceededError signature instead of retrying 5×. A hard cap can't be recovered by --continue, and the retries burned ~7–8 min per failed run today.
(High) Add partial-failure tolerance to Process Safe Outputs. Treat an individual failed item as skipped-with-warning whenever ≥1 item in the batch succeeded, and validate target="*" / target="triggering" at the MCP emit boundary on schedule/non-issue/non-PR events so the agent self-corrects in-loop. This would have turned all 3 of today's safe-output failures green-with-warning.
(Medium) Gate the experimental Smoke Copilot SDK until the session is created with auth info / a custom provider (set COPILOT_CONNECTION_TOKEN or wire the custom provider), so dispatch-iteration noise stops polluting failure metrics.
(Low, watch) Monitor Semantic Function Refactoring — if its $6.35/run cost recurs or grows, scope the task or cap turns.
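The partial-failure rule in the second recommendation can be stated as a small decision function. The sketch below is one possible reading of it. The ItemResult type, the status strings, and the sample batch are illustrative assumptions, not the Process Safe Outputs implementation.

```python
from dataclasses import dataclass

@dataclass
class ItemResult:
    kind: str        # safe-output type, e.g. "update_issue"
    ok: bool         # whether this item was applied successfully
    error: str = ""  # failure reason, empty on success

def summarize_batch(results):
    """Return (job_status, warnings) under the proposed tolerance rule."""
    failed = [r for r in results if not r.ok]
    if not failed:
        return "success", []
    warnings = [f"{r.kind} skipped: {r.error}" for r in failed]
    # At least one item succeeded: downgrade the failures to warnings.
    if any(r.ok for r in results):
        return "success_with_warnings", warnings
    return "failure", warnings

# Illustrative batch: one item applied, one could not resolve its issue.
batch = [
    ItemResult("add_comment", True),
    ItemResult("set_issue_field", False, "No issue number available"),
]
print(summarize_batch(batch))
```

A batch in which every item fails still fails the job, so a fully broken run is not masked.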

Generated by 🔍 Agentic Workflow Audit Agent · opus48 3M · ◷

expires on Jun 3, 2026, 4:29 AM UTC

2026-06-02T17:39:53Z

github-actions[bot]
Bot Jun 2, 2026
Author

This discussion has been marked as outdated by Agentic Workflow Audit Agent.

A newer discussion is available at Discussion #36500.
