[audit-workflows] Daily Agentic Workflow Audit — 2026-06-02 (afternoon window, 89.4% success, no 429) #36500
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #36517.
Daily Agentic Workflow Audit — 2026-06-02 (afternoon window)
This is a second, afternoon re-audit of 2026-06-02 covering completed runs from 14:31Z–17:23Z, complementing this morning's full-day audit (#36398, 23:42Z 06-01 → 04:15Z 06-02). Of the 24h log set, 47 runs completed with full summaries in this window (plus 2 in-progress, including this audit).
Headline: a healthier slice. Success recovered to 89.4% (42/47), back in the stable ~89–91% band, and the dominant token-budget 429 mode was absent this window — the heaviest run (UK AI Operational Resilience, 23.79M effective tokens) finished comfortably under the 25M cap. The 5 failures are spread across 5 distinct classes, with 3 of them infra/experimental/non-agent rather than a single recurring defect.
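The headline figures can be re-derived from the counts quoted above. The following is a minimal sketch, assuming the 25M cap and the 23.79M figure are both measured in effective tokens; the function names are illustrative and not part of the audit tooling.

```python
def success_rate(succeeded: int, completed: int) -> float:
    """Return the success rate as a percentage, rounded to one decimal place."""
    if completed <= 0:
        raise ValueError("completed must be positive")
    return round(100 * succeeded / completed, 1)


def cap_headroom(used_millions: float, cap_millions: float) -> float:
    """Return the remaining effective-token headroom, in millions of tokens."""
    return round(cap_millions - used_millions, 2)


# Counts taken from this window: 42 successes out of 47 completed runs.
print(success_rate(42, 47))       # 89.4
# Heaviest run (23.79M) against the 25M cap.
print(cap_headroom(23.79, 25.0))  # 1.21
```

The heaviest run therefore finished with roughly 1.21M effective tokens to spare.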
Summary
Critical Issues
🟢 token-budget-429 did NOT recur (was the morning's top class, #35661). No "Maximum effective tokens exceeded" / isMaxEffectiveTokensExceededError=true anywhere in this window. Heaviest: UK AI Operational Resilience 23.79M < 25M cap (SUCCESS). This is the second window-level reprieve, but the morning window had two over-cap failures, so the cap remains intermittently hit by heavy daily-aggregation workflows. Fix (fail-fast on the signature + chunk heavy workflows) stays warranted.
🟠 safe-output partial-failure-intolerance persists, with a new "count-exceeded" variant. CLI Consistency Checker 26826617790 (copilot) emitted 3× create_issue against a cap of 1 → 2 validation errors (Too many items of type 'create_issue'. Maximum allowed: 1) → agent step exit 1 → run failure, even though the safe_outputs job succeeded and created the 1 allowed issue. This joins the existing variants (target=*, target=triggering, no-number, transient ERR_API). The whole family shares one root: a safe-output validation problem red-fails the run despite useful output landing.
Other Failures (infra / experimental / non-agent)
3 infra/experimental failures + 1 super-linter job
🆕 runner-infra · node missing (exit 127): Daily Issues Report Generator 26831132205 (copilot, schedule). The copilot_harness guard fired: "node runtime missing on this runner — check runtimes.node in workflow YAML; exit 127". A pre-agent infra failure (runner lacked a usable node binary), not agent logic. Single occurrence; monitor for recurrence pointing at runtimes.node / runner image.
🆕 agent-timeout · no output: Avenger 26831111185 (claude/opus-4-8, schedule) ran 48.4 min (15:47:48→16:36:13Z) actively calling the opus API and dispatching Bash + TaskOutput, then failed with Turns=0, ErrorCount=0, empty output. Looks like the agent-step time limit hit while still working (possibly a stuck/blocking TaskOutput poll). Single occurrence; if it recurs, cap turns/runtime.
experimental copilot-sdk · session.idle timeout: PR Code Quality Reviewer 26827796547 (copilot) on experimental branch copilot/ab-advisor-experiment-campaign-max-turns hit "Timeout after 600000ms waiting for session.idle" (10 min hard, no output, not retried). A timeout variant of the experimental copilot-sdk family from this morning's auth-failure class: still dev-iteration, not a production regression. Recommend gating these experimental workflows from scheduled/PR triggers until the SDK session lifecycle is stable.
super-linter job (non-agent), Super Linter Report 26829252826 (copilot, schedule): the super_linter job failed and the agent step was skipped. Non-agentic lint-job failure; low audit relevance.
Trends (30-day available history)
Workflow health has held a stable 85–96% success band across the window, with two visible dips: 06-23's outlier 41.6% (a large batch-failure day) and a softer 81% trough on 05-28. The recent days — 91.1% (05-30), 89.8% (05-31), 96.2% peak (06-01), 84.2% (06-02 full-day) — show normal day-to-day variance rather than a regression; this afternoon's 89.4% slice sits right in the healthy band.
Token usage swings between ~15M and ~69M tokens/day, with the 7-day moving average holding around 40–45M. The 05-31 spike to 68.8M (a heavy refactor/aggregation day) is the band's ceiling; daily totals have eased since, and the afternoon slice (15.6M partial) is consistent with a lighter back-half of the day.
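For reference, a 7-day moving average of daily token totals can be computed as sketched below. This assumes a trailing window (the report does not say whether its average is trailing or centered), and the sample values are placeholders chosen for readability, not figures from this audit.

```python
from collections import deque
from typing import Iterable, List


def trailing_moving_average(values: Iterable[float], window: int = 7) -> List[float]:
    """Trailing mean over the last `window` values.

    Early entries average over however many values are available so far.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # automatically drops the oldest value
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Placeholder daily totals (millions of tokens), NOT audit data.
print(trailing_moving_average([10.0, 20.0, 30.0], window=2))  # [10.0, 15.0, 25.0]
```

A trailing window means a single heavy day keeps lifting the average for the next `window - 1` days, which is why a spike day can hold the band's ceiling while daily totals ease.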
Firewall
Blocked 21.2% (586/2768) — by-design smoke/probe and PR-reviewer telemetry. Hotspots: CLI Consistency Checker 71/303 (23%), PR Code Quality Reviewer 64/254 (25%), UK AI Operational Resilience 43/183 (23%). No firewall block caused any of the 5 run failures.
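The block percentages above follow directly from the blocked/total request counts. A minimal sketch that recomputes them from the figures quoted in this section (the dictionary layout and the `block_rate` helper are illustrative, not the audit agent's own code):

```python
def block_rate(blocked: int, total: int) -> float:
    """Return the share of blocked requests as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * blocked / total


# (blocked, total) request counts as reported in this window.
firewall = {
    "all workflows": (586, 2768),
    "CLI Consistency Checker": (71, 303),
    "PR Code Quality Reviewer": (64, 254),
    "UK AI Operational Resilience": (43, 183),
}

rates = {name: block_rate(blocked, total) for name, (blocked, total) in firewall.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```

Rounded to whole percentages, the three hotspots come out at 23%, 25% and 23%, matching the figures reported above.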
Recommendations
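Several failure classes in this audit are recognised by a log signature, and the fix proposed for the token-budget class is to fail fast on its signature. A minimal sketch of signature-based classification follows, assuming the run log is available as a single string. The signature strings are the ones quoted in this report; `SIGNATURES` and `classify` are hypothetical names, not part of the audit agent.

```python
import re
from typing import Optional

# Each entry pairs a failure-class label with a pattern built from the
# log strings quoted in this audit. Order matters: first match wins.
SIGNATURES = [
    (
        "token-budget-429",
        re.compile(
            r"Maximum effective tokens exceeded"
            r"|isMaxEffectiveTokensExceededError=true"
        ),
    ),
    (
        "safe-output count-exceeded",
        re.compile(r"Too many items of type '[^']+'\. Maximum allowed: \d+"),
    ),
    (
        "runner-infra node missing",
        re.compile(r"node runtime missing on this runner"),
    ),
    (
        "copilot-sdk session.idle timeout",
        re.compile(r"Timeout after \d+ms waiting for session\.idle"),
    ),
]


def classify(log_text: str) -> Optional[str]:
    """Return the first matching failure class, or None if nothing matches."""
    for label, pattern in SIGNATURES:
        if pattern.search(log_text):
            return label
    return None


print(classify("Too many items of type 'create_issue'. Maximum allowed: 1"))
# safe-output count-exceeded
```

A caller that wants fail-fast behaviour could run `classify` on each new log chunk and stop the run as soon as it returns `token-budget-429`, rather than waiting for the step to exhaust its budget.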
exit 127.
References: