[audit-workflows] Agentic Workflow Audit — 2026-06-02 NIGHT window (19:54–22:06Z) #36543
This discussion was automatically closed because it expired on 2026-06-03T22:19:59.616Z.
Overview
This is the fourth audit window of 2026-06-02 (after morning, afternoon, and evening), covering completed runs between 19:54Z and 22:06Z. The night window saw 31 runs (28 completed, 3 in-progress) at 85.7% success (24/28). Four failures fell into just two classes: a NEW transient Docker Hub registry-pull timeout (2 runs, pure infra) and the persistent safe-output partial-failure-intolerance class (2 runs, 2 new variants). Both of the day's recurring high-severity classes, token-budget-429 and copilot-sdk session timeout, were absent this window. Notably, 3 of the 4 failures occurred on a single dev feature branch (copilot/add-support-for-copilot-connection-token), so production-main impact was limited to one infra blip.
Window Summary
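The headline figures for this window can be rechecked with a few lines of arithmetic. The sketch below uses only the counts stated in the overview (31 runs, 28 completed, 4 failures); the `WindowSummary` class and its field names are illustrative and are not part of the audit tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowSummary:
    """Run counts for one audit window (hypothetical structure)."""
    total_runs: int
    completed: int
    failed: int

    @property
    def in_progress(self) -> int:
        # Runs that had not finished when the window was audited.
        return self.total_runs - self.completed

    @property
    def succeeded(self) -> int:
        return self.completed - self.failed

    @property
    def success_rate(self) -> float:
        # Success is measured over completed runs only, matching "24/28".
        if self.completed == 0:
            return 0.0
        return 100.0 * self.succeeded / self.completed


night = WindowSummary(total_runs=31, completed=28, failed=4)
print(f"{night.succeeded}/{night.completed} = {night.success_rate:.1f}%")
# prints "24/28 = 85.7%"
```

Measuring against completed runs rather than all runs keeps the three in-progress runs from counting as either successes or failures.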
Critical Findings
🆕 docker-registry-pull-timeout (infra, 2 runs) — NEW class
Both Smoke CI (§26849511621, main) and Smoke Codex (§26847739078, feature branch) failed the agent step at container-image pull: "Get (registry1.docker.io/redacted) context deadline exceeded" for node:lts-alpine, three retries, then exit 123. This is transient Docker Hub registry friction (~21:05–21:39Z), not agent logic and not firewall-blocked. Two unrelated workflows hitting the same root cause in one window points to a brief registry slowdown or outage rather than a workflow defect.
🔁 safe-output-partial-failure-intolerance (2 runs) — RECURRING, 2 new variants
A single failed safe-output message still red-fails the entire safe_outputs job even when other messages succeed:
- push_to_pull_request_branch rejected because the patch modified pkg/cli/codemod_pull_request_target_checkout_false.go, outside the allowed-files list. The security guard worked as intended, but the single rejection reddened the job.
- dispatch_workflow "haiku-printer" failed with No ref found for: refs/heads/copilot/add-support-for-copilot-connection-token; create_discussion and other messages landed fine, but message 3 red-failed the job.
This class has now produced six distinct variants over recent windows and remains the dominant, fixable failure mode.
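One possible shape for a more tolerant exit policy is sketched below. It is an assumption about how such a rule could be expressed, not the actual safe_outputs implementation: the `MessageResult` type, the failure-reason labels, and `job_exit_code` are all hypothetical names. The rule is that the job stays green only when at least one message landed and every failure belongs to a known guarded category.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Guarded failure categories named in this audit. The string labels are
# hypothetical; the real job may classify errors differently.
EXPECTED_FAILURES = frozenset({
    "allowed-files-rejection",
    "dispatch-no-ref-on-branch",
    "count-exceeded",
    "missing-issue-context",
})


@dataclass(frozen=True)
class MessageResult:
    kind: str                      # e.g. "create_discussion"
    failure: Optional[str] = None  # None means the message landed


def job_exit_code(results: Sequence[MessageResult]) -> int:
    """Return 0 (green) or 1 (red) for a batch of safe-output messages."""
    landed = [r for r in results if r.failure is None]
    failed = [r for r in results if r.failure is not None]
    if not failed:
        return 0
    all_expected = all(r.failure in EXPECTED_FAILURES for r in failed)
    if landed and all_expected:
        # Partial success: report the failures as warnings, keep the job green.
        return 0
    # Nothing landed, or at least one failure is not a guarded condition.
    return 1


batch = [
    MessageResult("create_discussion"),
    MessageResult("dispatch_workflow", "dispatch-no-ref-on-branch"),
]
print(job_exit_code(batch))  # prints 0
```

Under this rule the dispatch_workflow variant above would pass, while a batch in which nothing landed, or in which any failure is unclassified, would still fail red.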
Positives This Window
- token-budget-429-effective-tokens absent (3rd consecutive window; top effective-token run only 10.65M, well under the 25M cap)
- copilot-sdk session.idle/auth timeouts absent (had escalated to prod-main in the evening window)
Trend Charts (14-day)
Workflow Health — Daily Runs & Success Rate
Success rate has held in a healthy 80–96% band since the 05-23 dip (41.6%, a known bad day). The 06-02 daily rollup (84.2%) sits at the lower end of that band; the four intraday windows ran 84.2% → 89.4% → 97.8% → 85.7%, with the night dip driven by infra and dev-branch noise rather than a workflow regression.
Token Usage — Daily + 7-day Moving Average
The 7-day moving average has drifted down from ~60M to ~38.6M tokens/day, with daily totals oscillating between ~14M and ~69M. The 06-02 full-day total (32.7M) is below the moving average — consumption is stable with no runaway-cost trend.
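For readers unfamiliar with the smoothing used in this chart, a trailing moving average can be computed as sketched below. How the chart treats the first few days is not stated, so averaging over however many days are available is an assumption here, and the sample values are made-up placeholders rather than the audit's token series.

```python
from collections import deque
from typing import Iterable, Iterator


def trailing_moving_average(values: Iterable[float], window: int = 7) -> Iterator[float]:
    """Yield the mean of the last `window` values seen so far.

    Until `window` values have arrived, the mean covers the shorter
    prefix (an assumption; the chart may handle this differently).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent: deque = deque(maxlen=window)
    total = 0.0
    for value in values:
        if len(recent) == window:
            total -= recent[0]  # drop the value about to fall out of the window
        recent.append(value)
        total += value
        yield total / len(recent)


# Placeholder daily totals (in millions of tokens), window of 3 for brevity.
sample = [30.0, 60.0, 90.0, 60.0]
print(list(trailing_moving_average(sample, window=3)))
# prints [30.0, 45.0, 60.0, 70.0]
```

A single day's total sitting below the latest average, as 06-02 does, pulls the average down the following day.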
Failure Detail Table
Firewall Hotspots (by design — smoke probes)
Overall 287/1337 (21.5%) blocked — in line with prior windows. Smoke-test workflows intentionally probe egress, so high block rates here are expected and not a concern.
Recommendations
- Cache node:lts-alpine in a warm layer, add a registry mirror, or extend retry-with-backoff beyond 3 attempts. (Watch for recurrence first; single-window so far.)
- Make safe_outputs tolerate per-message failures: exit non-red when ≥1 message lands and the remaining failures are expected/guarded conditions (allowed-files rejection, dispatch no-ref-on-branch, count-exceeded, missing-issue-context). This is the day's dominant fixable class.
- Keep watching token-budget-429 and copilot-sdk timeouts: both quiet this window but intermittent; the SDK class hit prod-main earlier today.
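The retry-with-backoff option for image pulls could look roughly like the sketch below. It is not the workflow's current retry logic: the function name, the attempt count of 5, and the delay values are assumptions chosen for illustration. It shells out to the standard `docker pull` command and waits exponentially longer, with jitter, between failed attempts.

```python
import random
import subprocess
import time
from typing import Callable


def pull_with_backoff(
    image: str,
    attempts: int = 5,          # assumed value; the audit only says "beyond 3"
    base_delay: float = 2.0,    # seconds before the second attempt (assumed)
    max_delay: float = 60.0,    # cap on the un-jittered delay (assumed)
    run: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try `docker pull IMAGE` up to `attempts` times; return True on success."""
    for attempt in range(1, attempts + 1):
        result = run(["docker", "pull", image], capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if attempt == attempts:
            break  # no point sleeping after the final failure
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        # Jitter spreads out retries from jobs that failed at the same moment.
        sleep(delay + random.uniform(0, delay / 2))
    return False
```

Calling `pull_with_backoff("node:lts-alpine")` before the agent step would keep retrying for roughly half a minute to a minute with these defaults, which would not have covered the full ~21:05–21:39Z window but may ride out shorter blips. The `run` and `sleep` parameters exist only so the function can be exercised without Docker or real waiting.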