[reliability] Daily Reliability Review - 2026-06-02 #36550

@github-actions

Executive Summary

Overall health for the last 24h is good. Telemetry is flowing: the spans dataset has ~18.4k spans and gh-aw conclusion spans are well-attributed (gh-aw.workflow.name, gh-aw.run.status, gh-aw.engine.id all populated). Of 1,889 run-conclusion spans, 52 carried gh-aw.run.status:failure (≈2.8% of conclusion spans), corresponding to ~13 distinct failed runs (13/1,889 ≈ 0.7% when distinct runs are counted against the same span total). Failures are mostly one-off and spread across many workflows; the only recurring patterns are PR Sous Chef (3 failed runs) and PR Code Quality Reviewer (2 failed runs), both on the copilot engine. Trace continuity was verified intact on a representative PR Sous Chef failure.

No timeouts, cancellations, or OTLP-export failures were confirmed. However, several reliability-relevant fields are not consumable from Sentry: gen_ai.response.finish_reasons returns zero results over 7d (blocking truncation / runaway-token detection), span.status is null on every span, release is null, and the errors and logs datasets are empty — so failure root-causes cannot be corroborated from log/error telemetry. These are observability gaps, not confirmed runtime failures.
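
The headline ratio depends on whether the numerator is failing spans or distinct failed runs. The arithmetic below is a minimal sketch that uses only the counts quoted in this summary; the variable names are illustrative, and the run count is approximate in the report itself.

```python
# Counts quoted in the executive summary.
conclusion_spans = 1889   # run-conclusion spans in the 24h window
failed_spans = 52         # spans carrying gh-aw.run.status:failure
failed_runs = 13          # approximate number of distinct failed runs

# Share of conclusion spans that carry a failure status.
span_failure_share = failed_spans / conclusion_spans
# Distinct failed runs measured against the same span total.
run_to_span_ratio = failed_runs / conclusion_spans
# Average number of failing spans emitted per failed run.
spans_per_failed_run = failed_spans / failed_runs

print(f"{span_failure_share:.1%}")    # 2.8%
print(f"{run_to_span_ratio:.1%}")     # 0.7%
print(f"{spans_per_failed_run:.1f}")  # 4.0
```

Roughly four failing spans per failed run suggests that a single failed run is counted several times at span level, so the two percentages should not be compared directly.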

Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next Action |
| --- | --- | --- | --- | --- |
| P1 | PR Sous Chef | Recurring run failures at agent phase (copilot) | 3 failed runs/24h (~18% of ~17 runs); spans gh-aw.agent.conclusion + detection/safe_outputs/conclusion = failure; runs 26834737280, 26818750469, 26794554270; trace 1c17b6dd5b51b810e3ee7f84be03ab3a | Open the 3 run logs; agent phase ran ~311s before failing, so check for agent error vs. safe-output rejection |
| P2 | PR Code Quality Reviewer | Recurring run failures at agent phase (copilot) | 2 failed runs/24h (~10% of 19 runs); runs 26849261658, 26837142276; trace 5d9db5c5976cad556efcf783b0b57c80 | Compare the 2 failed runs against passing runs of the same workflow |
| P3 | (instrumentation) | gen_ai.response.finish_reasons not queryable in Sentry → truncation / runaway-token blind spot | has:gen_ai.response.finish_reasons → 0 results over 7d, despite emit-side always emitting it (send_otlp_span.cjs:2011, array attr via buildArrayAttr) | Verify Sentry indexes array-valued span attrs, or emit a scalar mirror (e.g. gh-aw.finish_reason) |
| P4 | (instrumentation) | span.status null on all spans; release null → no OTLP-status filtering, no regression correlation | All spans span.status:null; release:null. Emit-side sets OTLP status.code=2 on failures (send_otlp_span.cjs:1908/1944) and service.version resource attr (:322) | Confirm OTLP status.code → Sentry span.status and service.version → release mapping in the Sentry ingest config |
| P5 | (observability) | errors and logs datasets empty → failure root-cause can't be corroborated from logs/errors | errors and logs queries return 0 results / 24h (and unfiltered) | Decide whether gh-aw should ship error/log telemetry, or document that spans are the sole signal |
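
The per-workflow percentages in the P1 and P2 rows can be reproduced as below. This is a sketch using the failed/total counts from the table; the helper name `failure_rate` is hypothetical, and the PR Sous Chef total is itself approximate (~17) in the report.

```python
# (failed_runs, total_runs) per workflow over the 24h window, as listed in the table.
workflows = {
    "PR Sous Chef": (3, 17),              # total is approximate in the report
    "PR Code Quality Reviewer": (2, 19),
}


def failure_rate(failed, total):
    """Return the fraction of runs that failed; reject an empty window."""
    if total <= 0:
        raise ValueError("total runs must be positive")
    return failed / total


for name, (failed, total) in workflows.items():
    print(f"{name}: {failure_rate(failed, total):.1%}")
# PR Sous Chef: 17.6%
# PR Code Quality Reviewer: 10.5%
```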

Representative Traces

PR Sous Chef failure (P1) — trace 1c17b6dd5b51b810e3ee7f84be03ab3a

  • Continuity intact: gh-aw.activation.conclusion = success, then gh-aw.agent.conclusion = failure (span.duration ≈ 311,698 ms / ~5.2 min), gh-aw.detection.conclusion = failure (~49.9 s), gh-aw.safe_outputs.conclusion = failure (~6.0 s).
  • Many api_proxy.copilot.request child spans on the same trace (7–15 s each) confirm copilot engine activity before the agent-phase failure.
  • Run: 26818750469

PR Code Quality Reviewer failure (P2) — trace 5d9db5c5976cad556efcf783b0b57c80, run 26849261658
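
For triage it can help to reduce a trace to its first failing phase, since later failures may be downstream consequences. The sketch below does that for the PR Sous Chef trace, using the statuses and approximate durations quoted above. It assumes the order in which the phases are listed is their execution order; the record layout and the `first_failed_phase` helper are illustrative, not part of gh-aw.

```python
# Phase conclusion spans for trace 1c17b6dd5b51b810e3ee7f84be03ab3a.
# Assumption: list order mirrors execution order. Durations are the
# approximate values quoted in the report, in milliseconds.
phases = [
    {"span": "gh-aw.activation.conclusion", "status": "success", "duration_ms": None},
    {"span": "gh-aw.agent.conclusion", "status": "failure", "duration_ms": 311_698},
    {"span": "gh-aw.detection.conclusion", "status": "failure", "duration_ms": 49_900},
    {"span": "gh-aw.safe_outputs.conclusion", "status": "failure", "duration_ms": 6_000},
]


def first_failed_phase(phases):
    """Return the first phase record whose status is 'failure', or None."""
    for phase in phases:
        if phase["status"] == "failure":
            return phase
    return None


failed = first_failed_phase(phases)
print(failed["span"], f'{failed["duration_ms"] / 60_000:.1f} min')
# gh-aw.agent.conclusion 5.2 min
```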

Recommendations

  1. Triage the 3 PR Sous Chef + 2 PR Code Quality Reviewer copilot failures (smallest fix first): open the linked run logs to determine whether the agent phase is failing on an agent error or a downstream safe-output rejection — both currently surface only as gh-aw.run.status:failure with no log corroboration.
  2. Close the truncation blind spot: emit a scalar finish-reason attribute alongside the array (gen_ai.response.finish_reasons), since the array form is not queryable in Sentry — without it, finish_reasons:length / runaway-token detection is impossible.
  3. Fix backend field mapping: ensure OTLP status.code surfaces as Sentry span.status and service.version surfaces as release; both are emitted but null at the consumer, blocking status-based filtering and any regression-vs-baseline comparison.
  4. Decide on error/log telemetry: the errors and logs datasets are empty, so this review relies solely on span attributes — either ship complementary error/log signal or document spans as the single source of truth.
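
Recommendation 2 can be sketched at the level of the OTLP/JSON attribute payload. The example below is written in Python purely to show the data shape; the real emitter is send_otlp_span.cjs, and this is not a port of buildArrayAttr. The key gh-aw.finish_reason is the name suggested in the findings table, while the helper names and the choice to join multiple reasons with a comma are assumptions.

```python
ARRAY_KEY = "gen_ai.response.finish_reasons"
SCALAR_KEY = "gh-aw.finish_reason"  # scalar mirror suggested in the findings table


def otlp_string_array(key, values):
    """Build an OTLP/JSON attribute whose value is an array of strings."""
    return {
        "key": key,
        "value": {"arrayValue": {"values": [{"stringValue": v} for v in values]}},
    }


def otlp_string(key, value):
    """Build an OTLP/JSON attribute whose value is a single string."""
    return {"key": key, "value": {"stringValue": value}}


def with_scalar_mirror(attributes):
    """Return a copy of attributes with a scalar mirror of the finish reasons.

    Assumption: multiple reasons are joined with ',' so the scalar stays
    searchable with a simple string filter.
    """
    out = list(attributes)
    for attr in attributes:
        if attr["key"] != ARRAY_KEY:
            continue
        values = [v["stringValue"] for v in attr["value"]["arrayValue"]["values"]]
        if values:
            out.append(otlp_string(SCALAR_KEY, ",".join(values)))
    return out


attrs = with_scalar_mirror([otlp_string_array(ARRAY_KEY, ["length"])])
print(attrs[-1])
# {'key': 'gh-aw.finish_reason', 'value': {'stringValue': 'length'}}
```

Keeping the array attribute alongside the mirror preserves the original attribute for consumers that can read it, while the scalar gives Sentry a plain string to filter on.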

Notes

  • Tooling: Sentry MCP build exposes list_events (used here); search_events and get_trace_details were not available, so trace continuity was validated via list_events filtered by trace:<id> per the skill fallback.
  • Inconclusive vs confirmed: run-level failures are confirmed via gh-aw.run.status:failure. Root causes are inconclusive (no errors/logs data; gen_ai.response.finish_reasons/span.status not consumable). Do not read these as confirmed timeouts.
  • No timeouts/cancellations observed: no cancelled status; agent.setup spans peak ~12 s and the ~311 s PR Sous Chef agent phase is a phase duration, not a hung-span timeout.
  • Healthy attributes (present & well-populated): gh-aw.workflow.name, gh-aw.run.status, gh-aw.engine.id (copilot 2563, claude 888, codex 194, gemini 58, pi 40, antigravity 40 spans). Note the attribute key is gh-aw.engine.id, not gh-aw.engine.
  • Missing/null at consumer: gen_ai.response.finish_reasons (0/7d), span.status (all), release (all), gen_ai.usage.total_tokens & gen_ai.response.model (null on gen_ai spans → no token-cost outlier analysis from spans).
  • Regression assessment: low, distributed failure rate; treated as normal background except the two recurring copilot workflows. No clear baseline beyond 24h was used.
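
Because run-level failures are confirmed from span attributes, failing spans have to be collapsed into distinct runs before counting. The sketch below shows one way to do that. The sample rows are illustrative placeholders, not query output, and the run-identifier key `gh-aw.run.id` is a hypothetical name; only gh-aw.run.status is confirmed by this review.

```python
from collections import defaultdict

# Illustrative rows only; the "gh-aw.run.id" key is an assumed attribute name.
spans = [
    {"name": "gh-aw.agent.conclusion", "gh-aw.run.id": "run-A", "gh-aw.run.status": "failure"},
    {"name": "gh-aw.detection.conclusion", "gh-aw.run.id": "run-A", "gh-aw.run.status": "failure"},
    {"name": "gh-aw.agent.conclusion", "gh-aw.run.id": "run-B", "gh-aw.run.status": "success"},
    {"name": "gh-aw.agent.conclusion", "gh-aw.run.id": "run-C", "gh-aw.run.status": "failure"},
]


def failed_runs(spans, run_id_key="gh-aw.run.id"):
    """Group the names of failing spans by run identifier."""
    by_run = defaultdict(list)
    for span in spans:
        if span.get("gh-aw.run.status") == "failure":
            by_run[span[run_id_key]].append(span["name"])
    return dict(by_run)


grouped = failed_runs(spans)
print(len(grouped), sorted(grouped))
# 2 ['run-A', 'run-C']
```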

References: §26818750469 · §26849261658 · §26834737280

Generated by 🚨 Daily Reliability Review · opus48 1.5M

  • expires on Jun 4, 2026, 11:30 PM UTC
