Classify Copilot auth-null startup failures and add bounded scheduled retry #36663

Closed
Copilot wants to merge 4 commits into main from copilot/aw-failures-fix-authorization-error

Copilot AI commented Jun 3, 2026

Scheduled copilot engine runs started failing with a new signature: Copilot exits in ~1s with 0 turns / 0 tokens, agent_output.json is empty, and logs show isAuthError=true (No authentication information found). The failure was being treated as a generic exit and not surfaced as a first-class auth provisioning failure.

  • Auth preflight visibility (mask-safe)

    • Added per-attempt preflight logging in copilot_harness.cjs that reports presence/absence (never values) of:
      • COPILOT_GITHUB_TOKEN
      • GH_TOKEN
      • GITHUB_TOKEN
    • This makes token-injection state explicit immediately before Copilot launch.
  • Bounded recovery path for scheduled auth-null races

    • Added MAX_SCHEDULED_AUTH_NULL_RETRIES = 1.
    • On scheduled runs, if a fresh attempt fails with no-auth and all token env vars are absent, the harness performs one delayed fresh retry (to absorb transient provisioning races).
  • Explicit classified failure signal

    • When auth-null remains terminal with all token vars absent, harness emits a structured report_incomplete infrastructure signal with copilot_auth_token_missing detail so this class is visible/classifiable instead of a generic exitCode=1.
  • Focused harness test updates

    • Added coverage for auth-env presence detection and scheduled auth-null retry behavior.
    • Refined retry-policy test helper to use a config object for readability and extension.
const authEnvPresence = buildCopilotAuthEnvPresence(childEnv ?? process.env);
log(
  `attempt ${attempt + 1}: auth preflight` +
    ` COPILOT_GITHUB_TOKEN=${authEnvPresence.copilotGitHubTokenPresent ? "present" : "absent"}` +
    ` GH_TOKEN=${authEnvPresence.ghTokenPresent ? "present" : "absent"}` +
    ` GITHUB_TOKEN=${authEnvPresence.githubTokenPresent ? "present" : "absent"}`
);
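
For context, here is a minimal sketch of what a helper like buildCopilotAuthEnvPresence might look like. The function name and the four boolean fields appear in this PR; the body and the isEnvValuePresent helper name are assumptions, modeled on the test descriptions further down (undefined, empty, and whitespace-only values count as absent). It returns booleans only, so a token value is never copied into the log.

```javascript
// Sketch under assumptions: only the name buildCopilotAuthEnvPresence and the
// *Present field names come from the PR. The implementation is illustrative.

/** Returns true only for non-empty, non-whitespace strings. (Hypothetical helper name.) */
function isEnvValuePresent(value) {
  return typeof value === "string" && value.trim().length > 0;
}

/** Builds a presence snapshot of the three auth env vars; never stores their values. */
function buildCopilotAuthEnvPresence(env) {
  const copilotGitHubTokenPresent = isEnvValuePresent(env.COPILOT_GITHUB_TOKEN);
  const ghTokenPresent = isEnvValuePresent(env.GH_TOKEN);
  const githubTokenPresent = isEnvValuePresent(env.GITHUB_TOKEN);
  return {
    copilotGitHubTokenPresent,
    ghTokenPresent,
    githubTokenPresent,
    anyTokenPresent: copilotGitHubTokenPresent || ghTokenPresent || githubTokenPresent,
  };
}

// Usage: a whitespace-only GH_TOKEN is absent; a non-empty GITHUB_TOKEN is present.
const snapshot = buildCopilotAuthEnvPresence({ GH_TOKEN: "   ", GITHUB_TOKEN: "example-value" });
console.log(snapshot);
```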

Copilot AI and others added 3 commits June 3, 2026 15:56
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix authorization error causing Copilot CLI crashes" to "Classify Copilot auth-null startup failures and add bounded scheduled retry" Jun 3, 2026
Copilot AI requested a review from gh-aw-bot June 3, 2026 16:05
@pelikhan pelikhan marked this pull request as ready for review June 3, 2026 16:29
Copilot AI review requested due to automatic review settings June 3, 2026 16:29

Copilot AI left a comment

Pull request overview

This PR improves observability and classification of Copilot “auth-null” startup failures (notably in scheduled runs), and adds a bounded scheduled retry path intended to absorb transient auth provisioning races.

Changes:

  • Add mask-safe per-attempt auth preflight logging that reports presence/absence (not values) of COPILOT_GITHUB_TOKEN, GH_TOKEN, and GITHUB_TOKEN.
  • Add a bounded “scheduled auth-null” retry (max 1) when Copilot reports “No authentication information found” on a fresh attempt and all auth env vars are absent, plus emit a report_incomplete signal when it remains terminal.
  • Update harness tests for auth-env presence helpers and the scheduled auth-null retry policy; rename workflow step labels to clarify the Copilot SDK install is Node.js-based.
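
The bounded retry described above can be sketched roughly as follows. This is a simplified, synchronous illustration, not the harness code: runAttempt, wait, the result shape, the MAX_RETRIES value, and the "generic_failure" label are all hypothetical stand-ins, and the real harness has additional retry branches (for example the --continue path) that are omitted here. Only MAX_SCHEDULED_AUTH_NULL_RETRIES = 1 and the copilot_auth_token_missing detail are taken from the PR.

```javascript
// Sketch only. The real harness launches Copilot asynchronously; this version
// is synchronous and uses injected callbacks so it can run on its own.
const MAX_RETRIES = 3; // assumed value, not shown in the PR
const MAX_SCHEDULED_AUTH_NULL_RETRIES = 1; // value stated in the PR

function runWithScheduledAuthNullRetry({ runAttempt, wait, isScheduledRun, anyTokenPresent }) {
  let scheduledAuthNullRetries = 0;
  let attempt = 0;
  while (true) {
    const result = runAttempt(attempt);
    if (result.exitCode === 0) {
      return { ok: true, attempts: attempt + 1, detail: null };
    }
    // "Auth-null": Copilot reported an auth error and no token env var is set.
    const authNull = result.isAuthError && !anyTokenPresent();
    const budgetLeft =
      scheduledAuthNullRetries < MAX_SCHEDULED_AUTH_NULL_RETRIES && attempt < MAX_RETRIES;
    if (authNull && isScheduledRun && budgetLeft) {
      scheduledAuthNullRetries += 1;
      wait(); // delay before the fresh retry, to absorb a provisioning race
      attempt += 1;
      continue;
    }
    // Terminal: classify auth-null explicitly instead of reporting a generic exit.
    return {
      ok: false,
      attempts: attempt + 1,
      detail: authNull ? "copilot_auth_token_missing" : "generic_failure",
    };
  }
}

// Usage: the first attempt hits the race, the single retry succeeds.
const outcomes = [
  { exitCode: 1, isAuthError: true },
  { exitCode: 0, isAuthError: false },
];
const recovered = runWithScheduledAuthNullRetry({
  runAttempt: (i) => outcomes[i],
  wait: () => {},
  isScheduledRun: true,
  anyTokenPresent: () => false,
});
console.log(recovered);
```

Because the retry counter is capped at one, the loop always terminates: a second consecutive auth-null failure falls through to the classified terminal result.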
File Description
actions/setup/js/copilot_harness.cjs Adds auth env presence snapshot logging and scheduled auth-null retry + classified report_incomplete emission.
actions/setup/js/copilot_harness.test.cjs Adds unit coverage for env presence helpers and scheduled auth-null retry logic; refactors retry test helper signature.
.github/workflows/smoke-copilot-sdk.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/q.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/python-data-charts.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/pr-triage-agent.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/pr-nitpick-reviewer.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/pr-code-quality-reviewer.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/plan.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/pdf-summary.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/org-health-report.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/metrics-collector.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/mergefest.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/mcp-inspector.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/linter-miner.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/layout-spec-maintainer.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/jsweep.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/firewall.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/firewall-escape.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/draft-pr-cleanup.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/docs-noob-tester.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/discussion-task-miner.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/dictation-prompt.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/dev-hawk.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/deployment-incident-monitor.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/delight.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/dead-code-remover.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-workflow-updater.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-testify-uber-super-expert.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-syntax-error-quality.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-spdd-spec-planner.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-skill-optimizer.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-sentrux-report.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-security-observability.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-secrets-analysis.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-safe-output-integrator.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-repo-chronicle.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-performance-summary.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-model-inventory.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-mcp-concurrency-analysis.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-malicious-code-scan.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-issues-report.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-geo-optimizer.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-experiment-report.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-compiler-threat-spec-optimizer.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-compiler-quality.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-cli-performance.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-assign-issue-to-user.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-architecture-diagram.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/daily-agent-of-the-day-blog-writer.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/craft.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/copilot-pr-prompt-analysis.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/copilot-pr-nlp-analysis.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/copilot-pr-merged-report.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/copilot-opt.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/copilot-cli-deep-research.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/code-scanning-fixer.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/cli-consistency-checker.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/ci-coach.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/breaking-change-checker.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/brave.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/artifacts-summary.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/architecture-guardian.lock.yml Renames Copilot SDK install step label to “(Node.js)”.
.github/workflows/agent-performance-analyzer.lock.yml Renames Copilot SDK install step label to “(Node.js)”.

Copilot's findings

  • Files reviewed: 64/64 changed files
  • Comments generated: 1

Comment on lines +977 to +981
scheduledAuthNullRetries += 1;
useContinueOnRetry = false;
continueDisabledPermanently = true;
log(`attempt ${attempt + 1}: no authentication info on fresh run with all auth env vars absent — retrying once after delay (authNullRetry=${scheduledAuthNullRetries}/${MAX_SCHEDULED_AUTH_NULL_RETRIES})`);
continue;
@pelikhan pelikhan closed this Jun 3, 2026

github-actions Bot commented Jun 3, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions Bot commented Jun 3, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #36663 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100). No custom design-gate config is present. Neither enforcement condition is met.

github-actions Bot commented Jun 3, 2026

🧪 Test Quality Sentinel completed test quality analysis.

github-actions Bot commented Jun 3, 2026

⚠️ PR Code Quality Reviewer failed during code quality review.

@github-actions github-actions Bot mentioned this pull request Jun 3, 2026

github-actions Bot commented Jun 3, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 97/100 — Excellent

Analyzed 12 test(s) across 1 JavaScript file: 12 design tests (100%), 0 implementation tests, 0 guideline violations. All new and modified tests verify behavioral contracts on observable outputs and error paths.

📊 Metrics & Test Classification (12 tests analyzed)
| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 12 |
| ✅ Design tests (behavioral contracts) | 12 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 11 (92%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No (test +51 / prod +52 lines, ratio ≈ 0.98) |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Notes |
| --- | --- | --- | --- |
| "treats undefined, empty, and whitespace values as absent" | copilot_harness.test.cjs | ✅ Design | Edge cases: undefined, "", " " |
| "treats non-empty values as present" | copilot_harness.test.cjs | ✅ Design | Happy-path; simple utility |
| "builds a mask-safe auth env presence snapshot" | copilot_harness.test.cjs | ✅ Design | Mixed present/absent, verifies anyTokenPresent |
| "does not retry when auth fails on first attempt" | copilot_harness.test.cjs | ✅ Design | Error path; signature updated |
| "does not retry when first attempt reports authentication failed" | copilot_harness.test.cjs | ✅ Design | Error path; signature updated |
| "retries as fresh run when auth fails on a --continue attempt" | copilot_harness.test.cjs | ✅ Design | Multi-boundary retry budget |
| "does not retry when auth fails on a fresh-run recovery attempt" | copilot_harness.test.cjs | ✅ Design | Error path; signature updated |
| "does not retry auth error even when output is mixed" | copilot_harness.test.cjs | ✅ Design | Mixed-content edge case |
| "retries once for scheduled auth-null fresh failures..." | copilot_harness.test.cjs | ✅ Design | New; boundary: retry budget 0→1 |
| "does not apply scheduled auth-null retry when any auth token env var is present" | copilot_harness.test.cjs | ✅ Design | New; conditional gate |
| "still retries non-auth errors with output (CAPIError 400)" | copilot_harness.test.cjs | ✅ Design | Non-auth error path |
| "still retries generic partial-execution errors with output" | copilot_harness.test.cjs | ✅ Design | Generic error path |

Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 12 tests (vitest)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All 12 tests verify behavioral contracts, including boundary conditions for the new bounded scheduled retry logic.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §26898522910

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 988.8K ·

@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 97/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 12 tests verify behavioral contracts on observable outputs and error paths.

@github-actions github-actions Bot left a comment

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.4M

   * @returns {boolean}
   */
- function shouldRetry(result, attempt, useContinueOnRetry = false) {
+ function shouldRetry(result, attempt, options = {}) {

[/tdd] The shouldRetry helper here is a hand-rolled reimplementation of the production retry logic, not a call to any exported harness function. As the production decision tree grows (this PR added a new branch), this copy must be kept in sync manually — a silent drift risk.

💡 Suggestion

Consider extracting the retry-decision logic from main() into a pure, exported function (e.g. shouldRetryAfterAuthNullError(opts)) so tests can call the real code rather than a replica. This is the /tdd pattern: production logic is the source of truth, tests exercise it directly.

Alternatively, at minimum add a comment linking this helper to the specific harness lines it mirrors (e.g. // mirrors copilot_harness.cjs lines 960-987) so reviewers know where to look when the harness changes.
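
As a rough illustration of the extraction suggested above, the scheduled auth-null branch could become a pure predicate that both main() and the tests call. The name shouldRetryAfterAuthNullError comes from this comment; the parameter names follow the test helper's options object, and maxRetries is a hypothetical parameter standing in for the harness's MAX_RETRIES constant.

```javascript
// Sketch of a possible extraction, not the harness's actual code.
const MAX_SCHEDULED_AUTH_NULL_RETRIES = 1; // value stated in the PR

/**
 * Decides whether a no-auth failure on a fresh attempt earns the one
 * bounded retry. Pure: no logging, no state mutation.
 * @returns {boolean}
 */
function shouldRetryAfterAuthNullError({
  isScheduledRun,
  anyTokenPresent,
  scheduledAuthNullRetries,
  attempt,
  maxRetries, // hypothetical parameter; the harness uses a MAX_RETRIES constant
}) {
  return Boolean(
    isScheduledRun &&
      !anyTokenPresent &&
      scheduledAuthNullRetries < MAX_SCHEDULED_AUTH_NULL_RETRIES &&
      attempt < maxRetries
  );
}

// Export for tests when running as a CommonJS module.
if (typeof module !== "undefined") {
  module.exports = { shouldRetryAfterAuthNullError, MAX_SCHEDULED_AUTH_NULL_RETRIES };
}

// Usage: first scheduled failure with no tokens is retried; the second is not.
const base = { isScheduledRun: true, anyTokenPresent: false, attempt: 0, maxRetries: 3 };
console.log(shouldRetryAfterAuthNullError({ ...base, scheduledAuthNullRetries: 0 })); // true
console.log(shouldRetryAfterAuthNullError({ ...base, scheduledAuthNullRetries: 1 })); // false
```

With this shape the test file would import the predicate instead of re-implementing it, so a new branch in production cannot drift from the copy under test.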

log(`attempt ${attempt + 1}: no authentication info on fresh run with all auth env vars absent — retrying once after delay (authNullRetry=${scheduledAuthNullRetries}/${MAX_SCHEDULED_AUTH_NULL_RETRIES})`);
continue;
}
if (!authEnvPresence.anyTokenPresent) {

[/diagnose] emitInfrastructureIncomplete is now called when all auth env vars are absent after retries are exhausted — this is the key new diagnostic signal. But there is no test that verifies this side effect is triggered after the MAX_SCHEDULED_AUTH_NULL_RETRIES budget is spent.

💡 What to add

A test (even inline in the existing describe block) should verify that after the bounded retry fires and the follow-up attempt also fails with no tokens, emitInfrastructureIncomplete is eventually called. The current tests only assert shouldRetry → false (line 1258), not the escalation that follows.

This matters because the whole motivation for the PR is surfacing auth-null as a first-class failure — the test suite should prove that guarantee holds end-to-end.

   */
- function shouldRetry(result, attempt, useContinueOnRetry = false) {
+ function shouldRetry(result, attempt, options = {}) {
+   const { useContinueOnRetry = false, isScheduledRun = false, anyTokenPresent = true, scheduledAuthNullRetries = 0 } = options;

[/tdd] The anyTokenPresent default is true in the test helper's destructuring, which is the opposite of the failure scenario being tested. Any test that omits anyTokenPresent is silently testing the "tokens present" path, making it easy to write an incomplete test that passes.

💡 Suggestion

Consider defaulting to false (matching the failure case under study), or make it a required field without a default so callers are forced to be explicit. At minimum, add a comment explaining the chosen default.
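
A minimal sketch of the "required field" variant follows. readRetryOptions is a hypothetical helper name; the option names are the ones destructured in the test helper above. Callers that forget anyTokenPresent get an immediate error rather than silently exercising the tokens-present path.

```javascript
// Hypothetical helper: same options as the test helper, but anyTokenPresent
// has no default and must be an explicit boolean.
function readRetryOptions(options = {}) {
  if (typeof options.anyTokenPresent !== "boolean") {
    throw new TypeError("anyTokenPresent must be passed explicitly (true or false)");
  }
  const {
    useContinueOnRetry = false,
    isScheduledRun = false,
    anyTokenPresent,
    scheduledAuthNullRetries = 0,
  } = options;
  return { useContinueOnRetry, isScheduledRun, anyTokenPresent, scheduledAuthNullRetries };
}

// Usage: explicit value is accepted, omission is rejected.
console.log(readRetryOptions({ isScheduledRun: true, anyTokenPresent: false }));
try {
  readRetryOptions({ isScheduledRun: true });
} catch (err) {
  console.log(err.message);
}
```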

if (isScheduledRun && !authEnvPresence.anyTokenPresent && scheduledAuthNullRetries < MAX_SCHEDULED_AUTH_NULL_RETRIES && attempt < MAX_RETRIES) {
scheduledAuthNullRetries += 1;
useContinueOnRetry = false;
continueDisabledPermanently = true;

[/diagnose] continueDisabledPermanently = true is set here to prevent --continue from being re-armed after the auth-null retry. This is correct, but there is no test verifying this side effect — if the flag were accidentally omitted, a subsequent non-auth failure on the same run could re-enable --continue and replay a corrupt conversation.

💡 What to add

The existing null-type tool_call restarts fresh instead of --continue describe block (around line 1302) shows how to test continueDisabledPermanently state transitions. A similar micro-test for the auth-null retry path would document the invariant.


4 participants