Stricter exit code when MCP servers fail to start (CI / agent use) #3064

### Describe the feature or problem you'd like to solve

The CLI exits 0 in CI even when the agent had zero tools available at startup or terminated before producing any output. For Actions / CI consumers (Agentic Workflows in our case), the exit code is the contract: a green exit on a no-tools run looks like success, and downstream `if: success()` jobs run as if the agent had worked.
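To make the contract concrete, here is a minimal sketch of how a CI runner treats a step, assuming the only signal it consults is the process exit status. The two child processes are stand-ins, not the real `copilot` CLI, and `run_step` is a hypothetical helper written for this illustration.

```python
import subprocess
import sys


def run_step(argv):
    """Run one CI step; return True if a runner would mark it successful."""
    completed = subprocess.run(argv)
    return completed.returncode == 0


# Stand-in for an agent run that had no tools available but still exited 0.
no_tools_agent = [sys.executable, "-c", "import sys; sys.exit(0)"]
# Stand-in for the requested behavior: the same condition reported as non-zero.
strict_agent = [sys.executable, "-c", "import sys; sys.exit(1)"]

# Downstream jobs gated on success run in the first case and are skipped in the second.
downstream_runs_today = run_step(no_tools_agent)
downstream_runs_strict = run_step(strict_agent)
```

Nothing about the missing tools reaches the runner in the first case, which is why the failure stays invisible without log parsing.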

### Proposed solution

Make `copilot` exit non-zero in this failure mode so CI consumers can trust the exit code. Two shapes work for us; pick whichever fits the CLI's design:

- `CI=true`-conditional behavior: when `CI=true`, exit non-zero if any configured MCP server fails to start. This matches the convention that `npm`, `yarn`, and `jest` already follow for stricter machine-readable defaults, and it preserves the interactive UX.
- Explicit flag (e.g. `--strict-mcp-start`): clearer opt-in, easier to roll out without changing existing behavior.

Either way, the benefit is that any CI consumer (not just AW) can rely on the exit code as the primary signal instead of parsing logs after the fact.
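As a rough sketch of the decision we are asking for (not the CLI's actual implementation), the logic could look like the following. Every name here is hypothetical: `McpStartResult`, `strict_mode_enabled`, and `exit_code` are invented for illustration, the exit status `1` is an assumption, and the sketch accepts either trigger (the flag or `CI=true`) only so both proposed shapes can be shown in one place.

```python
import os
from dataclasses import dataclass


@dataclass
class McpStartResult:
    """Hypothetical record of one configured MCP server's startup outcome."""
    name: str
    started: bool


def strict_mode_enabled(strict_flag, env=None):
    """Strict when the (hypothetical) flag is set or when CI=true is in the environment."""
    env = os.environ if env is None else env
    return strict_flag or env.get("CI", "").lower() == "true"


def exit_code(results, strict):
    """Return 1 (assumed value) if strict and any server failed to start, else 0."""
    failed = [r.name for r in results if not r.started]
    return 1 if strict and failed else 0


results = [McpStartResult("github", True), McpStartResult("playwright", False)]

interactive = exit_code(results, strict_mode_enabled(False, env={}))
in_ci = exit_code(results, strict_mode_enabled(False, env={"CI": "true"}))
with_flag = exit_code(results, strict_mode_enabled(True, env={}))
```

Under these assumptions the interactive run keeps today's behavior (exit 0), while the CI and explicit-flag runs both surface the failed server through the exit status.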


### Example prompts or workflows

1. **MCP allowlist enforcement fails closed in a sandboxed runner.** v1.0.22 enabled enforcement that calls `https://api.github.com/copilot/mcp_registry` directly at startup. In Agentic Workflows the agent container has no auth token (all `api.github.com` access flows through an api-proxy sidecar) and `api.github.com` isn't on the egress allowlist, so the request is blocked, the CLI fails closed on every MCP server, and the agent runs with an empty tool surface. v1.0.22 still exited 0, the Actions job came back green, and we didn't detect the regression — a customer did ([gh-aw#25550](https://github.com/github/gh-aw/issues/25550), [gh-aw#25680](https://github.com/github/gh-aw/issues/25680)).
2. **Broader shape of the same ask:** any configured MCP server failing to start (config error, network blip, server-side outage, partial allowlist mismatch). We haven't been bitten by a partial-MCP-failure scenario yet, but losing one MCP server can degrade the agent's tool surface in subtle ways, and we'd rather have a non-zero exit than silently run with fewer tools.


### Additional context

- Customer reports from the v1.0.22 incident: [gh-aw#25550](https://github.com/github/gh-aw/issues/25550), [gh-aw#25680](https://github.com/github/gh-aw/issues/25680)
- Customer-facing run showing the silent-success pattern: [`devantler-tech/ksail` run 24270819213](https://github.com/devantler-tech/ksail/actions/runs/24270819213)
- Kusto fingerprint we used after the fact: skipped `detection` jobs in AW (whose `if:` checks the agent job's result) jumped from a baseline range of 8-40 per day to 2,569 on 04-10, and distinct affected repos jumped from 10-20 per day to ~438. That signal was only visible in our telemetry, not in the exit code.
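For a sense of scale, the arithmetic behind that fingerprint can be worked through as below. This only uses the figures quoted above, compares against the upper end of each baseline range (the conservative choice), and treats the approximate `~438` as exactly 438; `spike_ratio` is a hypothetical helper, not part of our Kusto query.

```python
def spike_ratio(observed, baseline_high):
    """How many times larger the observed count is than the top of the baseline range."""
    return observed / baseline_high


# Skipped `detection` jobs per day: 2,569 against a baseline of at most 40.
skipped_jobs_ratio = spike_ratio(2569, 40)
# Distinct repos per day: roughly 438 against a baseline of at most 20.
distinct_repos_ratio = spike_ratio(438, 20)
```

Even against the high end of the baseline, skipped jobs rose by a factor of about 64 and distinct repos by a factor of about 22, yet none of it showed up in any exit code.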

