
v0.71 COPILOT_API_KEY dummy-byok-key causes 10-100x premium request over-billing vs v0.68 #30324

@tore-unumed

Summary

After upgrading from gh-aw v0.68.3 to v0.71.1 (and then v0.71.3), the number of Copilot premium requests billed per workflow run increased by 10-100x. A run that previously consumed 1-2 premium requests now consumes 50-100+, because GitHub appears to be billing every API call (both user-initiated and agent response turns) as a premium request, instead of only billing user-initiated turns.

Root cause

The v0.71 compiler injects a new environment variable into the lock file that was not present in v0.68:

COPILOT_API_KEY: dummy-byok-key-for-offline-mode

This activates the BYOK detection path in the Copilot CLI. In v0.68 the variable was not set, so the CLI authenticated normally through the api-proxy sidecar without triggering BYOK mode.

v0.68.3 lock file (working correctly)

copilot_driver.cjs /usr/local/bin/copilot --autopilot ...
# env:
COPILOT_MODEL: claude-sonnet-4
# No COPILOT_API_KEY

v0.71.3 lock file (over-billing)

copilot_harness.cjs /usr/local/bin/copilot --autopilot ...
# env:
COPILOT_API_KEY: dummy-byok-key-for-offline-mode
COPILOT_MODEL: claude-sonnet-4.6
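To check whether a repository's compiled workflows are affected, one can scan the lock files for the injected entry. Below is a minimal Python sketch. It assumes the lock files live in .github/workflows/ and end in .lock.yml, which is the usual gh-aw layout, but treat both the path and the suffix as assumptions. The helpers has_dummy_key and lock_files_with_dummy_key are hypothetical names and are not part of gh-aw.

```python
import re
from pathlib import Path

# Matches the env entry injected by the v0.71 compiler, with or without YAML quotes.
DUMMY_KEY_RE = re.compile(
    r"""^[ \t]*COPILOT_API_KEY:[ \t]*["']?dummy-byok-key-for-offline-mode["']?[ \t]*$""",
    re.MULTILINE,
)


def has_dummy_key(lock_text: str) -> bool:
    """Return True if the lock file text sets COPILOT_API_KEY to the dummy BYOK key."""
    return DUMMY_KEY_RE.search(lock_text) is not None


def lock_files_with_dummy_key(workflows_dir: str = ".github/workflows") -> list[Path]:
    """List lock files (assumed to match *.lock.yml) that contain the dummy key."""
    return sorted(
        path
        for path in Path(workflows_dir).glob("*.lock.yml")
        if has_dummy_key(path.read_text(encoding="utf-8"))
    )


if __name__ == "__main__":
    for path in lock_files_with_dummy_key():
        print(f"{path}: COPILOT_API_KEY is set to the dummy BYOK key")
```

Run it from the repository root. A v0.68.3-compiled lock file should produce no output, and a v0.71.x-compiled one should be listed.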

Billing data

We correlated the GitHub billing CSVs (premiumRequestUsageReport) with per-run artifact data, counting X-Initiator: user headers in the API logs:

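Date              | Billed premium requests | Mined API calls (total) | Mined user-initiated calls | Billed / total ratio
------------------|-------------------------|-------------------------|----------------------------|---------------------
Apr 29 (v0.68.3)  | 22                      | 327                     | ~23                        | 0.07
May 3 (v0.71.3)   | 325                     | 318                     | ~16                        | 1.02
May 4 (v0.71.3)   | 1,610                   | 1,705                   | ~77                        | 0.94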
In v0.68, only ~7% of API calls were billed, which matches the number of user-initiated turns (22 billed vs ~23 user-initiated). In v0.71, ~94-102% of ALL API calls are billed: both user-initiated turns AND agent response turns.
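The comparison above can be made mechanical: for each day, check whether the billed count sits closer to the mined user-initiated count or to the mined total. Below is a small Python sketch using the three rows from the billing table. DayUsage and tracks() are hypothetical helpers written for this illustration, and "closer to" is a simple heuristic rather than how GitHub meters usage.

```python
from dataclasses import dataclass


@dataclass
class DayUsage:
    date: str
    version: str
    billed: int       # premium requests from premiumRequestUsageReport
    total_calls: int  # all API calls mined from the run artifacts
    user_calls: int   # calls with "X-Initiator: user" (approximate)

    @property
    def billed_per_call(self) -> float:
        return self.billed / self.total_calls

    def tracks(self) -> str:
        """Say which mined count the bill is closer to: 'user' or 'total'."""
        gap_user = abs(self.billed - self.user_calls)
        gap_total = abs(self.billed - self.total_calls)
        return "user" if gap_user <= gap_total else "total"


# Rows copied from the billing table; user-initiated counts are approximate.
ROWS = [
    DayUsage("Apr 29", "v0.68.3", 22, 327, 23),
    DayUsage("May 3", "v0.71.3", 325, 318, 16),
    DayUsage("May 4", "v0.71.3", 1610, 1705, 77),
]

for row in ROWS:
    print(f"{row.date} ({row.version}): billed/call = {row.billed_per_call:.2f}, "
          f"bill tracks {row.tracks()} calls")
```

With these figures the v0.68.3 day tracks user-initiated calls, while both v0.71.3 days track total calls.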

Expected behavior

The cost management docs state: "A typical workflow run uses 1–2 premium requests." This was accurate in v0.68 but is wildly off in v0.71. The copilot engine should bill premium requests the same way regardless of whether copilot_driver.cjs or copilot_harness.cjs is used as the launcher.

Impact

For a moderately active repository running ~20-30 workflow runs per day, this regression causes:

  • v0.68: 30-60 premium requests/day ($1.20-$2.40/day at $0.04/req)
  • v0.71: 1,500-2,000 premium requests/day ($60-$80/day at $0.04/req)

This is a 30-50x cost increase with no change to the workflow source files.
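The daily cost figures follow from multiplying request counts by the $0.04 rate. The sketch below reproduces that arithmetic in integer cents to avoid floating-point rounding. It pairs the low estimates with each other and the high estimates with each other, which gives the quoted 30-50x range. The request ranges are the estimates from the list above, not measured data.

```python
PRICE_CENTS = 4  # $0.04 per premium request, the rate used in the estimate


def cost_cents(requests: int) -> int:
    """Daily cost in integer cents (avoids float rounding on $0.04 multiples)."""
    return requests * PRICE_CENTS


def dollars(cents: int) -> str:
    return f"${cents // 100}.{cents % 100:02d}"


V068 = (30, 60)      # premium requests/day, low and high estimate
V071 = (1500, 2000)

for label, (low, high) in [("v0.68", V068), ("v0.71", V071)]:
    print(f"{label}: {dollars(cost_cents(low))}-{dollars(cost_cents(high))}/day")

# Pair low with low and high with high to get the quoted range of multiples.
print(f"increase: {V071[1] / V068[1]:.0f}x-{V071[0] / V068[0]:.0f}x")
```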

Environment

  • gh-aw versions tested: v0.68.3 (correct billing), v0.71.1 and v0.71.3 (over-billing)
  • Copilot CLI: 1.0.21 (v0.68) → 1.0.35/1.0.36 (v0.71)
  • Engine: copilot with model: claude-sonnet-4 / claude-sonnet-4.6
  • AWF firewall: 0.25.20 (v0.68) → 0.25.28/0.25.29 (v0.71)

Workaround

None known within the copilot engine. The .md source files cannot control whether COPILOT_API_KEY is injected; the compiler adds it.

Steps to reproduce

  1. Compile any workflow with engine.id: copilot using gh-aw v0.71.x
  2. Run the workflow
  3. Check the GitHub billing CSV (premiumRequestUsageReport) for the COPILOT_GITHUB_TOKEN owner
  4. Compare billed premium requests to the number of user-initiated API calls in the run artifacts
  5. Observe that ALL API calls (user + agent) are billed, not just user-initiated ones
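Steps 4 and 5 can be scripted once the run's API log text is at hand. Below is a minimal Python sketch, assuming that each logged request carries a header line of the form "X-Initiator: user" or "X-Initiator: agent". The real AWF/api-proxy log layout may differ, so adapt the pattern to the actual artifacts. count_initiators and compare_with_bill are hypothetical helpers, and the sample log is illustrative rather than real run data.

```python
import re
from collections import Counter

# Assumption: each logged API request has a header line such as
# "X-Initiator: user" or "X-Initiator: agent". Adjust to the real log layout.
INITIATOR_RE = re.compile(r"^[ \t]*X-Initiator:[ \t]*(\w+)", re.IGNORECASE | re.MULTILINE)


def count_initiators(log_text: str) -> Counter:
    """Count logged API calls per X-Initiator value."""
    return Counter(match.group(1).lower() for match in INITIATOR_RE.finditer(log_text))


def compare_with_bill(log_text: str, billed: int) -> dict:
    """Compare billed premium requests with the mined call counts."""
    counts = count_initiators(log_text)
    total = sum(counts.values())
    user = counts.get("user", 0)
    return {
        "total_calls": total,
        "user_calls": user,
        "billed": billed,
        # v0.68-like behaviour: billed is close to user_calls.
        # v0.71-like behaviour: billed is close to total_calls.
        "bill_tracks": "user" if abs(billed - user) <= abs(billed - total) else "total",
    }


# Illustrative log excerpt, not real data: one user turn, three agent turns.
sample_log = """X-Initiator: user
X-Initiator: agent
X-Initiator: agent
X-Initiator: agent
"""
print(compare_with_bill(sample_log, billed=4))
```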

Workaround test: removing the dummy key breaks MCP servers

We tested removing COPILOT_API_KEY: dummy-byok-key-for-offline-mode from the lock files after compiling. Premium request billing returned to normal (2 per run), but all custom MCP servers were blocked by policy:

! 3 MCP servers were blocked by policy

The dummy key is required for the AWF BYOK runtime path that allows custom MCP servers. Without it, the Copilot CLI falls back to standard mode where enterprise/org policy blocks non-built-in MCP servers.

There is no user-side workaround: the BYOK runtime path (needed for MCP) and the billing behavior change (all calls counted as premium) are coupled.

Related: #28470 — likely the same root cause, misdiagnosed

#28470 ("See large increase in runtime cost with 0.71.0") reported an identical 20-30x cost spike after v0.71. The Q audit investigation attributed it to:

  1. Detection job now functional (was crashing in v0.68 due to node: command not found) → 3x more jobs completing
  2. 67% more turns in the agent job due to blocked network requests

The investigation concluded: "the increased cost reflects what the workflow was supposed to cost all along."

However, the audit focused on job count and turn count and never examined how premium requests are metered per API call. Our billing CSV correlation data shows that the per-call billing ratio changed from 0.07 to ~1.0 between v0.68 and v0.71. The 3x increase from the detection fix is real but minor; the ~14x change in the per-call billing ratio caused by the BYOK dummy key is the dominant factor.

Billing ratio: v0.68 vs v0.71

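Date   | Version | Mined API calls | GitHub-billed premium requests | Billed / calls ratio
-------|---------|-----------------|--------------------------------|---------------------
Apr 27 | v0.68   | 475             | 31                             | 0.07
Apr 28 | v0.68   | 174             | 14                             | 0.08
Apr 29 | v0.68   | 327             | 22                             | 0.07
Apr 30 | v0.71   | 505             | 377                            | 0.75
May 2  | v0.71   | 48              | 68                             | 1.42
May 3  | v0.71   | 318             | 325                            | 1.02
May 4  | v0.71   | 1,705           | 1,610                          | 0.94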
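To summarize the table per version, one option is to pool the days and divide total billed requests by total mined calls. Pooling weights each day by its call volume, so a small sample such as May 2 (48 calls, ratio 1.42) does not count as much as May 4. Below is a minimal Python sketch using the table's figures. pooled_ratios is a hypothetical helper, and pooling is one reasonable aggregation choice, not the method used for the ~14x figure.

```python
from collections import defaultdict

# (date, version, mined API calls, GitHub-billed premium requests), from the table
DAYS = [
    ("Apr 27", "v0.68", 475, 31),
    ("Apr 28", "v0.68", 174, 14),
    ("Apr 29", "v0.68", 327, 22),
    ("Apr 30", "v0.71", 505, 377),
    ("May 2", "v0.71", 48, 68),
    ("May 3", "v0.71", 318, 325),
    ("May 4", "v0.71", 1705, 1610),
]


def pooled_ratios(days):
    """Billed premium requests per mined API call, pooled across days per version."""
    calls = defaultdict(int)
    billed = defaultdict(int)
    for _date, version, n_calls, n_billed in days:
        calls[version] += n_calls
        billed[version] += n_billed
    return {version: billed[version] / calls[version] for version in calls}


ratios = pooled_ratios(DAYS)
print({version: round(r, 3) for version, r in ratios.items()})
print(f"per-call billing ratio grew {ratios['v0.71'] / ratios['v0.68']:.1f}x")
```

With these figures the pooled ratios come out to about 0.069 (v0.68) and 0.924 (v0.71). That is roughly a 13.5x jump, in line with the ~14x estimate.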