feat: add gh aw forecast command for projecting workflow effective token usage (experimental)#31377

Merged
pelikhan merged 19 commits into main from copilot/add-forecast-cli-command on May 11, 2026
Conversation

Contributor

Copilot AI commented May 10, 2026

New Feature

Adds gh aw forecast — a new experimental analysis command that samples recent GitHub Actions run history and projects effective token usage and yield forward on a per-week or per-month basis using a Gamma–Poisson Monte Carlo simulation.

What does this feature do?

  • Samples run history — fetches completed runs over a configurable window (7 or 30 days, up to N runs per workflow) and derives per-run averages: effective tokens, duration, success rate. Duration is computed from started_at/updated_at timestamps; effective tokens are enriched from locally-cached run_summary.json files written by gh aw logs (no artifact re-download required)
  • Gamma–Poisson Monte Carlo projection — runs 10 000 simulation trials per workflow modelling three independent sources of uncertainty: run count (Gamma–Poisson compound process), per-run token usage (bootstrap resampling of historical observations), and per-run success (Bernoulli draws). Produces P10/P50/P90 effective-token confidence intervals rather than a single point estimate. Results are sorted by Monte Carlo P50. Low-data workflows (fewer than 5 sampled runs) are flagged with IsReliable = false and marked in the console table
  • Backtesting with --eval — shifts the entire training window back by one projection period so the forecast can be compared against actual runs in the most recent period. Computes accuracy metrics (P50ErrorAbs, P50ErrorPct, InCI) and renders a Backtesting evaluation table showing Actual Runs, Actual ET, Forecast P50, Error (abs), Error %, and whether the actual result fell within the 80% CI
  • Reads frontmatter — surfaces active trigger types (schedule, pull_request, etc.) from each workflow's Markdown source
  • Splits by experiment variant — when a workflow defines A/B experiments, run counts and fractions are broken out per variant
  • Multi-workflow comparison — pass multiple workflow IDs as positional args, or omit to forecast all agentic workflows side-by-side; per-workflow failures emit a warning and continue rather than aborting the entire forecast
  • Progress spinner — long API sampling phase shows animated feedback via console.NewSpinner
  • JSON output — --json emits the full ForecastResult struct for agent consumption, including the monte_carlo field with ET mean, stddev, P10/P50/P90 percentile fields, is_reliable, and (in eval mode) the full evaluation object per workflow
  • Remote repository support — --repo owner/repo forecasts workflows in any accessible repository; workflows are discovered via the GitHub API and run history is fetched with gh run list --repo
  • Episode analysis — groups runs into logical task episodes using the existing buildEpisodeData engine; surfaces per-episode token usage and episodes-per-period, and prints an episode breakdown table when orchestrator-style workflows are detected (runs/episode > 1)
  • Marked experimental — the command short description includes (experimental) and a warning is printed to stderr at runtime so users know the interface may change
  • W3C-style specification — a formal specification document is published at docs/src/content/docs/reference/forecast-specification.md (sidebar order 1355, adjacent to the MCP Gateway and Effective Tokens specs), covering command interface, workflow discovery, the Monte Carlo algorithm, episode analysis, JSON schema, error handling, and compliance test cases
gh aw forecast                              # all workflows, monthly
gh aw forecast ci-doctor daily-planner      # compare two workflows
gh aw forecast --period week --days 7       # weekly projections over 7-day window
gh aw forecast --json                       # machine-readable output
gh aw forecast --repo owner/repo            # forecast workflows in another repository
gh aw forecast --repo owner/repo ci-doctor  # forecast a specific workflow in another repository
gh aw forecast --eval                       # backtest: compare last period's forecast against actuals

Implementation details

| File | Role |
| --- | --- |
| pkg/cli/forecast_command.go | Cobra command + flag definitions (experimental label, --eval flag, corrected help text) |
| pkg/cli/forecast.go | Sampling, aggregation, projection, episode analysis, eval backtesting, rendering, experimental warning |
| pkg/cli/forecast_montecarlo.go | Monte Carlo simulation engine (Gamma sampler, Poisson sampler, bootstrap, percentile helpers) |
| pkg/cli/forecast_montecarlo_test.go | 19 unit tests for simulation components |
| pkg/cli/forecast_test.go | Unit tests for format helpers, flag validation, and duration enrichment |
| cmd/gh-aw/main.go | Registers forecast in the analysis command group |
| docs/src/content/docs/reference/forecast-specification.md | W3C-style formal specification (Experimental Draft, sidebar order 1355) |

Projection is driven by a Gamma–Poisson compound Monte Carlo simulation (the Negative Binomial model standard in actuarial science and industrial reliability). For each trial, the arrival rate λ is drawn from its Bayesian posterior Gamma(n + 0.5, scale = λ̂/n) — where n is the observed run count and 0.5 is the Jeffreys non-informative prior shape — then the run count is drawn from Poisson(λ_trial). Per-run effective tokens are sampled via bootstrap resampling of historical observations, and each run independently succeeds with the historical success rate (Bernoulli). This compound model naturally produces wider confidence intervals for small samples and converges to the classical Poisson estimate as n grows. The gammaSample function uses the Marsaglia-Tsang squeeze method. Aggregating 10 000 trials yields P10/P50/P90 effective-token estimates.

Backtesting (--eval) date window:

[anchor - days ... anchor]  training window (same --days of history, shifted back one period)
[anchor ... now]            validation window (the period being evaluated)
anchor = now - periodDays

The ForecastEvaluation struct records training_start_date, training_end_date, validation_end_date, actual_runs, actual_effective_tokens, p50_error_abs, p50_error_pct, and in_ci (whether actual ET fell within the P10–P90 interval). Runs with missing timestamps are excluded from validation-window counting to avoid undefined bias.

The --days flag accepts only 7 or 30. The console table columns are: Workflow, Sampled Runs, Success Rate, Yield/Period (throughput: success_rate × runs_per_period), Avg ET, Proj. ET (P50), 80% CI (P10–P90), and Triggers. ET values are formatted as K/M abbreviations. Workflows with fewer than 5 sampled runs are marked * in the table with a footnote warning. The --json output includes the full monte_carlo summary with mean_projected_effective_tokens, std_dev_effective_tokens, all three ET percentile fields, and is_reliable. The yield JSON field represents the throughput rate (success_rate × observed_runs_per_period), distinct from success_rate.
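The K/M abbreviation rule can be sketched like this (the helper name and exact rounding are assumptions; the real formatter lives in pkg/cli):

```go
package main

import "fmt"

// formatET abbreviates effective-token counts for the console table:
// millions as "M", thousands as "K", small values verbatim.
// Illustrative only; rounding to one decimal place is an assumption.
func formatET(tokens float64) string {
	switch {
	case tokens >= 1e6:
		return fmt.Sprintf("%.1fM", tokens/1e6)
	case tokens >= 1e3:
		return fmt.Sprintf("%.1fK", tokens/1e3)
	default:
		return fmt.Sprintf("%.0f", tokens)
	}
}

func main() {
	for _, v := range []float64{950, 12_500, 9_300_000} {
		fmt.Println(formatET(v))
	}
}
```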

When --repo is set, workflow discovery uses fetchGitHubWorkflows (GitHub API) instead of local .lock.yml files. Provided workflow IDs are matched case-insensitively against remote workflow display names and file-path basenames. Frontmatter metadata degrades gracefully to empty for remote repos since Markdown source files are local-only.
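The case-insensitive matching described above might look like the following sketch. The actual helper is matchRemoteWorkflowName; this signature and the display-name/basename precedence are assumptions:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// matchRemoteWorkflow reports whether a user-supplied workflow ID matches
// either the remote workflow's display name or its file-path basename,
// case-insensitively. displayName and path come from the GitHub workflows API.
func matchRemoteWorkflow(id, displayName, path string) bool {
	want := strings.ToLower(id)
	base := strings.ToLower(filepath.Base(path))
	base = strings.TrimSuffix(base, filepath.Ext(base))
	return want == strings.ToLower(displayName) || want == base
}

func main() {
	fmt.Println(matchRemoteWorkflow("ci-doctor", "CI Doctor", ".github/workflows/ci-doctor.yml"))
}
```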

Episode analysis reuses the existing buildEpisodeData + classifyEpisode engine from logs_episode.go. Because no artifact downloads occur during forecasting, only GitHub Actions API fields (event, headSha, headBranch) are used for linkage — the resulting episode count is therefore a lower-bound estimate for orchestrator-style workflows. The ForecastEpisodeSummary struct exposes sampled_episodes, runs_per_episode, avg_effective_tokens_per_episode, and observed_episodes_per_period in JSON output and as a console table.
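One plausible reading of that API-field-only linkage is grouping runs that share event + head SHA + branch into a single episode. This is a heavily simplified sketch, not the real buildEpisodeData engine, and the grouping key is an assumption:

```go
package main

import "fmt"

// apiRun holds the only linkage fields available without artifact
// downloads, per the description above.
type apiRun struct {
	Event      string
	HeadSHA    string
	HeadBranch string
}

// groupEpisodes folds runs sharing event + head SHA + branch into one
// episode and reports the episode count and runs-per-episode ratio.
// Because richer lineage (aw_info.json) is unavailable, this count is a
// lower bound for orchestrator-style workflows.
func groupEpisodes(runs []apiRun) (episodes int, runsPerEpisode float64) {
	seen := map[string]int{}
	for _, r := range runs {
		seen[r.Event+"|"+r.HeadSHA+"|"+r.HeadBranch]++
	}
	if len(seen) == 0 {
		return 0, 0
	}
	return len(seen), float64(len(runs)) / float64(len(seen))
}

func main() {
	runs := []apiRun{
		{"push", "abc123", "main"},
		{"push", "abc123", "main"}, // same episode as above
		{"schedule", "def456", "main"},
	}
	n, rpe := groupEpisodes(runs)
	fmt.Printf("episodes=%d runs/episode=%.1f\n", n, rpe)
}
```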

Copilot AI and others added 2 commits May 10, 2026 21:01
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title "Add forecast command for projecting workflow token usage and costs" to "feat: add gh aw forecast command for projecting workflow token usage and costs" May 10, 2026
Copilot AI requested a review from pelikhan May 10, 2026 21:04
@github-actions
Contributor

Hey @Copilot 👋 — the gh aw forecast command is a genuinely interesting addition, and the PR description is thorough and well-structured. That said, there are a couple of things that need to be addressed before this can move forward:

  • Process violation — per the CONTRIBUTING.md, non-core team members should not open PRs directly. The correct path is to open a detailed issue with an agentic plan, then a core team member picks it up and implements it using agents. The Copilot bot has CONTRIBUTOR (not MEMBER/OWNER) association, which places it outside the inner-circle process.
  • No tests — the three new files (pkg/cli/forecast.go, pkg/cli/forecast_command.go, cmd/gh-aw/main.go) add 665 lines with zero test coverage. Sampling logic, projection arithmetic, and rendering helpers all warrant unit tests.

If you'd like a hand drafting the issue plan or adding tests, assign this prompt to your coding agent:

Close this PR. Instead, open a GitHub issue on github/gh-aw titled "Plan: add `gh aw forecast` command for projecting workflow token usage and costs".

In the issue body, write a detailed agentic plan that describes:
1. The motivation and user story for the forecast command.
2. The proposed file structure (forecast_command.go, forecast.go, main.go registration).
3. The full implementation plan, including sampling logic, projection formula, frontmatter reading, and JSON output.
4. A testing plan covering unit tests for: sampling aggregation, projection calculations, per-variant splitting, and JSON rendering.

Do NOT include code — only the plan. The core team will implement it using agents.

Generated by Contribution Check · ● 9.3M

@pelikhan
Collaborator

@mnkiefer feel free to prompt further

@pelikhan
Collaborator

@copilot add --repo support

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 10, 2026

@copilot add --repo support

Done in feat: wire --repo flag through workflow discovery in forecast command.

The --repo flag was already wired into the GitHub API calls for fetching run history. The missing piece was workflow discovery: resolveForecastWorkflows was still reading local .lock.yml files even when --repo is set.

Changes made in forecast.go:

  • When --repo is provided, resolveForecastWorkflows now calls fetchGitHubWorkflows to list workflows from the remote repo via the API, rather than reading local .lock.yml files.
  • When no workflow IDs are specified, all workflows from the remote repo are forecasted.
  • When workflow IDs are specified, they are matched case-insensitively against remote workflow display names and file-path basenames (matchRemoteWorkflowName).

Frontmatter metadata (triggers, concurrency) degrades gracefully to empty for remote repos since the Markdown source files are local-only.

@pelikhan
Collaborator

@copilot integrate episode analysis

@pelikhan
Collaborator

@copilot use a best-in-class predictive technique based on operational research and statistics: Monte Carlo

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 10, 2026

@copilot integrate episode analysis

Done in feat: integrate episode analysis into forecast command.

The existing buildEpisodeData + classifyEpisode engine from logs_episode.go now runs on the completed runs sampled during forecasting. Since no artifact downloads happen, episode linkage uses the fields available in the GitHub Actions run list (event, headSha, headBranch). Dispatch and workflow_call lineage that require aw_info.json are not available, so the episode count is a lower-bound estimate for orchestrator workflows — this is noted in the code.

New additions:

  • ForecastEpisodeSummary — sampled_episodes, runs_per_episode, avg_effective_tokens_per_episode, observed_episodes_per_period, projected_cost_per_episode
  • EpisodeAnalysis *ForecastEpisodeSummary field on ForecastWorkflowResult (JSON episode_analysis, omitted when nil)
  • workflowRunToRunData — converts WorkflowRun → RunData using API-available fields
  • buildForecastEpisodeSummary — calls the episode engine and aggregates per-episode metrics
  • Console: when any workflow shows runs_per_episode > 1 (multi-run episodes detected), an "Episode analysis" table is printed after the main forecast table

Copilot AI and others added 2 commits May 10, 2026 22:05
…trap + Bernoulli CI)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 10, 2026

@copilot use a best-in-class predictive technique based on operational research and statistics: Monte Carlo

Done in feat: add Monte Carlo simulation to forecast command (Poisson + bootstrap + Bernoulli CI).

The forecast now runs a 10 000-trial Monte Carlo simulation per workflow modelling three independent sources of uncertainty:

| Source | Model |
| --- | --- |
| Run count per period | Poisson(λ = observed runs/period) |
| Per-run token usage | Bootstrap resampling of historical ET observations |
| Per-run success | Bernoulli(historical success rate) |

New output fields (monte_carlo in --json):

  • mean_projected_cost_usd, std_dev_cost_usd
  • p10_projected_cost_usd, p50_projected_cost_usd, p90_projected_cost_usd
  • p10/p50/p90_projected_effective_tokens

Console table now shows P50 as the primary estimate plus a new "80% CI (P10–P90)" column, replacing the single point estimate.

The Poisson sampler uses Knuth's exact algorithm for λ ≤ 15 and a Normal approximation for larger λ to keep 10k-trial simulation time well under 10ms. 14 unit tests cover the sampler, percentile helpers, ordering invariants, and order-of-magnitude accuracy.

@pelikhan
Collaborator

@copilot focus on effective token forecast, remove cost forecasts

… projections

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan
Collaborator

@copilot review all comments

@pelikhan
Collaborator

@copilot as an expert in industrial and market forecasting, review the algorithm used here

Copilot AI and others added 2 commits May 11, 2026 00:14
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 11, 2026

@copilot review all comments

All reviewer comments addressed across two commits:

Code fixes (forecast.go):

  • ET observations now loaded from cached run_summary.json files (written by gh aw logs) — fixes zero-ET forecasts without downloading artifacts
  • Duration computed from UpdatedAt − StartedAt (gh run list never sets the Duration field)
  • Sort order uses Monte Carlo P50 when available, falling back to point estimate
  • formatForecastPercent now takes a hasData bool — 0% is shown when all runs failed, N/A only when no runs were sampled
  • Per-workflow failures emit a warning and continue; remaining workflows are still forecasted

Help text (forecast_command.go):

  • Removed "cost" and inaccurate trigger-frequency/concurrency claims; description now accurately describes what the command does

Spec (forecast-specification.md):

  • --days permitted values updated to {7, 30} (removed 90)
  • Yield redefined as throughput rate (success_rate × observed_runs_per_period)
  • ET retrieval section updated to describe cached run summaries instead of artifact downloads
  • Duration formula corrected to updated_at − started_at
  • Monte Carlo Bernoulli section now references success_rate not yield
  • Console table columns updated to match actual implementation (Triggers column added, K/M formatting documented)

Tests (forecast_test.go):

  • Added //go:build !integration tag, testify assertions, and 12 tests covering format helpers, validation, and Duration enrichment

Copilot AI and others added 2 commits May 11, 2026 00:27
…able flag

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…e, and footer text

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 11, 2026

@copilot as an expert in industrial and market forecasting, review the algorithm used here

Done in feat: upgrade Monte Carlo to Gamma–Poisson compound model with IsReliable flag.

Algorithm review findings and changes:

The original model treated the Poisson arrival rate λ as a known constant equal to the observed rate. This is the primary weakness in industrial/operational forecasting: with a short history window (e.g., 7 days, 5 runs), the rate estimate itself carries significant uncertainty, causing the CI to be artificially narrow.

Key improvement — Gamma–Poisson compound model (Negative Binomial):
Instead of numRuns ~ Poisson(λ_observed) per trial, the arrival rate is now drawn from its Bayesian posterior each trial:

λ_trial ~ Gamma(n + 0.5, scale = observedRunsPerPeriod / n)
numRuns ~ Poisson(λ_trial)

where n is the observed run count and 0.5 is the Jeffreys non-informative prior shape. This Gamma–Poisson compound is equivalent to the Negative Binomial — the standard model for overdispersed count data in actuarial science and industrial reliability. It naturally widens the CI for sparse samples and converges to the classical Poisson for large n, with no function signature changes required.

A new TestRunMonteCarloGammaPoissonWiderCI test directly verifies that 3-run histories produce wider CIs than 100-run histories at the same observed rate.

Additional improvements:

  • IsReliable flag on ForecastMonteCarloSummary (set when n ≥ 5); low-data workflows are marked * in the console table with an explanatory footnote
  • gammaSample (Marsaglia-Tsang squeeze method) as the new sampling primitive
  • Footer updated to explain what the model does in plain language

@pelikhan
Collaborator

@copilot merge main and recompile

…cli-command

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 11, 2026

@copilot merge main and recompile

Done. Merged origin/main (b0dbcb7 — rate-limit rename + firewall bump) into the branch. go build ./... and all forecast tests pass cleanly after the merge.

@pelikhan
Collaborator

@copilot add a flag --eval that does forecasting on past events (start forecast from a past date to be able to compare to measured data) and evaluates the quality of the forecast.

Copilot AI and others added 2 commits May 11, 2026 01:28
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>