[prompt-clustering] Copilot Agent Prompt Clustering — 2026-06-05 #37098

2026-06-05T11:18:31Z

github-actions[bot]
Bot Jun 5, 2026

Summary

Analysis Period: Last 30 days (2026-05-17 → 2026-06-05)
Total Tasks Analyzed: 999 copilot-authored PRs
Clusters Identified: 8 (KMeans, silhouette 0.0419)
Overall Success Rate: 75.4% merged
Avg Iterations: 4.38 commits/PR

Eight stable themes emerged. The two largest, shared-helper refactors and safe-output/schema work, account for 59% of all tasks (586 of 999). Merge rates are high overall, but two clusters sit well below the 75.4% mean: Codex/AWF generated config & defaults (66%) and Copilot SDK driver/harness work (68%).

Key Findings

Refactor + safe-output work dominates. Clusters C6 and C2 together are 586 of 999 PRs and merge near the mean (~75–76%) — the reliable bread-and-butter of the agent fleet.
Generated/Codex config (C7) is the weak spot. It has the lowest merge rate (66%) and the largest diffs (avg 78 files changed, roughly +1054/-780 lines per PR): large auto-generated diffs are the least likely to land.
Sous-chef tasks (C4) succeed but grind. Highest merge rate (85%) yet by far the most iterations (9.09 commits/PR) — they get there, but slowly.
Small, well-scoped fixes win. "Fix failing Actions job" (C1) merges at 81% in only 3.11 commits per PR, the fewest of any cluster.
Trend: success dipped this period. The overall merge rate is 75.4%, down 4.9 points from 80.3% on 2026-06-02 and below the 78.8% recorded earlier. It is worth watching whether the SDK/Codex clusters are pulling the average down.

Success Rate by Cluster

Cluster	Theme	Tasks	Share of tasks	Merge rate	Avg commits/PR	Avg files/PR
C6	Shared helpers & error/path refactors	344	34%	75%	4.02	32
C2	Safe-outputs & schema validation	242	24%	76%	4.31	13
C3	Prompts, skills & experiments	131	13%	80%	3.66	11
C7	Codex/AWF generated config & defaults	126	13%	66%	4.19	78
C4	Sous-chef multi-agent (GPT-mini) tasks	54	5%	85%	9.09	35
C5	Model alias/multiplier plumbing	41	4%	76%	4.02	49
C0	Copilot SDK driver/harness mode	34	3%	68%	5.94	33
C1	Fixing failing GitHub Actions jobs	27	3%	81%	3.11	23
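As a quick consistency check on the table above, the task-weighted mean of the per-cluster merge rates should land close to the reported overall rate. The sketch below copies the rounded figures from the table; the 5-point `MARGIN` used to flag lagging clusters is an illustrative assumption, not a threshold used by the report.

```python
# (cluster, tasks, merge rate in %) copied from the table above.
clusters = [
    ("C6", 344, 75), ("C2", 242, 76), ("C3", 131, 80), ("C7", 126, 66),
    ("C4", 54, 85), ("C5", 41, 76), ("C0", 34, 68), ("C1", 27, 81),
]

total_tasks = sum(tasks for _, tasks, _ in clusters)

# Weight each cluster's merge rate by its task count.
weighted_rate = sum(tasks * rate for _, tasks, rate in clusters) / total_tasks

# Hypothetical margin: flag clusters more than 5 points below the weighted mean.
MARGIN = 5.0
laggards = [name for name, _, rate in clusters if weighted_rate - rate > MARGIN]

print(total_tasks)               # 999
print(round(weighted_rate, 1))   # 75.3
print(laggards)                  # ['C7', 'C0']
```

The weighted mean of the rounded rates (75.3%) sits within rounding error of the reported 75.4%, and only C7 and C0 fall outside the margin.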

Detailed cluster breakdown

C1: Fixing failing GitHub Actions jobs

Size: 27 tasks (2.7%)
Outcome: 22 merged / 5 closed / 0 open → 81% success
Effort (avg per PR): 3.11 commits, 0.89 reviews, 0.19 comments, 23 files (+49/-41)
Top terms: actions, failing github, actions job, job, github actions, failing
Example PRs: [WIP] Fix failing GitHub Actions job js-typecheck #32839, [WIP] Fix failing GitHub Actions job agent #34119, [WIP] Fix failing GitHub Actions job lint-go #33548, [WIP] Fix failing GitHub Actions job 'agent' #34639

Representative data table (2 highest-iteration PRs per cluster)

PR #	Title	Cluster	Outcome	Commits	Files
#33033	Add `github-app.missing-key` ignore mode and guard App token	C6	Merged	15	16
#33129	Fix compound `||` expressions in prompt markdown never sub	C6	Merged	14	6
#33350	feat(safe-outputs): add required-labels/required-title-prefi	C2	Merged	28	70
#33852	Add `create-check-run` safe output type for multi-agent PR a	C2	Merged	17	29
#34874	Add inline skill extraction/runtime support mirroring inline	C3	Merged	18	29
#35773	Update `gh aw init` to create the Agentic Workflows custom a	C3	Merged	17	10
#35286	Centralize compiler enterprise env controls, expand GH_AW_DE	C7	Merged	19	244
#35802	[awf] Fix tool-cache mount handling, smoke-pi runtime config	C7	Merged	17	254
#33273	Add `on.pull_request_reviewer: slash_command` synthetic trig	C4	Merged	39	257
#36676	Compat-based Copilot CLI install: single remote fetch, jq-on	C4	Open	32	7
#34837	Move model alias/multiplier propagation from step env to act	C5	Merged	18	258
#36634	Tighten ET computation details layout and compact model alia	C5	Merged	13	8
#36358	Fix copilot-sdk harness stdin wiring, SDK installation/resol	C0	Merged	19	247
#36538	Refine Copilot SDK-mode tool permission scoping from engine	C0	Merged	19	68
#32840	[WIP] Fix failing GitHub Actions job JS Tests (shard 4/4)	C1	Merged	7	2
#33165	[WIP] Fix failing GitHub Actions job JS Tests (shard 4/4)	C1	Merged	6	3

Recommendations

Tighten Codex/AWF-generated config tasks (C7). The lowest merge rate pairs with the biggest diffs. Split large generated changes into reviewable chunks, or add a pre-merge diff-size guard so reviewers aren't handed 78-file PRs.
Investigate Copilot SDK-driver failures (C0). A 68% merge rate at 5.9 commits per PR suggests the harness/SDK mode is still flaky, which makes it a good candidate for a focused reliability pass.
Cap iteration churn on sous-chef tasks (C4). They land but average 9 commits; clearer up-front task specs or a turn budget could cut the back-and-forth.
Keep leaning on tightly-scoped fix prompts (C1). Cheapest and among the most reliable — the pattern to replicate when phrasing new tasks.
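The diff-size guard suggested for C7 could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the `DiffStats` type, the `diff_guard` function, and both thresholds are hypothetical, and the code does not call any real GitHub or gh-aw API. The caller would supply the counts from whatever source is available.

```python
from dataclasses import dataclass


@dataclass
class DiffStats:
    """Size of a PR diff (hypothetical type for this sketch)."""
    files_changed: int
    additions: int
    deletions: int


# Hypothetical thresholds; tune them against the cluster averages in the report.
MAX_FILES = 40
MAX_CHANGED_LINES = 1000


def diff_guard(stats, max_files=MAX_FILES, max_lines=MAX_CHANGED_LINES):
    """Return the reasons a diff is too large; an empty list means it passes."""
    reasons = []
    if stats.files_changed > max_files:
        reasons.append(f"{stats.files_changed} files changed (limit {max_files})")
    changed = stats.additions + stats.deletions
    if changed > max_lines:
        reasons.append(f"{changed} lines changed (limit {max_lines})")
    return reasons


# Cluster averages taken from the report.
c7_average = DiffStats(files_changed=78, additions=1054, deletions=780)
c1_average = DiffStats(files_changed=23, additions=49, deletions=41)

print(diff_guard(c7_average))  # both limits exceeded
print(diff_guard(c1_average))  # []
```

Returning the reasons rather than a bare boolean lets the guard post an actionable message asking for the change to be split.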

Methodology & limitations

TF-IDF (1–2 grams, domain stop-words removed) over title+body of 999 cleaned PR descriptions; KMeans with k chosen by silhouette across k=4–8.
Firewall/warning boilerplate, code blocks, and URLs stripped before vectorizing.
Iterations use commit count as a proxy: the gh-aw workflow-run logs (true turn counts/cost) were not fetched this run, so turn/cost metrics are approximated by commits.
Silhouette is low (0.04), which is expected for short, overlapping technical text; clusters are directional, not hard partitions.
History persisted to cache for trend tracking across runs.
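The pipeline described above can be sketched with scikit-learn as follows. This is an assumption-laden illustration, not the workflow's actual code: the report does not say which library it uses, the `clean` and `cluster_prompts` helpers are hypothetical, the firewall/warning boilerplate filter is omitted because its pattern is not given, and the toy titles and stop-word list are made up for the example.

```python
import re

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def clean(text):
    """Strip fenced code blocks and URLs before vectorizing."""
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)
    return re.sub(r"https?://\S+", " ", text)


def cluster_prompts(docs, k_values, stop_words=None):
    """Return (k, silhouette, labels) for the k with the highest silhouette."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words=stop_words)
    X = vectorizer.fit_transform([clean(d) for d in docs])
    best = None
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        if len(set(labels)) < 2:
            continue  # silhouette is undefined for a single cluster
        score = silhouette_score(X, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best


# Toy titles loosely modelled on the report's themes (not real data).
docs = [
    "[WIP] Fix failing GitHub Actions job lint-go",
    "[WIP] Fix failing GitHub Actions job js-typecheck",
    "Fix failing GitHub Actions job agent",
    "Add safe output type with schema validation",
    "Validate safe outputs schema for labels",
    "Refactor shared helpers for error handling",
    "Refactor path helpers into shared package",
    "Move shared error helpers see https://example.com/x",
]

# The report searched k=4-8 over 999 PRs; the toy corpus only supports small k.
best_k, score, labels = cluster_prompts(docs, range(2, 5), stop_words=["wip"])
print(best_k, round(score, 3), list(labels))
```

Fixing `random_state` keeps the cluster assignment reproducible between runs, which matters when results are compared across periods.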

References: §27010769683

Generated by 📊 Copilot Agent Prompt Clustering Analysis · 151 AIC

expires on Jun 6, 2026, 11:18 AM UTC
