Copilot Agent Prompt Clustering Analysis - 2025-12-27 #7872

2025-12-27T06:41:27Z

github-actions[bot]
bot Dec 27, 2025

🔬 Copilot Agent Prompt Clustering Analysis

Daily NLP-based clustering analysis of copilot agent task prompts using machine learning to identify patterns, success factors, and optimization opportunities.

Executive Summary

Analyzed 971 copilot agent tasks from the last 30 days using TF-IDF vectorization and K-means clustering to identify common task patterns and success factors.

Key Metrics:

Overall Success Rate: 77.3% (751 merged, 220 closed)
Clusters Identified: 3 distinct task categories
Analysis Coverage: 97.1% of all copilot PRs (971/1000)

Quick Insights

📊 Most Common Task Type: Feature Implementation & Enhancement

497 tasks (51% of total)
76% success rate
Average complexity: 18 files, 568 lines changed

🎯 Highest Success Rate: Workflow-related Tasks

331 tasks (34% of total)
79% success rate ⭐
Lower complexity: 10.5 files, 397 lines changed

⚙️ Most Complex Tasks: Agentic-related Tasks

143 tasks (15% of total)
78% success rate
1,632 avg lines changed (3x higher than other clusters)

Full Analysis Report

Cluster Breakdown

Cluster 1: Feature Implementation & Enhancement

51.2% of all tasks | 76.1% success rate

This is the dominant cluster, representing general feature additions, command updates, and enhancements to the gh-aw CLI tool and agent capabilities.

Characteristics:

Complexity: Medium-high (18 files, 568 lines on average)
Engagement: 3.9 commits, 1.7 review comments per PR
Top Keywords: update, add, command, agent, version, github, cli, copilot

Complexity Metrics:

Avg Files Changed: 18.0
Avg Lines Added: 568
Avg Commits: 3.9
Avg Review Comments: 1.7

Example Successful Tasks:

#2097 ✅ Add minimal path format syntax reference to imports documentation
#2099 ✅ Add directory creation for copilot engine --add-dir paths
#2102 ✅ Add workflow status badges documentation page
#2104 ✅ Add edit tool to commit-changes-analyzer workflow
#2108 ✅ Remove js-yaml dependency from badge generator
#2119 ✅ Consolidate permission parsing logic

Sample Task Prompt:

"Update the frontmatter imports documentation under docs with all the supported URL and path formats"

Cluster 2: Workflow-related Tasks

34.1% of all tasks | 78.9% success rate 🏆

This cluster focuses on internal gh-aw tool development, package refactoring, and workflow compilation improvements.

Characteristics:

Complexity: Low-medium (10.5 files, 397 lines on average)
Engagement: 3.3 commits, 1.0 review comments per PR
Top Keywords: workflow, pkg, gh, aw, gh-aw, githubnext/gh-aw

Complexity Metrics:

Avg Files Changed: 10.5
Avg Lines Added: 397
Avg Commits: 3.3
Avg Review Comments: 1.0 (lowest among clusters)

Success Factors:
This cluster has the highest success rate (79%) with the lowest review overhead (1.0 comments per PR), suggesting these tasks are:

Well-scoped and focused
Less ambiguous in requirements
Well-suited to the agent's capabilities

Example Successful Tasks:

#2107 ✅ Update variable name to 'log' and modify instructions
#2171 ✅ Refactor duplicate MCP code patterns
#2193 ✅ Fix heredoc delimiter collision in workflow compilation
#2228 ✅ Fix audit command to cache downloads
#2249 ✅ Refactor: Extract duplicate GitHub MCP remote config rendering

Sample Task Prompt:

"Rename logger variables to log in all 5 Go files - pkg/cli/trial_command.go, pkg/workflow/claude_engine.go, pkg/workflow/expressions.go, pkg/workflow/js.go, pkg/workflow/mcp-config.go..."

Cluster 3: Agentic-related Tasks

14.7% of all tasks | 78.3% success rate

This cluster involves creating and managing agentic workflows, automation tasks, and high-complexity workflow implementations.

Characteristics:

Complexity: High impact (8 files, but 1,632 lines on average - 3x higher than other clusters)
Engagement: 3.6 commits, 1.9 review comments per PR (highest review engagement)
Top Keywords: agentic, agentic workflow, workflow, workflows, agent, create, update, daily

Complexity Metrics:

Avg Files Changed: 8.0 (fewest files)
Avg Lines Added: 1,632 (highest by far)
Avg Commits: 3.6
Avg Review Comments: 1.9 (highest engagement)

Why So Many Lines?
Agentic workflow files contain extensive instructions, prompts, and documentation, resulting in large file additions even though they touch fewer files.

Example Successful Tasks:

#2100 ✅ Spread scheduled agentic workflows across 24 hours
#2103 ✅ Add smoke-outpost workflow (4,250 lines)
#2109 ✅ Add semantic function refactoring workflow (4,351 lines)
#2115 ✅ Add scheduling best practices guidance
#2134 ✅ Add reporting instructions for HTML details/summary
#2147 ✅ Add dictation prompt generator workflow (4,629 lines)

Sample Task Prompt:

"Review the scheduled agentic workflows and spread them the entire day. Schedule the smoke workflows every 6h"

Success Rate Comparison

Cluster	Theme	Tasks	Success Rate	Avg Complexity
2	Workflow-related	331	78.9% 🥇	10.5 files, 397 lines
3	Agentic-related	143	78.3% 🥈	8.0 files, 1632 lines
1	Feature Implementation	497	76.1% 🥉	18.0 files, 568 lines

All clusters show strong performance (>75% success rate), indicating the copilot agent is effective across diverse task types.

Statistical Insights

Complexity vs Success Correlation

Correlation with Merge Success:

Files Changed: -0.04 (nearly neutral)
Lines Added: -0.11 (weak negative)

Interpretation: While larger changes show a slight tendency toward lower success rates, the effect is minimal. This suggests the agent handles both simple and complex tasks reasonably well.

Review Engagement Patterns

Average Review Comments per PR by Cluster:

Agentic-related: 1.9 comments (highest scrutiny)
Feature Implementation: 1.7 comments
Workflow-related: 1.0 comments (lowest)

The workflow-related cluster achieves the highest success rate with the lowest review overhead, suggesting tasks are well-specified and implementations are straightforward.

Key Findings

1. High Success Across All Task Types ✅

All 3 clusters achieve ≥75% success rates with an average of 78%. This indicates:

The copilot agent is versatile and handles diverse task types well
Current prompt engineering and agent capabilities are effective
Task scoping is generally appropriate

2. Workflow-Related Tasks Perform Best 🎯

The workflow-related cluster shows:

Highest success rate (78.9%)
Lowest complexity (10.5 files, 397 lines)
Least review overhead (1.0 comments per PR)

Why? These tasks are typically:

Well-scoped and focused
Less ambiguous in requirements
Often involve refactoring or targeted fixes
Well-aligned with agent capabilities

3. Complexity Has Minimal Impact on Success 📊

Correlation analysis shows:

Files changed: -0.04 (neutral)
Lines added: -0.11 (weak negative)

Despite the Agentic-related cluster having 3x more lines changed on average (1,632 vs ~500), it maintains a strong 78.3% success rate. This suggests:

The agent handles complexity reasonably well
Large line counts (from workflow files with instructions) don't significantly impede success
Breaking tasks into smaller PRs may help slightly but isn't critical

4. Task Distribution Shows Clear Use Patterns 📈

Primary Use Cases:

Feature Implementation (51%) - Most common use case
Workflow Development (34%) - Internal tooling focus
Agentic Workflows (15%) - Specialized automation tasks

This distribution reflects the gh-aw project's focus: building tooling (51%), maintaining the tool itself (34%), and creating automation workflows (15%).

Recommendations

1. Leverage High-Success Patterns 🎯

Action: Create prompt templates and best practices for workflow-related tasks.

Why: The workflow-related cluster shows the highest success rate (79%) with minimal review overhead. These characteristics can guide prompt engineering for other task types.

Suggested Templates:

Refactoring tasks: "Refactor [pattern] in [files] to [goal]"
Bug fixes: "Fix [specific issue] in [file/function] by [approach]"
Code cleanup: "Extract/consolidate [duplicate code] from [locations]"

2. Continue Current Task Scoping Approach ✅

Action: No immediate changes needed to task scoping.

Why: The weak correlation between complexity and success (-0.11) indicates current task sizing is appropriate. Even high-complexity tasks (1,632 lines) achieve 78% success.

Maintain:

Current balance of simple and complex tasks
Flexibility in task sizing based on logical boundaries
Don't artificially split tasks that are naturally cohesive

3. Optimize Feature Implementation Tasks 📊

Action: Investigate why feature implementation tasks (51% of all tasks) have slightly lower success rates (76%).

Approach:

Review closed PRs in this cluster to identify common failure patterns
Consider whether feature tasks have more ambiguous requirements
Potentially add more context or examples to prompts
May need more iterative feedback during implementation

Hypothesis: Feature tasks may require more domain knowledge or have less clear specifications compared to refactoring/workflow tasks.

4. Document Success Patterns 📚

Action: Create a "Copilot Task Success Playbook" based on cluster analysis.

Content:

For workflow/refactoring tasks: Emphasize precision and scope clarity (highest success)
For feature tasks: Provide more context, examples, and acceptance criteria
For agentic tasks: Expect larger changes; focus on structure and documentation quality

Methodology

Data Collection

Source: 1,000 PRs created by copilot-swe-agent in last 30 days
Processed: 971 PRs with valid task descriptions (97.1% coverage)
Excluded: 29 PRs without meaningful prompts (<30 characters)

NLP Clustering

Vectorization: TF-IDF with 200 features, 1-3 grams
Algorithm: K-means clustering
Optimal Clusters: 3 (determined via elbow method)
Features: Removed stop words, min document frequency = 2

Metrics Tracked

Success outcome (merged vs closed)
Complexity (files changed, lines added)
Engagement (commits, reviews, comments)
Task characteristics (keywords, themes)

Data Table: Representative PRs by Cluster

Cluster 1: Feature Implementation & Enhancement (Top 10)

PR #	Title	Status	Files	Lines	Keywords
#2097	Add minimal path format syntax reference to imports document...	✅	1	+20	update, add, command
#2099	Add directory creation for copilot engine --add-dir paths	✅	25	+335	update, add, command
#2102	Add workflow status badges documentation page	✅	6	+195	update, add, command
#2104	Add edit tool to commit-changes-analyzer workflow	✅	2	+21	update, add, command
#2108	Remove js-yaml dependency from badge generator, use regex-ba...	✅	2	+16	update, add, command
#2112	Restore AI instructions in README.md wrapped in XML comments	✅	1	+7	update, add, command
#2113	Move workflow status documentation page to top level	✅	3	+11	update, add, command
#2119	Consolidate permission parsing logic and add support for "al...	✅	5	+231	update, add, command
#2101	[WIP] Migrate JavaScript memory server to Wasm component	❌	0	+0	update, add, command
#2111	Refactor permissions parsing from compiler.go to permissions...	❌	34	+275	update, add, command

Cluster 2: Workflow-related Tasks (Top 10)

PR #	Title	Status	Files	Lines	Keywords
#2107	[WIP] Update variable name to 'log' and modify instructions	✅	7	+155	workflow, pkg, gh
#2171	Refactor duplicate MCP code patterns for improved maintainab...	✅	10	+570	workflow, pkg, gh
#2193	Fix heredoc delimiter collision causing workflow compilation...	✅	59	+625	workflow, pkg, gh
#2209	[WIP] Comment on issue #2157 regarding recurrence failure	✅	3	+62	workflow, pkg, gh
#2228	Fix audit command to cache downloads and prevent duplicate a...	✅	4	+133	workflow, pkg, gh
#2242	Remove .serena files and refactor isValidReaction to reactio...	✅	56	+76	workflow, pkg, gh
#2249	Refactor: Extract duplicate GitHub MCP remote config renderi...	✅	5	+342	workflow, pkg, gh
#2210	[WIP] Fix critical recurring failure in GenAIScript model	❌	0	+0	workflow, pkg, gh
#2236	Confirm gpt-4.1 invalid model issue already resolved - no ch...	❌	51	+51	workflow, pkg, gh
#2282	Extract 22 YAML generation functions from compiler.go to yam...	❌	4	+915	workflow, pkg, gh

Cluster 3: Agentic-related Tasks (Top 10)

PR #	Title	Status	Files	Lines	Keywords
#2100	Spread scheduled agentic workflows across 24 hours and add 6...	✅	30	+43	agentic, workflow
#2103	Add smoke-outpost workflow for investigating failed smoke te...	✅	2	+4250	agentic, workflow
#2109	Add semantic function refactoring workflow for Go code analy...	✅	3	+4351	agentic, workflow
#2115	Add scheduling best practices guidance for daily workflows	✅	1	+5	agentic, workflow
#2118	Merge documentation unbloat style guidance and add import to...	✅	3	+319	agentic, workflow
#2129	Rename smoke-outpost workflow to smoke-detector	✅	2	+12	agentic, workflow
#2134	Add reporting instructions for HTML details/summary formatti...	✅	17	+170	agentic, workflow
#2135	Update smoke genaiscript workflow to use openai:gpt-4.1 mode...	✅	2	+8	agentic, workflow
#2147	Add dictation prompt generator agentic workflow	✅	2	+4629	agentic, workflow
#2110	[WIP] Refactor clusters of functions in Go files	❌	0	+0	agentic, workflow

Visualizations

The analysis generated two key visualizations:

Cluster Overview (cluster-overview.png):
- Bar chart showing task counts per cluster
- Success rate comparison with color coding (green >75%, orange >50%, red <50%)
Complexity Metrics (cluster-complexity.png):
- 4-panel comparison showing:
  - Average files changed per cluster
  - Average lines added per cluster
  - Average commits per cluster
  - Average review comments per cluster

Charts available in workflow artifacts for detailed review.

Next Steps

Create Prompt Templates - Document best practices from workflow-related cluster
Investigate Feature Task Failures - Analyze closed PRs to identify improvement opportunities
Monitor Trends - Run this analysis weekly to track improvements over time
Experiment with Prompt Engineering - Test whether more structured prompts improve feature task success rates

Analysis methodology: TF-IDF vectorization + K-means clustering on 971 copilot agent task prompts from the last 30 days. Full report and visualizations available in workflow artifacts.

AI generated by Copilot Agent Prompt Clustering Analysis

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Copilot Agent Prompt Clustering Analysis - 2025-12-27 #7872

Uh oh!

{{title}}

Uh oh!

Cluster Breakdown

Cluster 1: Feature Implementation & Enhancement

Cluster 2: Workflow-related Tasks

Cluster 3: Agentic-related Tasks

Success Rate Comparison

Statistical Insights

Complexity vs Success Correlation

Review Engagement Patterns

Key Findings

1. High Success Across All Task Types ✅

2. Workflow-Related Tasks Perform Best 🎯

3. Complexity Has Minimal Impact on Success 📊

4. Task Distribution Shows Clear Use Patterns 📈

Recommendations

1. Leverage High-Success Patterns 🎯

2. Continue Current Task Scoping Approach ✅

3. Optimize Feature Implementation Tasks 📊

4. Document Success Patterns 📚

Methodology

Data Collection

NLP Clustering

Metrics Tracked

Data Table: Representative PRs by Cluster

Cluster 1: Feature Implementation & Enhancement (Top 10)

Cluster 2: Workflow-related Tasks (Top 10)

Cluster 3: Agentic-related Tasks (Top 10)

Replies: 0 comments

Select a reply

Uh oh!

Copilot Agent Prompt Clustering Analysis - 2025-12-27 #7872

Uh oh!

github-actions[bot] bot Dec 27, 2025

🔬 Copilot Agent Prompt Clustering Analysis

Executive Summary

Quick Insights

Cluster Breakdown

Cluster 1: Feature Implementation & Enhancement

Cluster 2: Workflow-related Tasks

Cluster 3: Agentic-related Tasks

Success Rate Comparison

Statistical Insights

Complexity vs Success Correlation

Review Engagement Patterns

Key Findings

1. High Success Across All Task Types ✅

2. Workflow-Related Tasks Perform Best 🎯

3. Complexity Has Minimal Impact on Success 📊

4. Task Distribution Shows Clear Use Patterns 📈

Recommendations

1. Leverage High-Success Patterns 🎯

2. Continue Current Task Scoping Approach ✅

3. Optimize Feature Implementation Tasks 📊

4. Document Success Patterns 📚

Methodology

Data Collection

NLP Clustering

Metrics Tracked

Data Table: Representative PRs by Cluster

Cluster 1: Feature Implementation & Enhancement (Top 10)

Cluster 2: Workflow-related Tasks (Top 10)

Cluster 3: Agentic-related Tasks (Top 10)

Visualizations

Next Steps

Replies: 0 comments

github-actions[bot]
bot Dec 27, 2025