🔬 Copilot Agent Prompt Clustering Analysis
Daily NLP-based clustering analysis of copilot agent task prompts using machine learning to identify patterns, success factors, and optimization opportunities.
Executive Summary
Analyzed 971 copilot agent tasks from the last 30 days using TF-IDF vectorization and K-means clustering to identify common task patterns and success factors.
"Rename logger variables to log in all 5 Go files - pkg/cli/trial_command.go, pkg/workflow/claude_engine.go, pkg/workflow/expressions.go, pkg/workflow/js.go, pkg/workflow/mcp-config.go..."
Cluster 3: Agentic-related Tasks
14.7% of all tasks | 78.3% success rate
This cluster involves creating and managing agentic workflows, automation tasks, and high-complexity workflow implementations.
Characteristics:
Complexity: High (8.0 files but 1,632 lines added on average, about 3x more lines than the other clusters)
Why So Many Lines?
Agentic workflow files contain extensive instructions, prompts, and documentation, resulting in large file additions even though they touch fewer files.
Example Successful Tasks:
#2100 ✅ Spread scheduled agentic workflows across 24 hours
"Review the scheduled agentic workflows and spread them the entire day. Schedule the smoke workflows every 6h"
Success Rate Comparison
| Cluster | Theme | Tasks | Success Rate | Avg Complexity |
|---------|-------|-------|--------------|----------------|
| 2 | Workflow-related | 331 | 78.9% 🥇 | 10.5 files, 397 lines |
| 3 | Agentic-related | 143 | 78.3% 🥈 | 8.0 files, 1,632 lines |
| 1 | Feature Implementation | 497 | 76.1% 🥉 | 18.0 files, 568 lines |
All clusters show strong performance (>75% success rate), indicating the copilot agent is effective across diverse task types.
Statistical Insights
Complexity vs Success Correlation
Correlation with Merge Success:
Files Changed: -0.04 (nearly neutral)
Lines Added: -0.11 (weak negative)
Interpretation: While larger changes show a slight tendency toward lower success rates, the effect is minimal. This suggests the agent handles both simple and complex tasks reasonably well.
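As a reproducibility aid, here is a minimal sketch of how these correlations can be computed with pandas; the column names (files_changed, lines_added, merged) and the sample values are illustrative, not the analysis's actual schema or data:

```python
import pandas as pd

# Illustrative PR-level records; the real input is the 971 analyzed PRs.
prs = pd.DataFrame({
    "files_changed": [3, 18, 10, 8, 25, 12],
    "lines_added": [120, 568, 397, 1632, 2100, 250],
    "merged": [1, 1, 1, 1, 0, 1],  # 1 = merged (success), 0 = closed unmerged
})

# Pearson correlation against a 0/1 outcome is the point-biserial
# correlation; the report cites -0.04 and -0.11 on the full dataset.
print(prs["files_changed"].corr(prs["merged"]))
print(prs["lines_added"].corr(prs["merged"]))
```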
Review Engagement Patterns
Average Review Comments per PR by Cluster:
Agentic-related: 1.9 comments (highest scrutiny)
Feature Implementation: 1.7 comments
Workflow-related: 1.0 comments (lowest)
The workflow-related cluster achieves the highest success rate with the lowest review overhead, suggesting tasks are well-specified and implementations are straightforward.
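Both the per-cluster success rates and these review-comment averages fall out of a single groupby over the labeled PRs. A sketch, again with assumed column names and toy data:

```python
import pandas as pd

# Assumed columns: K-means cluster label, merge outcome, review comment count.
prs = pd.DataFrame({
    "cluster": [1, 1, 2, 2, 2, 3],
    "merged": [1, 0, 1, 1, 0, 1],
    "review_comments": [2, 3, 1, 0, 2, 2],
})

summary = prs.groupby("cluster").agg(
    tasks=("merged", "size"),
    success_rate=("merged", "mean"),
    avg_review_comments=("review_comments", "mean"),
)
print(summary)
```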
Key Findings
1. High Success Across All Task Types ✅
All 3 clusters achieve ≥75% success rates with an average of 78%. This indicates:
The copilot agent is versatile and handles diverse task types well
Current prompt engineering and agent capabilities are effective
Task scoping is generally appropriate
2. Workflow-Related Tasks Perform Best 🎯
The workflow-related cluster shows:
Highest success rate (78.9%)
Lowest complexity (10.5 files, 397 lines)
Least review overhead (1.0 comments per PR)
Why? These tasks are typically:
Well-scoped and focused
Less ambiguous in requirements
Often involve refactoring or targeted fixes
Well-aligned with agent capabilities
3. Complexity Has Minimal Impact on Success 📊
Correlation analysis shows:
Files changed: -0.04 (neutral)
Lines added: -0.11 (weak negative)
Despite the Agentic-related cluster having 3x more lines changed on average (1,632 vs ~500), it maintains a strong 78.3% success rate. This suggests:
The agent handles complexity reasonably well
Large line counts (from workflow files with instructions) don't significantly impede success
Breaking tasks into smaller PRs may help slightly but isn't critical
4. Task Distribution Shows Clear Use Patterns 📈
Primary Use Cases:
Feature Implementation (51%) - Most common use case
Workflow Development (34%) - Internal tooling focus
Agentic Workflow Creation (15%) - Automation focus
This distribution reflects the gh-aw project's focus: building tooling (51%), maintaining the tool itself (34%), and creating automation workflows (15%).
Recommendations
1. Leverage High-Success Patterns 🎯
Action: Create prompt templates and best practices for workflow-related tasks.
Why: The workflow-related cluster shows the highest success rate (79%) with minimal review overhead. These characteristics can guide prompt engineering for other task types.
Suggested Templates:
Refactoring tasks: "Refactor [pattern] in [files] to [goal]"
Bug fixes: "Fix [specific issue] in [file/function] by [approach]"
Code cleanup: "Extract/consolidate [duplicate code] from [locations]"
2. Continue Current Task Scoping Approach ✅
Action: No immediate changes needed to task scoping.
Why: The weak correlation between complexity and success (-0.11) indicates current task sizing is appropriate. Even high-complexity tasks (1,632 lines) achieve 78% success.
Maintain:
Current balance of simple and complex tasks
Flexibility in task sizing based on logical boundaries
Don't artificially split tasks that are naturally cohesive
3. Optimize Feature Implementation Tasks 📊
Action: Investigate why feature implementation tasks (51% of all tasks) have slightly lower success rates (76%).
Approach:
Review closed PRs in this cluster to identify common failure patterns
Consider whether feature tasks have more ambiguous requirements
Potentially add more context or examples to prompts
May need more iterative feedback during implementation
Hypothesis: Feature tasks may require more domain knowledge or have less clear specifications compared to refactoring/workflow tasks.
4. Document Success Patterns 📚
Action: Create a "Copilot Task Success Playbook" based on cluster analysis.
Content:
For workflow/refactoring tasks: Emphasize precision and scope clarity (highest success)
For feature tasks: Provide more context, examples, and acceptance criteria
For agentic tasks: Expect larger changes; focus on structure and documentation quality
Methodology
Data Collection
Source: 1,000 PRs created by copilot-swe-agent in the last 30 days
Processed: 971 PRs with valid task descriptions (97.1% coverage)
Excluded: 29 PRs without meaningful prompts (<30 characters)
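A sketch of what this collection step could look like with the GitHub CLI; the actual workflow's query isn't included in this report, so the field list and author filter are assumptions:

```python
import json
import subprocess

# Fetch recent agent-authored PRs as JSON via the GitHub CLI.
raw = subprocess.run(
    ["gh", "pr", "list",
     "--author", "copilot-swe-agent",
     "--state", "all",
     "--limit", "1000",
     "--json", "number,title,body,state,mergedAt,additions,changedFiles"],
    capture_output=True, text=True, check=True,
).stdout
prs = json.loads(raw)

# Keep only PRs with a meaningful prompt (>=30 characters), per the report.
tasks = [pr for pr in prs if len((pr.get("body") or "").strip()) >= 30]
print(f"Kept {len(tasks)} of {len(prs)} PRs")
```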
NLP Clustering
Vectorization: TF-IDF with 200 features, 1-3 grams
Algorithm: K-means clustering
Optimal Clusters: 3 (determined via elbow method)
Features: Removed stop words, min document frequency = 2
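These parameters map directly onto scikit-learn. A minimal, self-contained sketch using the stated settings (the stand-in prompts and random_state are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in task prompts; the real input is the 971 PR descriptions.
prompts = [
    "Add a new flag to the compile command",
    "Add validation to the compile command output",
    "Refactor the workflow package to remove duplication",
    "Refactor the workflow compiler error handling",
    "Create an agentic workflow for daily triage",
    "Create an agentic workflow for weekly reports",
    "Fix the workflow compiler panic on empty input",
    "Fix validation of the agentic workflow schema",
]

# TF-IDF with the report's stated parameters: 200 features,
# 1-3 grams, English stop words removed, min document frequency 2.
vectorizer = TfidfVectorizer(
    max_features=200, ngram_range=(1, 3), stop_words="english", min_df=2
)
X = vectorizer.fit_transform(prompts)

# Elbow method: look for the k where inertia stops dropping sharply,
# then fit the chosen k (the report settled on k=3).
for k in range(2, 6):
    print(k, KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(labels)
```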
Metrics Tracked: merge outcome (merged vs closed unmerged), files changed, lines added, and review comments per PR
Data Table: Representative PRs by Cluster
Top 10 representative PRs for each cluster are listed in the full report (see workflow artifacts).
Visualizations
The analysis generated two key visualizations:
Cluster Overview (cluster-overview.png)
Complexity Metrics (cluster-complexity.png)
Charts available in workflow artifacts for detailed review.
Next Steps
Monitor Trends - Run this analysis weekly to track improvements over time
Experiment with Prompt Engineering - Test whether more structured prompts improve feature task success rates
Analysis methodology: TF-IDF vectorization + K-means clustering on 971 copilot agent task prompts from the last 30 days. Full report and visualizations available in workflow artifacts.