Description
Problem
The Metrics Collector workflow infrastructure is not producing expected output, preventing all meta-orchestrators from performing health analysis.
Missing Data
- Latest metrics file not found: /tmp/gh-aw/repo-memory-default/memory/meta-orchestrators/metrics/latest.json
- Historical metrics unavailable: /tmp/gh-aw/repo-memory-default/memory/meta-orchestrators/metrics/daily/*.json
- Repo memory access denied: Permission issues accessing shared memory paths
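To tell a missing file apart from a permission problem, a small check like the following may help. It is only a sketch: the root path is copied from this report, while the function name `check_metrics_paths` and the shape of its result are illustrative and not part of any existing tool.

```python
import glob
import os

# Path copied from the report above.
METRICS_ROOT = "/tmp/gh-aw/repo-memory-default/memory/meta-orchestrators/metrics"


def check_metrics_paths(root=METRICS_ROOT):
    """Report which expected metrics files are present and readable.

    Hypothetical helper: the name and return shape are assumptions.
    """
    latest = os.path.join(root, "latest.json")
    daily = sorted(glob.glob(os.path.join(root, "daily", "*.json")))
    return {
        "root_exists": os.path.isdir(root),
        # A directory needs read + execute permission to be listed and traversed.
        "root_readable": os.access(root, os.R_OK | os.X_OK),
        "latest_exists": os.path.isfile(latest),
        "latest_readable": os.access(latest, os.R_OK),
        "daily_file_count": len(daily),
    }


if __name__ == "__main__":
    for key, value in check_metrics_paths().items():
        print(f"{key}: {value}")
```

If `root_exists` is true but `root_readable` is false, that points toward the permission hypothesis. If the root is missing entirely, the memory mount or the collector itself is the more likely culprit.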
Impact
All meta-orchestrators affected:
- ❌ Workflow Health Manager - Cannot assess workflow success rates or detect failures
- ❌ Agent Performance Analyzer - Cannot analyze agent quality trends
- ❌ Campaign Manager - Cannot track campaign health metrics
- ❌ Other workflows depending on shared metrics infrastructure
Without metrics data, we cannot:
- Detect failing workflows proactively
- Calculate success rates or MTBF
- Identify error patterns
- Track performance trends
- Make data-driven optimization decisions
Root Cause Analysis Needed
Possible Issues
- Metrics Collector workflow failing
  - Not running on schedule (daily)
  - Encountering errors during execution
  - Timeout or resource constraints
- Repo memory configuration
  - Branch memory/meta-orchestrators not accessible
  - Permission issues on repo-memory tool
  - File path or glob pattern misconfiguration
- File system permissions
  - /tmp/gh-aw/repo-memory-default/ permissions incorrect
  - Memory mount not working in workflow environment
Investigation Steps
- Check Metrics Collector status
  gh run list --workflow=metrics-collector.md --limit 10
- Review recent run logs
  gh run view (run-id) --log
- Verify repo-memory branch
  git ls-remote origin memory/meta-orchestrators
- Test repo-memory access
  - Run simple workflow that writes to repo-memory
  - Verify files are committed to branch
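The first investigation step can also be scripted so that run outcomes are tallied rather than read by eye. The sketch below assumes the `gh` CLI is installed and authenticated. It relies on the `--json` output of `gh run list`, and the helper names `fetch_runs` and `summarize_runs` are illustrative.

```python
import json
import subprocess
from collections import Counter


def summarize_runs(runs):
    """Count runs by conclusion; runs without a conclusion yet are counted as 'pending'."""
    return Counter(run.get("conclusion") or "pending" for run in runs)


def fetch_runs(workflow="metrics-collector.md", limit=10):
    """Ask the gh CLI for recent runs of the workflow and return them as a list of dicts."""
    result = subprocess.run(
        [
            "gh", "run", "list",
            f"--workflow={workflow}",
            f"--limit={limit}",
            "--json", "databaseId,status,conclusion,createdAt",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `summarize_runs(fetch_runs())` from inside the repository gives a quick count of successes and failures. An empty result would suggest the workflow is not being scheduled at all.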
Expected Metrics Format
The Metrics Collector should produce:
latest.json:
{
  "timestamp": "2025-12-26T00:00:00Z",
  "workflows": {
    "workflow-name": {
      "total_runs": 10,
      "successful_runs": 8,
      "failed_runs": 2,
      "success_rate": 0.80,
      "avg_duration_seconds": 120
    }
  }
}
daily/YYYY-MM-DD.json: Same format, one per day for 30 days
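A consumer such as the Workflow Health Manager could check a parsed metrics file against this format before using it. The following is a minimal sketch, not an official schema: the field names come from the example above, while the type expectations and the assumption that `success_rate` lies between 0 and 1 are inferred from the sample values.

```python
# Field names taken from the example; the expected types are an assumption.
REQUIRED_FIELDS = {
    "total_runs": int,
    "successful_runs": int,
    "failed_runs": int,
    "success_rate": (int, float),
    "avg_duration_seconds": (int, float),
}


def validate_metrics(doc):
    """Return a list of problems found in a parsed metrics document (empty if none)."""
    if not isinstance(doc, dict):
        return ["top-level value is not an object"]
    problems = []
    if not isinstance(doc.get("timestamp"), str):
        problems.append("missing or non-string 'timestamp'")
    workflows = doc.get("workflows")
    if not isinstance(workflows, dict):
        problems.append("missing or non-object 'workflows'")
        return problems
    for name, stats in workflows.items():
        if not isinstance(stats, dict):
            problems.append(f"{name}: entry is not an object")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(stats.get(field), expected):
                problems.append(f"{name}: '{field}' missing or wrong type")
        rate = stats.get("success_rate")
        # Assumption: success_rate is a fraction in [0, 1], as in the example (0.80).
        if isinstance(rate, (int, float)) and not 0 <= rate <= 1:
            problems.append(f"{name}: 'success_rate' outside [0, 1]")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller log every schema issue from a single run.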
Recommended Fix
- Verify Metrics Collector workflow is running successfully
- Fix repo-memory permissions if access is blocked
- Update metrics collection if format changed
- Document metrics schema for consistency across meta-orchestrators
Priority Justification
P0 (Critical) because:
- Blocks all meta-orchestrator health monitoring
- Prevents proactive failure detection across 124 workflows
- No workaround available: metrics are the foundation for health assessment
- Affects entire agentic workflow ecosystem reliability
Success Criteria
✅ Metrics Collector runs successfully on daily schedule
✅ latest.json appears in expected location
✅ Historical daily metrics available for 30-day analysis
✅ Workflow Health Manager can access and parse metrics
✅ All meta-orchestrators resume normal operation
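The 30-day history criterion can be verified mechanically. The sketch below lists which daily files are absent. It assumes files are named `YYYY-MM-DD.json` as described above and that the window includes the end date, which the issue does not specify. The helper name `missing_daily_files` is hypothetical.

```python
import os
from datetime import date, timedelta


def missing_daily_files(daily_dir, end, days=30):
    """List expected YYYY-MM-DD.json names absent from daily_dir, newest first.

    Assumption: the window covers `days` consecutive dates ending at `end` inclusive.
    """
    expected = [
        (end - timedelta(days=offset)).isoformat() + ".json"
        for offset in range(days)
    ]
    present = set(os.listdir(daily_dir)) if os.path.isdir(daily_dir) else set()
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    daily = "/tmp/gh-aw/repo-memory-default/memory/meta-orchestrators/metrics/daily"
    for name in missing_daily_files(daily, date.today()):
        print("missing:", name)
```

An empty result means the full window is covered. A missing directory is reported as every day missing.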
Discovered by: Workflow Health Manager
Run ID: 20514768306
Date: 2025-12-26 02:53 UTC
AI generated by Workflow Health Manager - Meta-Orchestrator