Problem
AgentOps has shipped the four-layer architecture across 73 skills, a Go CLI, and 4 runtime adapters (328 stars). But the project's own workbench A/B shows Δ=+0.0000, meaning no measurable lift at v1 difficulty. The README is 416 lines, the quickstart has 9 conditional branches, and the activation moment takes multiple sessions to arrive. The vendor-eclipse thesis is honest, but the pitch is split across three personas.
Seven specific blockers stand between the current state and PMF.
PMF Blockers (ranked by leverage)
PMF-1: No measurable lift to point at (P0)
PRODUCT.md reports workbench A/B at Δ=+0.0000 across 12 cases. Both legs scored 12/12, so the suite is saturated. What a prospective user concludes: the maintainer's own benchmark cannot distinguish AgentOps-on from AgentOps-off.
Fix: Ship workbench v2 with 8-12 realistic tasks where the hook layer differentiates (multi-file refactors needing prior learnings, security reviews benefiting from prevention rules, tasks where pre-mortem catches known pitfalls). Run a skill-on vs. skill-off A/B and publish a Δ of at least +0.15 in the README, above the fold.
Files: evals/workbench/tasks/, evals/workbench/suite-workbench-behavioral-v2.json, PRODUCT.md, README.md
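The Δ in PMF-1 is a difference of pass rates between the two legs. A minimal sketch in Go, assuming each v2 case yields one pass/fail result per leg; the `taskResult` type and `liftDelta` function are hypothetical and do not reflect the actual schema of suite-workbench-behavioral-v2.json:

```go
package main

import "fmt"

// taskResult is a hypothetical per-case record; the real
// suite-workbench-behavioral-v2.json schema may differ.
type taskResult struct {
	ID       string
	SkillOn  bool // passed with AgentOps enabled
	SkillOff bool // passed with AgentOps disabled
}

// liftDelta returns pass-rate(skill-on) minus pass-rate(skill-off).
// An empty suite has no measurable lift, so it returns 0.
func liftDelta(results []taskResult) float64 {
	if len(results) == 0 {
		return 0
	}
	on, off := 0, 0
	for _, r := range results {
		if r.SkillOn {
			on++
		}
		if r.SkillOff {
			off++
		}
	}
	n := float64(len(results))
	return float64(on)/n - float64(off)/n
}

func main() {
	// The v1 situation: both legs pass all 12 cases, so the suite saturates.
	v1 := make([]taskResult, 12)
	for i := range v1 {
		v1[i] = taskResult{ID: fmt.Sprintf("v1-%02d", i+1), SkillOn: true, SkillOff: true}
	}
	fmt.Printf("v1 Δ = %+.4f\n", liftDelta(v1))
}
```

At the proposed suite size this sets a concrete bar. For any suite of 8 to 12 tasks, a net gap of one task gives at most 1/8 = 0.125, below +0.15, while a net gap of two gives at least 2/12 ≈ 0.167. So v2 needs AgentOps-on to pass at least two more tasks than AgentOps-off.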
PMF-2: No 60-second aha moment (P0)
GOALS.md North Star: "install to first validated flow in under 5 minutes." No gate measures it. Reality: 7 install permutations, 9 quickstart branches, and a value prop ("repo remembers what worked") that does not pay off until session 3.
Fix: Build a 90-second ao demo that clones a scratch repo with seeded .agents/ history, runs a task that triggers a verdict the user wouldn't have produced themselves, and shows "this is what AgentOps added" in the terminal. Zero setup beyond ao + git.
Files: cli/cmd/ao/demo.go, examples/demo/, README.md
PMF-3: ICP messaging split across three personas (P1)
README pitches solo dev, orchestrator, and quality-first maintainer equally. The sovereignty thesis (cross-runtime corpus, local-first, operator-owned scheduling) is the only wedge Anthropic cannot absorb — but it's buried.
Fix: Lead README with sovereignty value prop for teams needing constrained-network discipline, audit trails, and cross-runtime portability. Move "what if Anthropic ships native X" from defensive to assertive. Secondary personas acknowledged but not equal-weighted.
Files: README.md, PRODUCT.md, docs/index.md
PMF-4: Quickstart is a decision tree, not a fast path (P1)
9 conditional branches in skills/quickstart/SKILL.md before showing "next action." State detection belongs in ao doctor, not the first thing a user runs.
Fix: Simplify /quickstart to 1 happy path + 1 fallback. Move elaborate state detection into ao doctor.
Files: skills/quickstart/SKILL.md, cli/cmd/ao/doctor.go
PMF-5: 73 skills with no discoverability hierarchy (P1)
A new user can't tell whether they need /rpi vs /crank vs /swarm vs /evolve vs /dream. The skill router's existence is itself the symptom.
Fix: Three tiers: Start Here (5-7 skills), Power (10-15), Specialist (50+). Surface tiers in README, docs/SKILLS.md, and quickstart output.
Files: skills/SKILL-TIERS.md, docs/SKILLS.md, README.md
PMF-6: Internal vocabulary crowds user-facing surfaces (P2)
CDLC, RPI, Brownian Ratchet, σρ > δ, Meadows leverage points, Knowledge OS → Olympus → Mt. Olympus — all on the front page. Signals deep thinking to insiders; unintelligible to everyone else.
Fix: Remove jargon from the README and the mkdocs landing page; keep it in dedicated docs. Progressive disclosure, not front-page taxonomy.
Files: README.md, docs/index.md
PMF-7: Distribution is power-user channels only (P2)
Install paths are a brew tap, raw GitHub scripts, and the Claude plugin marketplace. No demo video, no VS Code listing, no "try without installing" surface.
Fix: Add demo GIF/asciicast to README. Check in example verdict output. Consider VS Code marketplace listing.
Files: README.md, examples/demo-output/
Execution Waves
Wave 1 (parallel, no dependencies)
- PMF-1: Workbench v2 (generates the Δ number)
- PMF-2: ao demo (generates the activation path + demo output)
- PMF-5: Skill tiers (independent catalog restructuring)
Wave 2 (depends on Wave 1)
- PMF-3: ICP messaging (needs Δ from PMF-1)
- PMF-4: Quickstart simplification (demo replaces quickstart as first action)
- PMF-7: Distribution surface (records demo output from PMF-2)
Wave 3 (depends on Wave 2)
- PMF-6: Vocabulary cleanup (messaging rewrite sets the tone first)
Success Criteria
| Metric | Current | Target |
|---|---|---|
| Workbench A/B Δ | +0.0000 | ≥ +0.15 on v2 tasks |
| README above-the-fold | 416 lines | ≤ 150 lines |
| Quickstart branches | 9 | ≤ 3 |
| Time install → first verdict | ~15-20 min (est.) | < 5 min measured |
| Internal jargon on front page | 5+ terms | 0 |
| Skill tiers | flat list | 3 tiers |
The Compressed Read
PMF is gated on proving lift, narrowing the ICP, and shrinking time-to-aha — in that order. The one experiment most worth running first: ship workbench v2, publish a credible Δ, and replace the 9-branch quickstart with a single 90-second ao demo that ends with a screenshot-able verdict.