diff --git a/.github/ISSUE_TEMPLATE/pattern-proposal.md b/.github/ISSUE_TEMPLATE/pattern-proposal.md index 684be90..b1bc808 100644 --- a/.github/ISSUE_TEMPLATE/pattern-proposal.md +++ b/.github/ISSUE_TEMPLATE/pattern-proposal.md @@ -38,15 +38,15 @@ assignees: '' - [ ] I: Context Is Everything - [ ] II: Track Everything in Git - [ ] III: One Agent, One Job -- [ ] IV: Research Before You Build -- [ ] V: Validate Externally -- [ ] VI: Lock Progress Forward -- [ ] VII: Extract Learnings -- [ ] VIII: Compound Knowledge -- [ ] IX: Measure What Matters -- [ ] X: Isolate Workers +- [ ] IV: Enforce Least Privilege +- [ ] V: Research Before You Build +- [ ] VI: Isolate Workers +- [ ] VII: Validate Externally +- [ ] VIII: Lock Progress Forward +- [ ] IX: Extract Learnings +- [ ] X: Compound Knowledge - [ ] XI: Supervise Hierarchically -- [ ] XII: Harvest Failures as Wisdom +- [ ] XII: Measure Outcomes --- diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2619496..1f622be 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -28,22 +28,22 @@ For general autonomous agents, see [12-Factor Agents](https://github.com/humanla The model is not the problem. The operations are. We apply decades of DevOps and SRE methodology to how people work with AI agents: -| DevOps / SRE Principle | Our Application (v3 Factor) | +| DevOps / SRE Principle | Our Application (v4 Factor) | |------------------------|-----------------------------| | Configuration Management | **I. Context Is Everything** -- manage what enters the context window like you manage what enters production | | Infrastructure as Code | **II. Track Everything in Git** -- prompts, learnings, workflows as versioned artifacts | | Single Responsibility | **III. One Agent, One Job** -- each agent gets a scoped task and fresh context | -| Design Reviews | **IV. Research Before You Build** -- understand the problem space before generating code | -| Zero-Trust Verification | **V. 
Validate Externally** -- no agent grades its own work, ever | -| Continuous Delivery | **VI. Lock Progress Forward** -- validated work ratchets and cannot regress | -| Post-Mortems | **VII. Extract Learnings** -- every session produces two outputs: the work and the lessons | -| Observability & Feedback Loops | **VIII. Compound Knowledge** -- the knowledge flywheel that makes sessions smarter over time | -| SLOs & Error Budgets | **IX. Measure What Matters** -- track fitness toward goals, not activity metrics | -| Process Isolation | **X. Isolate Workers** -- each worker gets its own workspace and zero shared mutable state | +| Least Privilege / Zero-Trust Access | **IV. Enforce Least Privilege** -- each agent gets the minimum permissions its job requires, nothing more | +| Design Reviews | **V. Research Before You Build** -- understand the problem space before generating code | +| Process Isolation | **VI. Isolate Workers** -- each worker gets its own workspace and zero shared mutable state | +| Zero-Trust Verification | **VII. Validate Externally** -- no agent grades its own work, ever | +| Continuous Delivery | **VIII. Lock Progress Forward** -- validated work ratchets and cannot regress | +| Post-Mortems | **IX. Extract Learnings** -- every session produces two outputs: the work and the lessons | +| Observability & Feedback Loops; Blameless Post-Mortems | **X. Compound Knowledge** -- the knowledge flywheel that makes sessions smarter over time; failed attempts are data, not waste | | Escalation Hierarchies | **XI. Supervise Hierarchically** -- escalation flows up, never sideways | -| Blameless Post-Mortems | **XII. Harvest Failures as Wisdom** -- failed attempts are data, not waste | +| SLOs & Error Budgets | **XII. Measure Outcomes** -- track fitness toward goals, not activity metrics | -**The core insight:** Better operations make the same model perform dramatically better. 
Knowledge compounding (Factor VIII) is the differentiator that no amount of model improvement replaces. +**The core insight:** Better operations make the same model perform dramatically better. Knowledge compounding (Factor X) is the differentiator that no amount of model improvement replaces. ``` Ad-hoc: Prompt agent -> Hope for good output -> Repeat from scratch @@ -56,10 +56,10 @@ Contributions map to the tier they strengthen: | Tier | Factors | Focus | |------|---------|-------| -| **Foundation (I-III)** | Context, Git, Scoping | Non-negotiable basics, zero tooling required | -| **Workflow (IV-VI)** | Research, Validation, Ratcheting | The discipline that separates hoping from operating | -| **Knowledge (VII-IX)** | Extraction, Compounding, Measurement | Where sessions get measurably smarter over time | -| **Scale (X-XII)** | Isolation, Supervision, Failure Harvesting | Multi-agent orchestration (advanced, optional) | +| **Prepare (I-III)** | Context, Git, Scoping | Non-negotiable basics, zero tooling required | +| **Bound (IV-VI)** | Least Privilege, Research, Isolation | The discipline that separates hoping from operating | +| **Select (VII-IX)** | Validation, Ratcheting, Extraction | Where sessions get measurably smarter over time | +| **Govern (X-XII)** | Compounding, Supervision, Measurement | Multi-agent orchestration and outcome tracking (advanced) | --- @@ -72,9 +72,9 @@ The best contributions add new operational patterns that practitioners can apply Patterns can address any tier.
Some high-value categories: - **Context management patterns** (Factor I) -- techniques for loading, pruning, and structuring context windows -- **Workflow discipline patterns** (Factors IV-VI) -- research templates, validation checklists, ratcheting mechanisms -- **Knowledge compounding patterns** (Factor VIII) -- extraction workflows, quality gates, retrieval strategies, decay management -- **Failure harvesting patterns** (Factor XII) -- techniques for capturing and indexing what did not work +- **Workflow discipline patterns** (Factors V-VIII) -- research templates, validation checklists, ratcheting mechanisms +- **Knowledge compounding patterns** (Factor X) -- extraction workflows, quality gates, retrieval strategies, decay management +- **Failure harvesting patterns** (Factor X) -- techniques for capturing and indexing what did not work **Operational pattern checklist:** @@ -216,7 +216,7 @@ Our skills and patterns can always be sharper. Contributions that improve operat ### 5. Share Knowledge Compounding Workflows -Factor VIII (Compound Knowledge) is the hero differentiator. Contributions that demonstrate real knowledge compounding workflows are especially valuable: +Factor X (Compound Knowledge) is the hero differentiator. 
Contributions that demonstrate real knowledge compounding workflows are especially valuable: - How you structure `learnings.md` files - Quality gating criteria for extracted knowledge @@ -280,7 +280,7 @@ Discussion first: | Quality | Good | Bad | |---------|------|-----| -| **Actionable** | "Add this check before commit to enforce Factor V" | "Agents could be better" | +| **Actionable** | "Add this check before commit to enforce Factor VII" | "Agents could be better" | | **Grounded** | Maps to specific factor(s) and tier | Generic agent advice | | **Evidence-based** | Shows real session improvement | Theoretical claims | | **Portable** | Works without specific tooling | Requires proprietary setup | diff --git a/GOALS.yaml b/GOALS.yaml index 6c84a0d..d487c2b 100644 --- a/GOALS.yaml +++ b/GOALS.yaml @@ -8,7 +8,7 @@ goals: - id: factor-naming-consistent description: "Factor files are numbered 01-12 and filenames reflect the new factor titles" check: | - expected="01-context-is-everything 02-track-everything-in-git 03-one-agent-one-job 04-research-before-you-build 05-validate-externally 06-lock-progress-forward 07-extract-learnings 08-compound-knowledge 09-measure-what-matters 10-isolate-workers 11-supervise-hierarchically 12-harvest-failures-as-wisdom" + expected="01-context-is-everything 02-track-everything-in-git 03-one-agent-one-job 04-enforce-least-privilege 05-research-before-you-build 06-isolate-workers 07-validate-externally 08-lock-progress-forward 09-extract-learnings 10-compound-knowledge 11-supervise-hierarchically 12-measure-outcomes" count=0 for name in $expected; do [ -f "factors/${name}.md" ] && count=$((count+1)) diff --git a/docs/00-SUMMARY.md b/docs/00-SUMMARY.md index b45ecb8..f43f51b 100644 --- a/docs/00-SUMMARY.md +++ b/docs/00-SUMMARY.md @@ -35,7 +35,7 @@ see [12-Factor Agents](https://github.com/humanlayer/12-factor-agents) by Dex Ho ## The 12 Factors -### Foundation (I--III) -- Start Here +### Prepare (I--III) -- Start Here Non-negotiable basics.
Zero tooling required. Get these wrong and nothing else matters. @@ -45,34 +45,29 @@ Non-negotiable basics. Zero tooling required. Get these wrong and nothing else m | **[II](../factors/02-track-everything-in-git.md)** | **Track Everything in Git** | If it is not in git, it did not happen -- learnings, decisions, and knowledge live alongside code. | | **[III](../factors/03-one-agent-one-job.md)** | **One Agent, One Job** | Each agent gets a scoped task and fresh context; never reuse a saturated window. | -### Flow (IV--VI) -- The Discipline +### Bound (IV--VI) -- The Discipline -How work flows through agents. The discipline that separates "prompting and hoping" +How work is bounded before agents run. The discipline that separates "prompting and hoping" from a reliable operating model. | # | Factor | One-Line Rule | |---|--------|---------------| -| **[IV](../factors/04-research-before-you-build.md)** | **Research Before You Build** | Understand the problem space before generating a single line of code. | -| **[V](../factors/05-validate-externally.md)** | **Validate Externally** | The worker reports evidence; an independent checker writes the binding verdict. No agent grades its own work. | -| **[VI](../factors/06-lock-progress-forward.md)** | **Lock Progress Forward** | Once work passes validation, it ratchets forward and cannot regress. | +| **[IV](../factors/04-enforce-least-privilege.md)** | **Enforce Least Privilege** | An agent acts inside a least-privilege envelope it cannot widen -- not even on untrusted input. | +| **[V](../factors/05-research-before-you-build.md)** | **Research Before You Build** | Understand the problem space before generating a single line of code. | +| **[VI](../factors/06-isolate-workers.md)** | **Isolate Workers** | Each worker gets its own workspace, context, and zero shared mutable state. 
| -### Knowledge (VII--IX) -- Where Compounding Kicks In +### Select (VII--IX) -- Where Compounding Kicks In -Systematic extraction and injection of knowledge. This is where sessions start -getting measurably smarter over time. +Validate, lock, and extract -- selecting the work that holds. This is where sessions +start getting measurably smarter over time. | # | Factor | One-Line Rule | |---|--------|---------------| -| **[VII](../factors/07-extract-learnings.md)** | **Extract Learnings** | Every session produces two outputs: the work product and the lessons learned. | -| **[VIII](../factors/08-compound-knowledge.md)** | **Compound Knowledge** | Learnings flow back into future sessions automatically -- extract, gate, inject, measure, decay. | -| **[IX](../factors/09-measure-what-matters.md)** | **Measure What Matters** | Track fitness toward goals, not activity metrics. | +| **[VII](../factors/07-validate-externally.md)** | **Validate Externally** | The worker reports evidence; an independent checker writes the binding verdict. No agent grades its own work. | +| **[VIII](../factors/08-lock-progress-forward.md)** | **Lock Progress Forward** | Once work passes validation, it ratchets forward and cannot regress. | +| **[IX](../factors/09-extract-learnings.md)** | **Extract Learnings** | Every session produces two outputs: the work product and the lessons learned. | -> **Factor VIII is the hero.** It implements the knowledge flywheel -- the -> compounding loop that cannot be commoditized. Better models with amnesia still -> repeat your mistakes. Knowledge compounding is the one capability no amount of -> model improvement replaces. - -### Scale (X--XII) -- The Factory Altitude +### Govern (X--XII) -- The Factory Altitude The same factors at fleet scale. Working solo, you live them in miniature -- a git worktree is isolation, your own judgment is supervision, your `learnings.md` @@ -81,9 +76,14 @@ factors. 
| # | Factor | One-Line Rule | |---|--------|---------------| -| **[X](../factors/10-isolate-workers.md)** | **Isolate Workers** | Each worker gets its own workspace, context, and zero shared mutable state. | +| **[X](../factors/10-compound-knowledge.md)** | **Compound Knowledge** | Learnings flow back into future sessions automatically -- extract, gate, inject, measure, decay; dead ends become routing hints that prune the next agent's search. | | **[XI](../factors/11-supervise-hierarchically.md)** | **Supervise Hierarchically** | Escalation flows up, never sideways -- one coordinator dispatches, workers execute. | -| **[XII](../factors/12-harvest-failures-as-wisdom.md)** | **Harvest Failures as Wisdom** | Turn dead ends into routing hints that prune the next agent's search. | +| **[XII](../factors/12-measure-outcomes.md)** | **Measure Outcomes** | Track fitness toward goals, not activity metrics. | + +> **Factor X is the hero.** It implements the knowledge flywheel -- the +> compounding loop that cannot be commoditized. Better models with amnesia still +> repeat your mistakes. Knowledge compounding is the one capability no amount of +> model improvement replaces. --- @@ -94,18 +94,18 @@ tier and keep the value. 
``` Quickstart (5 min) --> learnings.md file, zero tooling -Foundation (I-III) --> Context discipline, git tracking, fresh sessions -Flow (IV-VI) --> Research, external validation, ratcheting -Knowledge (VII-IX) --> Extraction, compounding, measurement -Scale (X-XII) --> Multi-agent isolation, supervision, failure harvesting +Prepare (I-III) --> Context discipline, git tracking, fresh sessions +Bound (IV-VI) --> Least privilege, research, worker isolation +Select (VII-IX) --> External validation, ratcheting, extraction +Govern (X-XII) --> Knowledge compounding, supervision, measurement ``` | Transition | Trigger | |------------|---------| -| Quickstart to Foundation | learnings.md gets unwieldy or context problems recur | -| Foundation to Flow | You keep re-explaining codebase patterns to new sessions | -| Flow to Knowledge | The same mistakes recur across sessions despite research | -| Knowledge to Scale | Multiple parallel agents cause conflicts | +| Quickstart to Prepare | learnings.md gets unwieldy or context problems recur | +| Prepare to Bound | You keep re-explaining codebase patterns to new sessions | +| Bound to Select | The same mistakes recur across sessions despite research | +| Select to Govern | Multiple parallel agents cause conflicts | --- @@ -113,14 +113,15 @@ Scale (X-XII) --> Multi-agent isolation, supervision, failure harvestin | Pain Point | Start With | |------------|------------| -| Agent claims tests pass but code is broken | Factor V: Validate Externally | +| Agent claims tests pass but code is broken | Factor VII: Validate Externally | | Context problems, instruction loss | Factor I: Context Is Everything | | Scope creep, tangled implementations | Factor III: One Agent, One Job | -| Same mistakes repeated across sessions | Factor VIII: Compound Knowledge | -| No understanding before implementation | Factor IV: Research Before You Build | -| Cannot resume work across sessions | Factor VI: Lock Progress Forward | -| No visibility into what 
is working | Factor IX: Measure What Matters | -| Multi-agent workspace conflicts | Factor X: Isolate Workers | +| Same mistakes repeated across sessions | Factor X: Compound Knowledge | +| No understanding before implementation | Factor V: Research Before You Build | +| Agent over-reaches its permissions | Factor IV: Enforce Least Privilege | +| Cannot resume work across sessions | Factor VIII: Lock Progress Forward | +| No visibility into what is working | Factor XII: Measure Outcomes | +| Multi-agent workspace conflicts | Factor VI: Isolate Workers | --- diff --git a/docs/README.md b/docs/README.md index d6d93e2..7b1184d 100644 --- a/docs/README.md +++ b/docs/README.md @@ -58,11 +58,11 @@ BUILD → WORK → RUN **Problem-solving recipes for specific tasks** - [How-To Index](./how-to/) - All guides organized by task - - [Implement Validation Gates](./how-to/README.md#validation--quality) - Factor IV - - [Prevent Context Collapse](./how-to/README.md#context-management) - Factor II - - [Build Pattern Library](./how-to/README.md#pattern-libraries) - Factor XII - - [Track Success Rates](./how-to/README.md#measurement--observability) - Factor V - - [Lock Progress Forward](./how-to/README.md#session-management) - Factor VI + - [Implement Validation Gates](./how-to/README.md#validation--quality) - Factor VII + - [Prevent Context Collapse](./how-to/README.md#context-management) - Factor I + - [Build Pattern Library](./how-to/README.md#pattern-libraries) - Factor X + - [Track Success Rates](./how-to/README.md#measurement--observability) - Factor XII + - [Lock Progress Forward](./how-to/README.md#session-management) - Factor VIII - [And 20+ more...](./how-to/) --- @@ -131,10 +131,12 @@ Looking for documentation about a specific factor? 
| **I: Context Is Everything** | [Quick Start](./getting-started/quick-start.md#step-1) | Git Workflow | [Factor I](../factors/01-context-is-everything.md) | [Knowledge OS](./principles/knowledge-os.md) | | **II: Track Everything in Git** | [Quick Start](./getting-started/quick-start.md#add-factor-ii) | [Prevent Collapse](./how-to/README.md#context-management) | [Factor II](../factors/02-track-everything-in-git.md) | [Context Engineering](./principles/context-engineering.md) | | **III: One Agent, One Job** | [Solo Dev](./getting-started/solo-developer.md#factor-iii) | Multi-Phase | [Factor III](../factors/03-one-agent-one-job.md) | [Learning Science](./principles/five-pillars.md) | -| **IV: Research Before You Build** | [Quick Start](./getting-started/quick-start.md#step-2) | [Validation Gates](./how-to/README.md#validation--quality) | [Factor IV](../factors/04-research-before-you-build.md) | [DevOps/SRE](./principles/five-pillars.md) | -| **V: Validate Externally** | [Solo Dev](./getting-started/solo-developer.md#factor-v) | [Track Success](./how-to/README.md#measurement--observability) | [Factor V](../factors/05-validate-externally.md) | Metrics | -| **VI: Lock Progress Forward** | [Solo Dev](./getting-started/solo-developer.md#factor-vi) | [Session Notes](./how-to/README.md#session-management) | [Factor VI](../factors/06-lock-progress-forward.md) | [Context Engineering](./principles/context-engineering.md) | -| **VII-XII** | [Flow Guide](./tutorials/workflow-guide.md) | [How-To](./how-to/) | [Factors](../factors/) | [Pillars](./principles/five-pillars.md) | +| **IV: Enforce Least Privilege** | [Solo Dev](./getting-started/solo-developer.md) | [Validation Gates](./how-to/README.md#validation--quality) | [Factor IV](../factors/04-enforce-least-privilege.md) | [DevOps/SRE](./principles/five-pillars.md) | +| **V: Research Before You Build** | [Quick Start](./getting-started/quick-start.md#step-2) | [Validation Gates](./how-to/README.md#validation--quality) | [Factor 
V](../factors/05-research-before-you-build.md) | [DevOps/SRE](./principles/five-pillars.md) | +| **VI: Isolate Workers** | [Flow Guide](./tutorials/workflow-guide.md) | [How-To](./how-to/) | [Factor VI](../factors/06-isolate-workers.md) | [Pillars](./principles/five-pillars.md) | +| **VII: Validate Externally** | [Solo Dev](./getting-started/solo-developer.md#factor-v) | [Track Success](./how-to/README.md#measurement--observability) | [Factor VII](../factors/07-validate-externally.md) | Metrics | +| **VIII: Lock Progress Forward** | [Solo Dev](./getting-started/solo-developer.md#factor-vi) | [Session Notes](./how-to/README.md#session-management) | [Factor VIII](../factors/08-lock-progress-forward.md) | [Context Engineering](./principles/context-engineering.md) | +| **IX-XII** | [Flow Guide](./tutorials/workflow-guide.md) | [How-To](./how-to/) | [Factors](../factors/) | [Pillars](./principles/five-pillars.md) | --- @@ -144,11 +146,11 @@ Looking to achieve a specific FAAFO goal? | FAAFO Goal | Start Here | Relevant Factors | |------------|------------|------------------| -| **Fast** | [Validation Gates](./how-to/README.md#validation--quality) | II, IV, V | -| **Ambitious** | [Pattern Library](./how-to/README.md#pattern-libraries) | IX, XII | -| **Autonomous** | [Solo Developer Guide](./getting-started/solo-developer.md) | I, III, IV, VII, VIII | -| **Fun** | [Quick Start](./getting-started/quick-start.md) | IV, XI | -| **Optionality** | [Context Management](./how-to/README.md#context-management) | II, VI | +| **Fast** | [Validation Gates](./how-to/README.md#validation--quality) | II, V, VII | +| **Ambitious** | [Pattern Library](./how-to/README.md#pattern-libraries) | X, XII | +| **Autonomous** | [Solo Developer Guide](./getting-started/solo-developer.md) | I, III, V, IX, X | +| **Fun** | [Quick Start](./getting-started/quick-start.md) | V, XI | +| **Optionality** | [Context Management](./how-to/README.md#context-management) | II, VIII | --- diff --git
a/docs/assets/12-factor-animated.svg b/docs/assets/12-factor-animated.svg index 6f25123..a9aeed0 100644 --- a/docs/assets/12-factor-animated.svg +++ b/docs/assets/12-factor-animated.svg @@ -254,7 +254,7 @@ IX - Measure What Matters + Measure Outcomes Search history for what works. Extract patterns from successful runs. @@ -281,7 +281,7 @@ XII - Harvest Failures + Compound Knowledge Bundle what works into reusable components. Share across projects and teams. diff --git a/docs/assets/12-factor-landscape-animated.svg b/docs/assets/12-factor-landscape-animated.svg index 55df79d..2ead619 100644 --- a/docs/assets/12-factor-landscape-animated.svg +++ b/docs/assets/12-factor-landscape-animated.svg @@ -514,7 +514,7 @@ - IX. Measure What Matters + IX. Measure Outcomes Track metrics that drive quality @@ -568,7 +568,7 @@ - XII. Harvest Failures as Wisdom + XII. Compound Knowledge Every failure teaches something diff --git a/docs/assets/12-factor-landscape.svg b/docs/assets/12-factor-landscape.svg index 08013ee..912d078 100644 --- a/docs/assets/12-factor-landscape.svg +++ b/docs/assets/12-factor-landscape.svg @@ -259,7 +259,7 @@ - IX. Measure What Matters + IX. Measure Outcomes Track metrics that drive quality @@ -313,7 +313,7 @@ - XII. Harvest Failures as Wisdom + XII.
Compound Knowledge Every failure teaches something diff --git a/docs/assets/carousel/slide-04-framework.svg b/docs/assets/carousel/slide-04-framework.svg index 304e388..57028d4 100644 --- a/docs/assets/carousel/slide-04-framework.svg +++ b/docs/assets/carousel/slide-04-framework.svg @@ -106,7 +106,7 @@ Compound Knowledge Measure What Matters + font-size="13" fill="#94A3B8">Measure Outcomes @@ -123,7 +123,7 @@ Supervise Hierarchy Harvest Failures + font-size="13" fill="#94A3B8">Compound Knowledge diff --git a/docs/assets/carousel/slide-07-improvement.svg b/docs/assets/carousel/slide-07-improvement.svg index ec98b79..3f17332 100644 --- a/docs/assets/carousel/slide-07-improvement.svg +++ b/docs/assets/carousel/slide-07-improvement.svg @@ -133,7 +133,7 @@ XII Harvest Failures + font-size="16" font-weight="700" fill="#FAFAFA">Compound Knowledge as Wisdom tools | Factor III: One Agent, One Job (clear roles and methodology) | -| Juniors benefit most | Factor VII: Extract Learnings (accelerated learning) | -| Happy time increase | Factor V: Validate Externally (fewer escaped bugs) | +| Juniors benefit most | Factor IX: Extract Learnings (accelerated learning) | +| Happy time increase | Factor VII: Validate Externally (fewer escaped bugs) | --- @@ -82,8 +82,8 @@ These case studies represent **industry validation** of patterns we independentl | Finding | Related Factor | |---------|---------------| -| 70% smaller MRs | Factor VI: Lock Progress Forward (incremental commits) | -| Quality + speed | Factor V: Validate Externally (tests, linters, CI) | +| 70% smaller MRs | Factor VIII: Lock Progress Forward (incremental commits) | +| Quality + speed | Factor VII: Validate Externally (tests, linters, CI) | | Review efficiency | Factor III: One Agent, One Job (focused scope) | --- @@ -120,9 +120,9 @@ These case studies represent **industry validation** of patterns we independentl | Finding | Related Factor | |---------|---------------| -| 100x compression | Factor IV:
Research Before You Build (front-loaded clarity) | +| 100x compression | Factor V: Research Before You Build (front-loaded clarity) | | Non-dev prototyping | Factor I: Context Is Everything (clear problem framing) | -| Rapid validation | Factor V: Validate Externally (fast feedback loops) | +| Rapid validation | Factor VII: Validate Externally (fast feedback loops) | --- @@ -159,9 +159,9 @@ These case studies represent **industry validation** of patterns we independentl | Finding | Related Factor | |---------|---------------| -| Start narrow | Factor VI: Lock Progress Forward (incremental expansion) | +| Start narrow | Factor VIII: Lock Progress Forward (incremental expansion) | | Augmentation model | Factor III: One Agent, One Job (clear division of labor) | -| Measurable FTE savings | Factor IX: Measure What Matters (concrete metrics) | +| Measurable FTE savings | Factor XII: Measure Outcomes (concrete metrics) | --- @@ -198,8 +198,8 @@ These case studies represent **industry validation** of patterns we independentl | Finding | Related Factor | |---------|---------------| -| Pattern matching | Factor VIII: Compound Knowledge (reusable solutions) | -| Velocity compounding | Factor VII: Extract Learnings (each fix teaches the next) | +| Pattern matching | Factor X: Compound Knowledge (reusable solutions) | +| Velocity compounding | Factor IX: Extract Learnings (each fix teaches the next) | | Human + AI collaboration | Factor XI: Supervise Hierarchically (human oversight) | --- @@ -239,8 +239,8 @@ These case studies represent **industry validation** of patterns we independentl | Finding | Related Factor | |---------|---------------| | Discipline investment | All 12 Factors (systematic operational discipline) | -| AI-native workflows | Factor VIII: Compound Knowledge (built-in learning loops) | -| Training importance | Factor VII: Extract Learnings (institutional knowledge) | +| AI-native workflows | Factor X: Compound Knowledge (built-in learning loops) | +| 
Training importance | Factor IX: Extract Learnings (institutional knowledge) | --- @@ -282,17 +282,17 @@ No Validation 3. **Incremental progress compounds** - Smaller changes = faster review = higher throughput - - Aligns with Factor VI: Lock Progress Forward + - Aligns with Factor VIII: Lock Progress Forward 4. **Measurement enables improvement** - Successful cases measured specific metrics - Clear ROI enabled expansion - - Aligns with Factor IX: Measure What Matters + - Aligns with Factor XII: Measure Outcomes 5. **External validation maintains quality** - Fast + sloppy is not the pattern - Fast + externally validated = enterprise-grade - - Aligns with Factor V: Validate Externally + - Aligns with Factor VII: Validate Externally --- diff --git a/docs/ecosystem.md b/docs/ecosystem.md index 784027f..26a5b29 100644 --- a/docs/ecosystem.md +++ b/docs/ecosystem.md @@ -38,15 +38,15 @@ BUILD → WORK → RUN │ │ • Disconnected, zero-tolerance, constrained environments │ │ │ │ • Monitoring, validation, incident response, reliability │ │ │ │ │ │ -│ │ Foundation (I-III): Workflow (IV-VI): │ │ -│ │ I. Context Is Everything IV. Research Before You Build │ │ -│ │ II. Track Everything V. Validate Externally │ │ -│ │ III. One Agent, One Job VI. Lock Progress Forward │ │ +│ │ Prepare (I-III): Bound (IV-VI): │ │ +│ │ I. Context Is Everything IV. Enforce Least Priv. │ │ +│ │ II. Track Everything V. Research Before You Build │ │ +│ │ III. One Agent, One Job VI. Isolate Workers │ │ │ │ │ │ -│ │ Knowledge (VII-IX): Scale (X-XII): │ │ -│ │ VII. Extract Learnings X. Isolate Workers │ │ -│ │ VIII.Compound Knowledge XI. Supervise Hierarchically │ │ -│ │ IX. Measure What Matters XII. Harvest Failures as Wisdom │ │ +│ │ Select (VII-IX): Govern (X-XII): │ │ +│ │ VII. Validate Externally X. Compound Knowledge │ │ +│ │ VIII. Lock Progress Forward XI. Supervise Hierarchically │ │ +│ │ IX. Extract Learnings XII.
Measure Outcomes │ │ │ └─────────────────────────────────────────────────────────────────────┘ │ │ ▲ │ │ │ Agents deployed here │ @@ -186,9 +186,9 @@ That's why 12-Factor AgentOps applies 20 years of DevOps/SRE wisdom to AI agents | Memory tattoos | I. Context Is Everything | Pattern 7: Memory Decay | | 40% rule | II. Track Everything in Git | Pattern 2: Context Amnesia | | Head chef / sous chefs | III. One Agent, One Job | Pattern 3: Instruction Drift | -| Prevent-Detect-Correct | V. Validate Externally | Pattern 1: Tests Passing Lie | -| FAAFO metrics | IX. Measure What Matters | Pattern 11: Process Gridlock | -| Session continuity | VI. Lock Progress Forward | Pattern 7: Memory Decay | +| Prevent-Detect-Correct | VII. Validate Externally | Pattern 1: Tests Passing Lie | +| FAAFO metrics | XII. Measure Outcomes | Pattern 11: Process Gridlock | +| Session continuity | VIII. Lock Progress Forward | Pattern 7: Memory Decay | ### 12-Factor Agents → AgentOps @@ -197,7 +197,7 @@ That's why 12-Factor AgentOps applies 20 years of DevOps/SRE wisdom to AI agents | Own your prompts | I. Context Is Everything | Context includes prompt evolution | | Own your context window | II. Track Everything in Git | Operational enforcement of context limits | | Small, focused agents | III. One Agent, One Job | Same principle, operational focus | -| Launch/Pause/Resume | VI. Lock Progress Forward | Operational implementation of state persistence | +| Launch/Pause/Resume | VIII. Lock Progress Forward | Operational implementation of state persistence | --- @@ -210,7 +210,7 @@ Start with [Factor Mapping](./explanation/vibe-coding-integration.md) to see how Start with [Evolution of 12-Factor](./principles/evolution-of-12-factor.md) to see how AgentOps extends building patterns to operations. ### If you're starting fresh -Start with [Getting Started](./getting-started/) and work through Foundation factors (I-III). 
+Start with [Getting Started](./getting-started/) and work through the Prepare factors (I-III). --- diff --git a/docs/explanation/ai-summit-validation-2025.md b/docs/explanation/ai-summit-validation-2025.md index 93db3aa..dbf9415 100644 --- a/docs/explanation/ai-summit-validation-2025.md +++ b/docs/explanation/ai-summit-validation-2025.md @@ -174,7 +174,7 @@ --- -### 6. Human Validation Checkpoints (now Factor V: Validate Externally) +### 6. Human Validation Checkpoints (now Factor VII: Validate Externally) #### Our Discovery (2024) @@ -184,7 +184,7 @@ - `/prime` routes to right specialist, human confirms **Evidence:** -- Factor V: Validate Externally (with human oversight) +- Factor VII: Validate Externally (with human oversight) - Pre-commit hooks require approval - No autonomous deployment (yet) diff --git a/docs/explanation/beads-workflow-integration.md b/docs/explanation/beads-workflow-integration.md index a55c921..54e55a2 100644 --- a/docs/explanation/beads-workflow-integration.md +++ b/docs/explanation/beads-workflow-integration.md @@ -140,9 +140,9 @@ The workflow creates persistent memory across sessions: | **I. Context Is Everything** | Git-backed `.beads/issues.jsonl` | | **II. Track Everything in Git** | JIT loading via `bd show` | | **III. One Agent, One Job** | One issue per `/implement` | -| **IV. Research Before You Build** | Status tracking (`open` → `in_progress` → `closed`) | -| **V. Validate Externally** | Issue lifecycle metrics | -| **VI. Lock Progress Forward** | `bd ready` picks up where you left off | +| **V. Research Before You Build** | Status tracking (`open` → `in_progress` → `closed`) | +| **VII. Validate Externally** | Issue lifecycle metrics | +| **VIII.
Lock Progress Forward** | `bd ready` picks up where you left off | ## Attribution diff --git a/docs/explanation/ecosystem-position.md b/docs/explanation/ecosystem-position.md index 2836948..a6557d0 100644 --- a/docs/explanation/ecosystem-position.md +++ b/docs/explanation/ecosystem-position.md @@ -140,8 +140,8 @@ Each AgentOps factor prevents specific failure patterns: |-----------------|-------------------------| | I. Context Is Everything | 7. Memory Tattoo Decay | | II. Track Everything in Git | 2. Context Amnesia | -| IV. Research Before You Build | 1. Tests Passing Lie | -| VIII. Compound Knowledge | 9. Bridge Torching | +| V. Research Before You Build | 1. Tests Passing Lie | +| X. Compound Knowledge | 9. Bridge Torching | ### 12-Factor Agents Provides Foundation HumanLayer's factors ensure agents are built in a way that CAN be operated: diff --git a/docs/explanation/faafo-north-star.md b/docs/explanation/faafo-north-star.md index 7e68c81..4590101 100644 --- a/docs/explanation/faafo-north-star.md +++ b/docs/explanation/faafo-north-star.md @@ -78,7 +78,7 @@ FAAFO describes the **ideal state of developer experience** when AI augments wor "Proven patterns to achieve FAAFO" ↓ 12-Factor AgentOps (The WHAT - Operational Patterns) - I-IV: Foundation → V-VIII: Operations → IX-XII: Improvement + I-IV: Prepare → V-VIII: Bound → IX-X: Select → XI-XII: Govern "Specific practices implementing pillars" ↓ AI Workflows (The IMPLEMENTATION) @@ -100,7 +100,7 @@ FAAFO describes the **ideal state of developer experience** when AI augments wor **Technical pillar applied:** Context Engineering (40% rule) -**12-Factor implementation:** Factor II (Track Everything in Git), Factor VI (Lock Progress Forward) +**12-Factor implementation:** Factor II (Track Everything in Git), Factor VIII (Lock Progress Forward) **Workflow implementation:** JIT loading, thin kernels, bundle compression (12:1 ratio) @@ -110,7 +110,7 @@ FAAFO describes the **ideal state of developer experience** when AI 
augments wor --- -### Example 2: Validation Gates (Factor IV) +### Example 2: Validation Gates (Factor VII) **FAAFO dimensions:** Fast + Fun - Fast = Catch errors early (don't wait for production) @@ -118,7 +118,7 @@ FAAFO describes the **ideal state of developer experience** when AI augments wor **Technical pillar applied:** DevOps/SRE (CI/CD, validation gates) -**12-Factor implementation:** Factor IV (Research Before You Build), Factor V (Validate Externally) +**12-Factor implementation:** Factor V (Research Before You Build), Factor VII (Validate Externally) **Workflow implementation:** Multi-layer validation (syntax → schema → security → policy) @@ -128,7 +128,7 @@ FAAFO describes the **ideal state of developer experience** when AI augments wor --- -### Example 3: Pattern Libraries (Factor XII) +### Example 3: Pattern Libraries (Factor X) **FAAFO dimensions:** Ambitious + Autonomous - Ambitious = Tackle projects previously out of reach @@ -136,7 +136,7 @@ FAAFO describes the **ideal state of developer experience** when AI augments wor **Technical pillar applied:** Learning Science (pattern reuse, spaced repetition) -**12-Factor implementation:** Factor IX (Measure What Matters), Factor XII (Harvest Failures as Wisdom) +**12-Factor implementation:** Factor X (Compound Knowledge), Factor XII (Measure Outcomes) **Workflow implementation:** Bundle system, golden patterns, 84 specialized agents diff --git a/docs/explanation/from-theory-to-production.md b/docs/explanation/from-theory-to-production.md index d953829..d4ce8ba 100644 --- a/docs/explanation/from-theory-to-production.md +++ b/docs/explanation/from-theory-to-production.md @@ -29,7 +29,7 @@ The clearest compression of that evolution is now straightforward: a **stateful **Artifacts:** - The 12 Factors (Factors I-XII) -- Four Tiers (Foundation, Workflow, Knowledge, Scale) +- Four Phases (Prepare, Bound, Select, Govern) - Core principles (extract learnings, improve system, document context, validate externally,
compound knowledge) **Audience:** Anyone using AI agents (developers, writers, researchers, teams) @@ -157,8 +157,8 @@ The Intelligence Community (IC) represents the most constrained deployment envir **Solution:** Initial 12 Factors - Factor I: Context Is Everything (40% rule, JIT loading) - Factor II: Track Everything in Git (Git as memory) -- Factor VII: Extract Learnings (extract from history) -- Factor X: Isolate Workers (focused, independent execution) +- Factor VI: Isolate Workers (focused, independent execution) +- Factor IX: Extract Learnings (extract from history) **Validation:** 40x speedups on complex workflows, 0% context collapse @@ -183,7 +183,7 @@ The Intelligence Community (IC) represents the most constrained deployment envir - **SSE telemetry** (Houston): One-way observability, simpler than WebSocket **Integration into 12-Factor:** -- These patterns enhance Factors III, VI, VIII, IX, XI (see Implementation Patterns sections) +- These patterns enhance Factors III, VIII, X, XI, XII (see Implementation Patterns sections) - Documented in `docs/explanation/pattern-heritage.md` --- @@ -208,7 +208,7 @@ The Intelligence Community (IC) represents the most constrained deployment envir - **Multi-tenancy via namespaces:** Team-per-namespace isolation **Integration into 12-Factor:** -- Factor XII: Harvest Failures as Wisdom now includes IC deployment profiles +- Factor X: Compound Knowledge now includes IC deployment profiles - Proves framework works under maximum constraints --- @@ -257,7 +257,7 @@ What they validate most strongly is not one magic orchestrator.
They validate en - **PID-based crash recovery:** Detect failures without heartbeats - **Feature seeder pipeline:** PLAN → COMMIT → EXECUTE with human gates -**Informed factors:** III (One Agent, One Job), IX (Measure What Matters), VI (Lock Progress Forward), VIII (Compound Knowledge) +**Informed factors:** III (One Agent, One Job), XII (Measure Outcomes), VIII (Lock Progress Forward), X (Compound Knowledge) --- @@ -273,7 +273,7 @@ What they validate most strongly is not one magic orchestrator. They validate en - **ToolCall audit trail:** Every action is a CRD (auditable, approvable, reversible) - **SharedInformer caching:** Local read cache with watch for updates -**Informed factors:** IV (Research Before You Build), VIII (Compound Knowledge), IX (Measure What Matters), XII (Harvest Failures as Wisdom) +**Informed factors:** V (Research Before You Build), X (Compound Knowledge), XII (Measure Outcomes) --- @@ -281,7 +281,7 @@ What they validate most strongly is not one magic orchestrator. They validate en Each factor maps to concrete implementation patterns from Houston, Fractal, and ai-platform. 
-### Foundation Tier (I-III): Build Reliability from Ground Up +### Prepare Phase (I-IV): Build Reliability from Ground Up **Factor I: Context Is Everything** - **Philosophy:** 40% rule, JIT loading @@ -298,64 +298,64 @@ Each factor maps to concrete implementation patterns from Houston, Fractal, and - **Production:** KAgent CRD definitions, event-driven activation (webhook > orchestrator) - **IC deployment:** Namespace-scoped agents for classification boundaries +**Factor IV: Enforce Least Privilege** +- **Philosophy:** Grant each agent only the access its job requires +- **Production:** Scoped RBAC per KAgent, default-deny tool permissions, fail-closed credentials +- **IC deployment:** Classification-aware access boundaries, no standing broad grants + --- -### Workflow Tier (IV-VI): Disciplined Execution +### Bound Phase (V-VIII): Disciplined Execution -**Factor IV: Research Before You Build** +**Factor V: Research Before You Build** - **Philosophy:** Understand before implementing - **Production:** Reconciliation loops (Fractal), informed decision-making - **IC deployment:** Policy enforcement via admission controllers -**Factor V: Validate Externally** +**Factor VI: Isolate Workers** +- **Philosophy:** Independent, focused execution +- **Production:** Feature seeder pipeline (Houston), beads issue tracking +- **IC deployment:** Offline improvement backlog + +**Factor VII: Validate Externally** - **Philosophy:** Validation gates, independent verification, governed selection pressure - **Production:** SSE telemetry (Houston), Langfuse traces, Prometheus metrics - **IC deployment:** Air-gapped validation pipelines -**Factor VI: Lock Progress Forward** +**Factor VIII: Lock Progress Forward** - **Philosophy:** Context bundles for multi-day work, checkpointing - **Production:** Neo4j state machines, explicit memory architecture (RAG/Graph/Historical) - **IC deployment:** Stateless agents + external PostgreSQL/Neo4j --- -### Knowledge Tier (VII-IX): Continuous Learning 
+### Select Phase (IX-X): Continuous Learning -**Factor VII: Extract Learnings** +**Factor IX: Extract Learnings** - **Philosophy:** Extract from history - **Production:** Houston/Fractal patterns codified into architecture with provenance and replay value - **IC deployment:** Pattern libraries for air-gapped environments -**Factor VIII: Compound Knowledge (HERO)** -- **Philosophy:** Knowledge compounds over time -- **Production:** BudgetQuota enforcement (Fractal), 3-phase pipeline (Houston) -- **IC deployment:** Hard limits on token/cost budgets - -**Factor IX: Measure What Matters** -- **Philosophy:** Make the fitness gradient visible -- **Production:** Effective output metrics, quality ratios, cost tracking -- **IC deployment:** Air-gapped Grafana dashboards +**Factor X: Compound Knowledge (HERO)** +- **Philosophy:** Knowledge compounds over time; every failure becomes durable wisdom +- **Production:** BudgetQuota enforcement (Fractal), 3-phase pipeline (Houston), 3-tier IC deployment model (Edge/Datacenter/Frontier) +- **IC deployment:** Hard limits on token/cost budgets, air-gap playbook, blameless postmortems --- -### Scale Tier (X-XII): The Factory Altitude - -The same three factors at fleet scale. Working solo you live them in miniature — a git worktree is isolation, your own judgment is supervision, your `learnings.md` is failure harvesting. Running parallel agents on complex projects, the same rules need real machinery. You grow into this altitude; you don't skip the factors. +### Govern Phase (XI-XII): The Factory Altitude -**Factor X: Isolate Workers** -- **Philosophy:** Independent, focused execution -- **Production:** Feature seeder pipeline (Houston), beads issue tracking -- **IC deployment:** Offline improvement backlog +Working solo, you live these in miniature: your own judgment is supervision, and your `learnings.md` is the fitness gradient. When you run parallel agents on complex projects, the same rules need real machinery.
You grow into this altitude; you don't skip the factors. **Factor XI: Supervise Hierarchically** - **Philosophy:** Guardrails and oversight - **Production:** Reconciliation loops, fail-closed defaults, ToolCall audit (Fractal) - **IC deployment:** Constitutional enforcement of security policies -**Factor XII: Harvest Failures as Wisdom** -- **Philosophy:** Learn from every failure -- **Production:** 3-tier IC deployment model (Edge/Datacenter/Frontier) -- **IC deployment:** Air-gap playbook, blameless postmortems +**Factor XII: Measure Outcomes** +- **Philosophy:** Make the fitness gradient visible +- **Production:** Effective output metrics, quality ratios, cost tracking +- **IC deployment:** Air-gapped Grafana dashboards --- @@ -363,10 +363,11 @@ The same three factors at fleet scale. Working solo you live them in miniature ### For Individual Practitioners -**Start with:** Factors I-III (Foundation) +**Start with:** Factors I-IV (Prepare) 1. Factor I: Context Is Everything — Implement 40% rule, use JIT loading 2. Factor II: Track Everything in Git — Decisions persist across sessions 3. Factor III: One Agent, One Job — Break work into focused sessions +4. Factor IV: Enforce Least Privilege — Grant each agent only the access its job needs **Expected outcome:** Context collapse eliminated, decisions persist, productivity 2-8x @@ -374,13 +375,13 @@ The same three factors at fleet scale. Working solo you live them in miniature ### For Teams -**Add:** Factors IV-IX (Workflow + Knowledge) -4. Factor IV: Research Before You Build — Understand before implementing -5. Factor V: Validate Externally — Independent verification gates -6. Factor VI: Lock Progress Forward — Checkpoint and resume multi-day work -7. Factor VII: Extract Learnings — Capture patterns from every session -8. Factor VIII: Compound Knowledge — Build institutional memory -9. Factor IX: Measure What Matters — Track effective output, not vanity metrics +**Add:** Factors V-X (Bound + Select) +5. 
Factor V: Research Before You Build — Understand before implementing +6. Factor VI: Isolate Workers — Independent execution environments +7. Factor VII: Validate Externally — Independent verification gates +8. Factor VIII: Lock Progress Forward — Checkpoint and resume multi-day work +9. Factor IX: Extract Learnings — Capture patterns from every session +10. Factor X: Compound Knowledge — Build institutional memory; learn from every failure **Expected outcome:** Team coordination improves, quality gates prevent breakage, 8-20x productivity @@ -390,10 +391,9 @@ This is also where provenance and fitness start to matter operationally: teams n ### For Platform Engineers -**Add:** Factors X-XII (Scale) -10. Factor X: Isolate Workers — Independent execution environments +**Add:** Factors XI-XII (Govern) 11. Factor XI: Supervise Hierarchically — Oversight and guardrails -12. Factor XII: Harvest Failures as Wisdom — Learn from every failure +12. Factor XII: Measure Outcomes — Track effective output, not vanity metrics **Expected outcome:** Patterns compound across teams, reliability 95%+, scales to enterprise @@ -463,8 +463,8 @@ The tighter your constraints, the more valuable the patterns. But even with zero **A:** Yes. These are LLM-agnostic operational principles: - The 40% rule (Factor I: Context Is Everything) applies to any context window -- Validation gates (Factor V: Validate Externally) work with any LLM output -- Budget limits (Factor IX: Measure What Matters) track cost regardless of provider +- Validation gates (Factor VII: Validate Externally) work with any LLM output +- Budget limits (Factor XII: Measure Outcomes) track cost regardless of provider Implementation examples use Claude (ai-platform) and Anthropic patterns, but principles transfer to any model. 
diff --git a/docs/explanation/operator-model.md b/docs/explanation/operator-model.md index 01e6039..c8f6741 100644 --- a/docs/explanation/operator-model.md +++ b/docs/explanation/operator-model.md @@ -85,15 +85,15 @@ Governance does not do the work directly. It shapes the work so autonomy remains | **I. Context Is Everything** | Stateful environment | Keeps continuity in bounded, reloadable context instead of one overloaded session | | **II. Track Everything in Git** | Durable traces | Preserves memory, provenance, and resumability in versioned artifacts | | **III. One Agent, One Job** | Replaceable actors | Keeps workers scoped, swappable, and easy to restart | -| **IV. Research Before You Build** | Governance | Clarifies objective, constraints, and evidence before action | -| **V. Validate Externally** | Selection gates | Ensures the environment, not the author, decides what survives | -| **VI. Lock Progress Forward** | Selection gates | Ratchets accepted work into durable state so later work cannot quietly erase it | -| **VII. Extract Learnings** | Durable traces | Turns completed work into reusable evidence instead of lost experience | -| **VIII. Compound Knowledge** | Promotion loops | Feeds validated learnings back into future sessions so performance improves over time | -| **IX. Measure What Matters** | Governance | Keeps the system aligned to outcomes rather than activity theater | -| **X. Isolate Workers** | Replaceable actors | Prevents hidden coupling when multiple workers operate in parallel | +| **IV. Enforce Least Privilege** | Governance | Grants each actor only the access its job requires, bounding the blast radius of failure | +| **V. Research Before You Build** | Governance | Clarifies objective, constraints, and evidence before action | +| **VI. Isolate Workers** | Replaceable actors | Prevents hidden coupling when multiple workers operate in parallel | +| **VII. 
Validate Externally** | Selection gates | Ensures the environment, not the author, decides what survives | +| **VIII. Lock Progress Forward** | Selection gates | Ratchets accepted work into durable state so later work cannot quietly erase it | +| **IX. Extract Learnings** | Durable traces | Turns completed work into reusable evidence instead of lost experience | +| **X. Compound Knowledge** | Promotion loops | Feeds validated learnings — including failed attempts — back into future sessions so performance improves over time | | **XI. Supervise Hierarchically** | Governance | Makes escalation, coordination, and boundary-setting explicit | -| **XII. Harvest Failures as Wisdom** | Promotion loops | Converts failed attempts into preventative knowledge and better future choices | +| **XII. Measure Outcomes** | Governance | Keeps the system aligned to outcomes rather than activity theater | --- diff --git a/docs/explanation/pattern-heritage.md b/docs/explanation/pattern-heritage.md index 55dca34..d3d4143 100644 --- a/docs/explanation/pattern-heritage.md +++ b/docs/explanation/pattern-heritage.md @@ -82,7 +82,7 @@ Houston introduced explicit state machines for multi-phase work. │ │ COMPLETE │ │ FAILED │ │ ABORTED │ │ │ └──────────┘ └────────┘ └─────────┘ │ │ │ -│ INFORMED: Factor VI (Lock Progress Forward), Factor VIII (Compound Knowledge) │ +│ INFORMED: Factor VIII (Lock Progress Forward), Factor X (Compound Knowledge) │ └─────────────────────────────────────────────────────────────────────────┘ ``` @@ -114,7 +114,7 @@ class Mission: self.emit_event(StateChange(self.id, new_state)) ``` -**Factors informed:** VI (Lock Progress Forward), VIII (Compound Knowledge) +**Factors informed:** VIII (Lock Progress Forward), X (Compound Knowledge) --- @@ -155,7 +155,7 @@ class AtomicLock: **Why this matters:** No Redis, no Zookeeper, no network. Just filesystem semantics that work everywhere, including air-gapped environments. 
-**Factors informed:** XII (Harvest Failures as Wisdom) - works in any environment +**Factors informed:** X (Compound Knowledge) - works in any environment --- @@ -195,7 +195,7 @@ async def mission_events(mission_id: str): - Auto-reconnects on disconnect - No WebSocket complexity -**Factors informed:** V (Validate Externally), VIII (Compound Knowledge async gates) +**Factors informed:** VII (Validate Externally), X (Compound Knowledge async gates) --- @@ -232,7 +232,7 @@ Houston maximized throughput with N parallel workers + 1 initializer. └─────────────────────────────────────────────────────────────────────────┘ ``` -**Factors informed:** III (One Agent, One Job), VII (Extract Learnings) +**Factors informed:** III (One Agent, One Job), IX (Extract Learnings) --- @@ -390,7 +390,7 @@ status: - "Rate limit: 100 req/min" ``` -**Factors informed:** VI (Lock Progress Forward), III (One Agent, One Job) +**Factors informed:** VIII (Lock Progress Forward), III (One Agent, One Job) --- @@ -457,7 +457,7 @@ func (c *BudgetController) Reconcile(ctx context.Context, req ctrl.Request) (ctr } ``` -**Factors informed:** VIII (Compound Knowledge), XI (Supervise Hierarchically) +**Factors informed:** X (Compound Knowledge), XI (Supervise Hierarchically) --- @@ -533,7 +533,7 @@ spec: - "no breaking changes" ``` -**Factors informed:** VII (Extract Learnings) +**Factors informed:** IX (Extract Learnings) --- @@ -611,7 +611,7 @@ status: - Reversible actions can be rolled back - Compliance-ready logging -**Factors informed:** VIII (Compound Knowledge), V (Validate Externally) +**Factors informed:** X (Compound Knowledge), VII (Validate Externally) --- @@ -619,16 +619,16 @@ status: | Pattern | Source | Primary Factor | Supporting Factors | |---------|--------|----------------|-------------------| -| Mission Lifecycle State Machine | Houston | VI (Lock Progress Forward) | VIII (Compound Knowledge) | -| mkdir Atomic Locking | Houston | XII (Harvest Failures as Wisdom) | - | -| SSE 
Telemetry | Houston | V (Validate Externally) | VIII (Compound Knowledge) | -| N+1 Worker Pattern | Houston | III (One Agent, One Job) | VII (Extract Learnings) | +| Mission Lifecycle State Machine | Houston | VIII (Lock Progress Forward) | X (Compound Knowledge) | +| mkdir Atomic Locking | Houston | X (Compound Knowledge) | - | +| SSE Telemetry | Houston | VII (Validate Externally) | X (Compound Knowledge) | +| N+1 Worker Pattern | Houston | III (One Agent, One Job) | IX (Extract Learnings) | | PID-Based Crash Recovery | Houston | XI (Supervise Hierarchically) | - | -| Shard/ShardRun Separation | Fractal | VI (Lock Progress Forward) | III (One Agent, One Job) | -| BudgetQuota CRD | Fractal | VIII (Compound Knowledge) | XI (Supervise Hierarchically) | -| Blackboard Coordination | Fractal | VII (Extract Learnings) | III (One Agent, One Job) | -| Level-Triggered Reconciliation | Fractal | XI (Supervise Hierarchically) | IV (Research Before You Build) | -| ToolCall Audit Trail | Fractal | VIII (Compound Knowledge) | V (Validate Externally) | +| Shard/ShardRun Separation | Fractal | VIII (Lock Progress Forward) | III (One Agent, One Job) | +| BudgetQuota CRD | Fractal | X (Compound Knowledge) | XI (Supervise Hierarchically) | +| Blackboard Coordination | Fractal | IX (Extract Learnings) | III (One Agent, One Job) | +| Level-Triggered Reconciliation | Fractal | XI (Supervise Hierarchically) | V (Research Before You Build) | +| ToolCall Audit Trail | Fractal | X (Compound Knowledge) | VII (Validate Externally) | --- @@ -684,11 +684,10 @@ ai-platform combined Houston's simplicity with Fractal's Kubernetes-native appro - **From Theory to Production**: [./from-theory-to-production.md](./from-theory-to-production.md) - **The 12 Factors**: [../../factors/README.md](../../factors/README.md) - **Factor III Implementation Patterns**: [../../factors/03-one-agent-one-job.md#implementation-patterns](../../factors/03-one-agent-one-job.md#implementation-patterns) -- **Factor VI 
Implementation Patterns**: [../../factors/06-lock-progress-forward.md#implementation-patterns](../../factors/06-lock-progress-forward.md#implementation-patterns) -- **Factor VII Implementation Patterns**: [../../factors/07-extract-learnings.md#implementation-patterns](../../factors/07-extract-learnings.md#implementation-patterns) -- **Factor VIII Implementation Patterns**: [../../factors/08-compound-knowledge.md#implementation-patterns](../../factors/08-compound-knowledge.md#implementation-patterns) +- **Factor VIII Implementation Patterns**: [../../factors/08-lock-progress-forward.md#implementation-patterns](../../factors/08-lock-progress-forward.md#implementation-patterns) +- **Factor IX Implementation Patterns**: [../../factors/09-extract-learnings.md#implementation-patterns](../../factors/09-extract-learnings.md#implementation-patterns) +- **Factor X Implementation Patterns**: [../../factors/10-compound-knowledge.md#implementation-patterns](../../factors/10-compound-knowledge.md#implementation-patterns) - **Factor XI Implementation Patterns**: [../../factors/11-supervise-hierarchically.md#implementation-patterns](../../factors/11-supervise-hierarchically.md#implementation-patterns) -- **Factor XII Implementation Patterns**: [../../factors/12-harvest-failures-as-wisdom.md#implementation-patterns](../../factors/12-harvest-failures-as-wisdom.md#implementation-patterns) --- diff --git a/docs/explanation/standing-on-giants.md b/docs/explanation/standing-on-giants.md index 34c69f6..90be29b 100644 --- a/docs/explanation/standing-on-giants.md +++ b/docs/explanation/standing-on-giants.md @@ -49,7 +49,7 @@ This framework doesn't invent new principles. 
It **adapts proven operational pat **Mapping examples:** - Their Factor I (Codebase) → Our Factor II (Track Everything in Git) - Their Factor III (Config) → Our Factor I (Context Is Everything) -- Their Factor XI (Logs) → Our Factor IX (Measure What Matters) +- Their Factor XI (Logs) → Our Factor XII (Measure Outcomes) **Why this works:** Infrastructure operations and knowledge operations face similar problems: - Partial failures @@ -83,11 +83,11 @@ This framework doesn't invent new principles. It **adapts proven operational pat **Applied to AI workflows:** - **CI/CD** → Multi-layer validation gates (`make quick` → `make ci-all`) - **Validation gates** → Pre-commit hooks, human approval checkpoints -- **Monitoring** → Factor IX (Measure What Matters: metrics, logs, token usage) -- **Postmortems** → Factor XII (Harvest Failures as Wisdom: extract learnings from failures) -- **Gradual rollouts** → Factor X (Isolate Workers: incremental, independent execution) -- **Zero-trust** → Factor V (Validate Externally: never trust a single step) -- **Runbooks** → Factor VIII (Compound Knowledge: capture workflows as institutional memory) +- **Monitoring** → Factor XII (Measure Outcomes: metrics, logs, token usage) +- **Postmortems** → Factor X (Compound Knowledge: extract learnings from failures) +- **Gradual rollouts** → Factor VI (Isolate Workers: incremental, independent execution) +- **Zero-trust** → Factor VII (Validate Externally: never trust a single step) +- **Runbooks** → Factor X (Compound Knowledge: capture workflows as institutional memory) **Why this works:** AI agents exhibit the same failure modes as distributed infrastructure: - Partial failures (one tool call fails, rest must continue) @@ -209,7 +209,7 @@ Together: Complete Playbook | **Spaced Repetition** | Bundle maintenance: hot → warm → cold memory tiers | | **Chunking** | Factor III (One Agent, One Job: each does one job well) | | **Progressive Disclosure** | Thin kernels + JIT pointers (CLAUDE.md ~800 
tokens) | -| **Deliberate Practice** | Factor VII (Extract Learnings) + Factor IX (Measure What Matters) | +| **Deliberate Practice** | Factor IX (Extract Learnings) + Factor XII (Measure Outcomes) | **Why this works:** AI context windows mirror human working memory: - Both have hard limits (7±2 items for humans, 200k tokens for AI) @@ -243,7 +243,7 @@ Together: Complete Playbook |-----------|-------------------| | **Context Switching (40% cost)** | 40% context budget (Factor I: Context Is Everything) | | **Information Architecture** | CLAUDE.md hierarchy (workspace → repo → task) | -| **Progressive Disclosure** | JIT loading (Factor VI: Lock Progress Forward) | +| **Progressive Disclosure** | JIT loading (Factor VIII: Lock Progress Forward) | | **Cognitive Overhead** | Factor III (One Agent, One Job: focused execution) | | **State Management** | Bundle system (compress 60k → 5k tokens) | @@ -386,7 +386,7 @@ Together: Complete Playbook --- -### 2. Sub-Agent Orchestration (Factor III + X) +### 2. Sub-Agent Orchestration (Factor III + VI) **Source:** DevOps microservices + separation of concerns **Discovery:** Fresh context per workflow phase prevents error accumulation @@ -395,7 +395,7 @@ Together: Complete Playbook --- -### 3. Bundle Compression System (Factor VI + VIII) +### 3. Bundle Compression System (Factor VIII + X) **Source:** Learning Science (spaced repetition) + Context Engineering (state management) **Discovery:** 12:1 compression ratio (60k tokens → 5k) enables multi-session continuity @@ -404,7 +404,7 @@ Together: Complete Playbook --- -### 4. Validation > Generation (Factor V) +### 4. 
Validation > Generation (Factor VII) **Source:** DevOps CI/CD + SRE validation gates **Discovery:** Pre-commit validation is 10x ROI vs post-commit fixes diff --git a/docs/explanation/three-developer-loops.md b/docs/explanation/three-developer-loops.md index 765e701..ff54880 100644 --- a/docs/explanation/three-developer-loops.md +++ b/docs/explanation/three-developer-loops.md @@ -104,7 +104,7 @@ git commit -m "Add validation logic" # AI can't silently break things ``` -#### Factor V: Validate Externally +#### Factor VII: Validate Externally **Maps to Inner Loop Detection:** - Independent verification of AI output @@ -124,7 +124,7 @@ def calculate_total(items): # You verify: Test actually passes ``` -#### Factor VI: Lock Progress Forward +#### Factor VIII: Lock Progress Forward **Maps to Inner Loop Correction:** - Checkpoint frequently so you can roll back @@ -149,7 +149,7 @@ pytest tests/ **Root cause:** AI hallucinates test results without running them -**Violated factor:** Factor V (Validate Externally) +**Violated factor:** Factor VII (Validate Externally) **Remedy:** - Always run tests independently @@ -175,7 +175,7 @@ pytest tests/ **Root cause:** AI doesn't understand problem, keeps guessing -**Violated factor:** Factor IV (Research Before You Build) +**Violated factor:** Factor V (Research Before You Build) **Remedy:** - Take manual control immediately @@ -277,7 +277,7 @@ Agent 3: Frontend components # No overlap, clear boundaries ``` -#### Factor X: Isolate Workers +#### Factor VI: Isolate Workers **Maps to Middle Loop Detection:** - Architecture constraints prevent eldritch horrors @@ -318,7 +318,7 @@ Agent 3: Frontend components **Root cause:** AI optimizes for "working" not "maintainable," no modularity constraints -**Violated factor:** Factor X (Isolate Workers) +**Violated factor:** Factor VI (Isolate Workers) **Remedy:** - STOP IMMEDIATELY - Do not proceed @@ -346,7 +346,7 @@ Agent 3: Frontend components **Root cause:** Poor task decomposition, 
circular dependencies -**Violated factor:** Factor X (Isolate Workers) + task decomposition +**Violated factor:** Factor VI (Isolate Workers) + task decomposition **Remedy:** - Break dependency cycle manually @@ -422,7 +422,7 @@ This is the loop of: ### AgentOps Implementation -#### Factor VI: Lock Progress Forward +#### Factor VIII: Lock Progress Forward **Maps to Outer Loop Prevention:** - Fast rollback when AI breaks things @@ -453,7 +453,7 @@ git revert [commit-sha] # Productivity maintained ``` -#### Factor XII: Harvest Failures as Wisdom +#### Factor X: Compound Knowledge **Maps to Outer Loop Correction:** - Learn from every production incident @@ -478,10 +478,10 @@ make api-compatibility-test **Root cause:** AI doesn't understand production impact of API changes -**Violated factor:** Factor V (Validate Externally) +**Violated factor:** Factor VII (Validate Externally) **Remedy:** -- Rollback immediately (Factor VI: Lock Progress Forward) +- Rollback immediately (Factor VIII: Lock Progress Forward) - Restore API compatibility - Add API compatibility tests before retrying - Implement contract testing @@ -560,9 +560,9 @@ Outer Loop (codify as org patterns) ### Diagnostic Use: "Which Loop Am I In?" **Ask yourself:** -- Am I coding a single function? → Inner Loop (use Factors II, V, VI) -- Am I coordinating multiple agents? → Middle Loop (use Factors I, III, X) -- Am I changing architecture/process? → Outer Loop (use Factors VI, XI, XII) +- Am I coding a single function? → Inner Loop (use Factors II, VII, VIII) +- Am I coordinating multiple agents? → Middle Loop (use Factors I, III, VI) +- Am I changing architecture/process? 
→ Outer Loop (use Factors VIII, X, XI) **Example:** ``` @@ -608,7 +608,7 @@ Prevention for each loop planned upfront Failure: Production API broke after deployment Loop: Outer (production impact) -Violated factor: Factor V (Validate Externally) +Violated factor: Factor VII (Validate Externally) Correction: Rollback immediately Prevention: Add API compatibility tests to CI/CD ``` @@ -619,14 +619,14 @@ Prevention: Add API compatibility tests to CI/CD | Loop | Timescale | AgentOps Factors | Prevention | Detection | Correction | |------|-----------|-----------------|-----------|-----------|------------| -| **Inner** | Seconds-minutes | II, V, VI | Checkpoint frequently, TDD, git mastery | Verify AI claims, always on watch | Rollback, manual debugging | -| **Middle** | Hours-days | I, III, X | Written rules, memento method | Eldritch horror detection, CI/CD gates | Tracer bullets, workflow automation | -| **Outer** | Weeks-months | VI, XI, XII | Don't torch bridges, modularization | AI throws everything out, CI/CD | git reflog recovery, navigate legacy | +| **Inner** | Seconds-minutes | II, VII, VIII | Checkpoint frequently, TDD, git mastery | Verify AI claims, always on watch | Rollback, manual debugging | +| **Middle** | Hours-days | I, III, VI | Written rules, memento method | Eldritch horror detection, CI/CD gates | Tracer bullets, workflow automation | +| **Outer** | Weeks-months | VIII, X, XI | Don't torch bridges, modularization | AI throws everything out, CI/CD | git reflog recovery, navigate legacy | **Cross-Loop Factors:** -- Factor IV: Research Before You Build (applies to all loops) -- Factor VII: Extract Learnings (applies to all loops) -- Factor VIII: Compound Knowledge (applies to all loops) +- Factor V: Research Before You Build (applies to all loops) +- Factor IX: Extract Learnings (applies to all loops) +- Factor X: Compound Knowledge (applies to all loops) --- diff --git a/docs/explanation/vibe-coding-integration.md
b/docs/explanation/vibe-coding-integration.md index a6b2b0e..78a431f 100644 --- a/docs/explanation/vibe-coding-integration.md +++ b/docs/explanation/vibe-coding-integration.md @@ -16,9 +16,9 @@ Vibe coding — the practice of collaborating with AI agents through natural lan ## How Operational Discipline Supports Vibe Coding -The 12 factors are organized into four tiers. Each tier addresses a different layer of what makes vibe coding reliable, from individual sessions to organizational scale. +The 12 factors are organized into four phases. Each phase addresses a different layer of what makes vibe coding reliable, from individual sessions to organizational scale. -### Tier 1: Foundation (Factors I-III) +### Phase 1: Prepare (Factors I-IV) **Making each session start strong and stay focused.** @@ -29,12 +29,13 @@ These factors address the most common vibe coding frustrations: the agent does n | [I. Context Is Everything](../../factors/01-context-is-everything.md) | The agent performs poorly because it lacks project context. Load relevant context deliberately — architecture docs, coding standards, recent changes — and agent output quality transforms without changing the model. | | [II. Track Everything in Git](../../factors/02-track-everything-in-git.md) | Sessions produce work that gets lost, overwritten, or cannot be rolled back. When every artifact — code, research, plans, decisions — lives in git, vibe coding sessions become recoverable and auditable. | | [III. One Agent, One Job](../../factors/03-one-agent-one-job.md) | A single agent juggling research, coding, testing, and review produces mediocre results across all of them. Scoping each agent to a clear responsibility makes vibe coding sessions predictable. | +| [IV. Enforce Least Privilege](../../factors/04-enforce-least-privilege.md) | An agent with broad, standing access can do broad, standing damage when it goes wrong. 
Granting each agent only the access its job requires — scoped tools, default-deny permissions — keeps mistakes small and recoverable. | **Without tooling:** You can apply these principles with nothing more than a well-structured CLAUDE.md file, a git repository, and discipline about what you ask each agent session to do. --- -### Tier 2: Workflow (Factors IV-VI) +### Phase 2: Bound (Factors V-VIII) **Making the work between sessions reliable.** @@ -42,15 +43,16 @@ These factors address what happens when vibe coding moves beyond a single quick | Factor | What It Solves | |--------|---------------| -| [IV. Research Before You Build](../../factors/04-research-before-you-build.md) | Jumping straight into implementation wastes sessions on wrong approaches. A brief research phase — reading existing code, checking constraints, exploring alternatives — makes the build phase dramatically more productive. | -| [V. Validate Externally](../../factors/05-validate-externally.md) | The agent says "tests pass" but the code does not compile. External validation — running tests independently, checking outputs against real systems, verifying claims outside the agent session — catches lies and hallucinations before they compound. | -| [VI. Lock Progress Forward](../../factors/06-lock-progress-forward.md) | A productive session's work gets undone by the next session. Commit working states frequently, tag milestones, and treat each validated checkpoint as a ratchet that prevents regression. | +| [V. Research Before You Build](../../factors/05-research-before-you-build.md) | Jumping straight into implementation wastes sessions on wrong approaches. A brief research phase — reading existing code, checking constraints, exploring alternatives — makes the build phase dramatically more productive. | +| [VI. Isolate Workers](../../factors/06-isolate-workers.md) | Multiple agents editing the same files create merge conflicts and corrupted state. 
Isolated workspaces — separate worktrees, branches, or directories — let parallel vibe coding sessions proceed without interference. | +| [VII. Validate Externally](../../factors/07-validate-externally.md) | The agent says "tests pass" but the code does not compile. External validation — running tests independently, checking outputs against real systems, verifying claims outside the agent session — catches lies and hallucinations before they compound. | +| [VIII. Lock Progress Forward](../../factors/08-lock-progress-forward.md) | A productive session's work gets undone by the next session. Commit working states frequently, tag milestones, and treat each validated checkpoint as a ratchet that prevents regression. | **Without tooling:** Commit after each working state. Run tests outside your agent session. Spend the first few minutes of a session reading before generating. These are habits, not tools. --- -### Tier 3: Knowledge (Factors VII-IX) +### Phase 3: Select (Factors IX-X) **Making each session smarter than the last.** @@ -58,29 +60,27 @@ This is where vibe coding transforms from a series of isolated sessions into a c | Factor | What It Solves | |--------|---------------| -| [VII. Extract Learnings](../../factors/07-extract-learnings.md) | Every session produces implicit knowledge — what worked, what failed, what the codebase actually does — that evaporates when the session ends. Deliberately extracting learnings turns ephemeral sessions into durable organizational knowledge. | -| [VIII. Compound Knowledge](../../factors/08-compound-knowledge.md) | Extracted learnings sit in a document nobody reads. A deliberate cycle of Harvest, Evaluate, Refine, and Operationalize (HERO) turns raw learnings into context that automatically improves future sessions. This is the factor that makes vibe coding a compounding investment rather than a flat cost. | -| [IX. 
Measure What Matters](../../factors/09-measure-what-matters.md) | You cannot tell if your vibe coding practice is improving. Tracking meaningful metrics — success rates, time-to-working-state, knowledge reuse — reveals whether your operational changes are actually helping. | +| [IX. Extract Learnings](../../factors/09-extract-learnings.md) | Every session produces implicit knowledge — what worked, what failed, what the codebase actually does — that evaporates when the session ends. Deliberately extracting learnings turns ephemeral sessions into durable organizational knowledge. | +| [X. Compound Knowledge](../../factors/10-compound-knowledge.md) | Extracted learnings sit in a document nobody reads. A deliberate cycle of Harvest, Evaluate, Refine, and Operationalize (HERO) turns raw learnings — including failed sessions — into context that automatically improves future sessions. This is the factor that makes vibe coding a compounding investment rather than a flat cost. | **Without tooling:** After each session, write down what you learned in a file that your next session will read. Review those notes weekly. Delete what is stale. Promote what keeps being useful. This is the knowledge flywheel in its simplest form. -**Factor VIII is the differentiator.** Most vibe coding advice focuses on prompting techniques for a single session. Compound Knowledge addresses the harder problem: making every session build on everything that came before. An organization that compounds knowledge across hundreds of agent sessions operates at a fundamentally different level than one that starts fresh each time. +**Factor X is the differentiator.** Most vibe coding advice focuses on prompting techniques for a single session. Compound Knowledge addresses the harder problem: making every session build on everything that came before. An organization that compounds knowledge across hundreds of agent sessions operates at a fundamentally different level than one that starts fresh each time. 
--- -### Tier 4: Scale (Factors X-XII) — The Factory Altitude +### Phase 4: Govern (Factors XI-XII) — The Factory Altitude **Making vibe coding work across teams and complex systems.** -These are the same three factors at fleet scale. A solo developer already lives them in miniature — a git worktree is isolation, your own judgment is supervision, the note you write after a failed session is failure harvesting. When vibe coding scales beyond one person and one agent, the same rules need real machinery. You grow into this altitude; you don't skip the factors. +A solo developer already lives these in miniature: your own judgment is supervision, and the metric you track is whether this week went better than last. When vibe coding scales beyond one person and one agent, the same rules need real machinery. You grow into this altitude; you don't skip the factors. | Factor | What It Solves | |--------|---------------| -| [X. Isolate Workers](../../factors/10-isolate-workers.md) | Multiple agents editing the same files create merge conflicts and corrupted state. Isolated workspaces — separate worktrees, branches, or directories — let parallel vibe coding sessions proceed without interference. | | [XI. Supervise Hierarchically](../../factors/11-supervise-hierarchically.md) | A fleet of agents with no coordination produces duplicated work and conflicting changes. A supervisory layer — whether a lead agent, a human coordinator, or a dispatch system — keeps parallel sessions aligned. | -| [XII. Harvest Failures as Wisdom](../../factors/12-harvest-failures-as-wisdom.md) | Failed sessions feel like wasted time. When failures are systematically analyzed — what went wrong, what context was missing, what validation would have caught it — they become the most valuable input to the knowledge flywheel. | +| [XII. Measure Outcomes](../../factors/12-measure-outcomes.md) | You cannot tell if your vibe coding practice is improving.
Tracking meaningful metrics — success rates, time-to-working-state, knowledge reuse — reveals whether your operational changes are actually helping. | -**Without tooling:** Use separate git branches for parallel work. Designate one person to coordinate when multiple developers are vibe coding on the same codebase. When a session fails, write a brief note about why before starting over. +**Without tooling:** Designate one person to coordinate when multiple developers are vibe coding on the same codebase. Track a simple metric, such as how often sessions succeed on the first try, so you know whether your practice is actually improving. --- @@ -111,15 +111,15 @@ The difference compounds. After 10 sessions, the gap is noticeable. After 100 se | Agent produces wrong approach | Missing project context | [I. Context Is Everything](../../factors/01-context-is-everything.md) | | Work from a good session gets lost | No checkpoint discipline | [II. Track Everything in Git](../../factors/02-track-everything-in-git.md) | | Agent tries to do everything at once | Unclear scope | [III. One Agent, One Job](../../factors/03-one-agent-one-job.md) | -| Implementation goes in circles | No research phase | [IV. Research Before You Build](../../factors/04-research-before-you-build.md) | -| Agent claims success but code is broken | No external validation | [V. Validate Externally](../../factors/05-validate-externally.md) | -| Next session undoes previous progress | No progress locking | [VI. Lock Progress Forward](../../factors/06-lock-progress-forward.md) | -| Same mistakes repeat across sessions | No learning extraction | [VII. Extract Learnings](../../factors/07-extract-learnings.md) | -| Learnings exist but nobody uses them | No compounding system | [VIII. Compound Knowledge](../../factors/08-compound-knowledge.md) | -| Cannot tell if practice is improving | No measurement | [IX.
Measure What Matters](../../factors/09-measure-what-matters.md) | -| Parallel sessions create conflicts | No workspace isolation | [X. Isolate Workers](../../factors/10-isolate-workers.md) | +| Agent has more access than its job needs | No privilege scoping | [IV. Enforce Least Privilege](../../factors/04-enforce-least-privilege.md) | +| Implementation goes in circles | No research phase | [V. Research Before You Build](../../factors/05-research-before-you-build.md) | +| Parallel sessions create conflicts | No workspace isolation | [VI. Isolate Workers](../../factors/06-isolate-workers.md) | +| Agent claims success but code is broken | No external validation | [VII. Validate Externally](../../factors/07-validate-externally.md) | +| Next session undoes previous progress | No progress locking | [VIII. Lock Progress Forward](../../factors/08-lock-progress-forward.md) | +| Same mistakes repeat across sessions | No learning extraction | [IX. Extract Learnings](../../factors/09-extract-learnings.md) | +| Learnings exist but nobody uses them; failed sessions feel like waste | No compounding system | [X. Compound Knowledge](../../factors/10-compound-knowledge.md) | | Multiple agents duplicate or conflict | No coordination layer | [XI. Supervise Hierarchically](../../factors/11-supervise-hierarchically.md) | -| Failed sessions feel like waste | No failure harvesting | [XII. Harvest Failures as Wisdom](../../factors/12-harvest-failures-as-wisdom.md) | +| Cannot tell if practice is improving | No measurement | [XII. Measure Outcomes](../../factors/12-measure-outcomes.md) | For a detailed catalog of failure patterns and remedies, see the [failure patterns reference](../reference/failure-patterns.md). @@ -129,23 +129,23 @@ For a detailed catalog of failure patterns and remedies, see the [failure patter ### For Individual Developers -Start with the Foundation tier. 
These three factors — context loading, git discipline, and focused agent scope — produce the most immediate improvement in vibe coding session quality. They cost nothing to implement and work with any tool. +Start with the Prepare phase. These factors — context loading, git discipline, focused agent scope, and least-privilege access — produce the most immediate improvement in vibe coding session quality. They cost nothing to implement and work with any tool. -Then add the Workflow tier as you take on larger tasks. Research before building, validate externally, and lock progress forward. These habits prevent the most common session failures. +Then add the Bound phase as you take on larger tasks. Research before building, isolate parallel work, validate externally, and lock progress forward. These habits prevent the most common session failures. -The Knowledge tier is where long-term advantage emerges. Even a simple practice of writing down what you learned after each session, and loading those notes into the next one, creates a compounding effect that transforms your practice over weeks and months. +The Select phase is where long-term advantage emerges. Even a simple practice of writing down what you learned after each session, and loading those notes into the next one, creates a compounding effect that transforms your practice over weeks and months. ### For Teams -Everything above applies, plus the Scale tier. Isolated workspaces prevent parallel sessions from colliding. Hierarchical supervision keeps multiple developers' agent work aligned. And harvesting failures across the team means everyone benefits from each person's hard-won lessons. +Everything above applies, plus the Govern phase. Hierarchical supervision keeps multiple developers' agent work aligned, and measuring outcomes tells you whether the practice is improving across the team. -The Knowledge tier becomes especially powerful at team scale. 
When one developer discovers that a particular codebase requires a specific context-loading pattern, that learning can compound into every team member's future sessions through shared knowledge artifacts. +The Select phase becomes especially powerful at team scale. When one developer discovers that a particular codebase requires a specific context-loading pattern, that learning can compound into every team member's future sessions through shared knowledge artifacts. ### For Organizations The 12 factors provide a shared vocabulary for discussing agent operations. Instead of ad-hoc "tips and tricks" for prompting, teams can reason about which factors they are strong or weak on and invest accordingly. -Factor VIII (Compound Knowledge) is the organizational strategic advantage. Organizations that systematically compound knowledge across hundreds of agent sessions across dozens of developers build a durable asset that no model upgrade or tool switch can replicate. +Factor X (Compound Knowledge) is the organizational strategic advantage. Organizations that systematically compound knowledge across hundreds of agent sessions across dozens of developers build a durable asset that no model upgrade or tool switch can replicate. --- @@ -155,7 +155,7 @@ Factor VIII (Compound Knowledge) is the organizational strategic advantage. Orga 2. **Understand the failure patterns** - [Failure patterns reference](../reference/failure-patterns.md) 3. **See the full factor list** - [All 12 factors](../../factors/README.md) 4. **Try the workflow** - [Getting started guide](../getting-started/quick-start.md) -5. **Understand the knowledge flywheel** - [Compound Knowledge (Factor VIII)](../../factors/08-compound-knowledge.md) +5. 
**Understand the knowledge flywheel** - [Compound Knowledge (Factor X)](../../factors/10-compound-knowledge.md) --- diff --git a/docs/getting-started/quick-start.md b/docs/getting-started/quick-start.md index ad7fee0..4860664 100644 --- a/docs/getting-started/quick-start.md +++ b/docs/getting-started/quick-start.md @@ -45,7 +45,7 @@ git config --local commit.template .gitmessage --- -## Step 2: Add Validation Gates (Factor V: Validate Externally) +## Step 2: Add Validation Gates (Factor VII: Validate Externally) **Time:** 5 minutes @@ -54,7 +54,7 @@ git config --local commit.template .gitmessage ```makefile .PHONY: quick validate -# Factor V: Validate Externally +# Factor VII: Validate Externally quick: @echo "🔍 Running quick validation..." @# Syntax check @@ -120,7 +120,7 @@ You: Commit git add fibonacci.py git commit -m "feat: add fibonacci function -Factors used: II (Track Everything in Git), V (Validate Externally) +Factors used: II (Track Everything in Git), VII (Validate Externally) Success rate: 100% Time saved: ~10 min (vs manual validation)" ``` @@ -157,7 +157,7 @@ cat > METRICS.md < METRICS.md <> METRICS.md --- -#### Factor VI: Lock Progress Forward +#### Factor VIII: Lock Progress Forward **Goal:** Pick up where you left off (multi-day projects) @@ -397,7 +397,7 @@ cat .sessions/2025-11-25-auth-feature.md --- -#### Factor VII: Extract Learnings +#### Factor IX: Extract Learnings **Goal:** Reuse what works by extracting and compounding knowledge @@ -465,13 +465,13 @@ Time: 5 min (was 30 min) - Git tracking every decision (II: Track Everything in Git) - Focused prompts, one task per session (III: One Agent, One Job) -**Workflow (Factors V-VII):** -- Validation gates (V: Validate Externally) -- Session notes for resuming work (VI: Lock Progress Forward) -- Pattern library for reuse (VII: Extract Learnings) +**Workflow (Factors VII-VIII):** +- Validation gates (VII: Validate Externally) +- Session notes for resuming work (VIII: Lock Progress Forward) 
-**Knowledge (Factor IX):** -- Metrics tracking success rate (IX: Measure What Matters) +**Knowledge (Factors IX, XII):** +- Pattern library for reuse (IX: Extract Learnings) +- Metrics tracking success rate (XII: Measure Outcomes) ✅ **Measured results:** - 30-35% → 90-95% success rate @@ -508,7 +508,7 @@ Steps: 1. AI generates migration (Factor I: Context Is Everything) 2. make quick validates syntax (5s) 3. make test runs migration tests (30s) -4. Automated deployment pipeline (Factor V: Validate Externally) +4. Automated deployment pipeline (Factor VII: Validate Externally) 5. Health checks pass automatically (45s) Success rate: 100% @@ -559,7 +559,7 @@ integration: test **Diagnosis:** Not reusing patterns (starting from scratch each time) -**Fix:** Extract and reuse learnings (Factor VII) +**Fix:** Extract and reuse learnings (Factor IX) ```bash # After successful task cp working-code.py .patterns/[pattern-name].py @@ -577,11 +577,11 @@ cp working-code.py .patterns/[pattern-name].py ### Option A: Add More Factors (Full 12) **Implement remaining factors:** -- IV: Research Before You Build (research before coding) -- VIII: Compound Knowledge (build institutional memory) -- X: Isolate Workers (independent execution environments) +- IV: Enforce Least Privilege (scope agent permissions) +- V: Research Before You Build (research before coding) +- VI: Isolate Workers (independent execution environments) +- X: Compound Knowledge (build institutional memory, learn from what goes wrong) - XI: Supervise Hierarchically (coordination at scale) -- XII: Harvest Failures as Wisdom (learn from what goes wrong) **Guide:** [Complete Workflow Guide](../tutorials/workflow-guide.md) @@ -609,10 +609,10 @@ cp working-code.py .patterns/[pattern-name].py - ✅ I: Context Is Everything - ✅ II: Track Everything in Git - ✅ III: One Agent, One Job -- ✅ V: Validate Externally -- ✅ VI: Lock Progress Forward -- ✅ VII: Extract Learnings -- ✅ IX: Measure What Matters +- ✅ VII: Validate Externally 
+- ✅ VIII: Lock Progress Forward +- ✅ IX: Extract Learnings +- ✅ XII: Measure Outcomes **FAAFO achieved:** - ✅ Fast: 2.7-10x speedup diff --git a/docs/how-to/README.md b/docs/how-to/README.md index 8560e9c..76803a8 100644 --- a/docs/how-to/README.md +++ b/docs/how-to/README.md @@ -16,14 +16,14 @@ Choose the task you want to accomplish: ## Research & Planning -- **Structured Research** - Use Research-Plan-Implement phasing before writing code (Factor IV: Research Before You Build) +- **Structured Research** - Use Research-Plan-Implement phasing before writing code (Factor V: Research Before You Build) - **Multi-Phase Workflows** - Break complex work into focused phases with clear handoffs --- ## Validation & Quality -- **External Validation** - Set up make quick/test/all pipelines for automated checks (Factor V: Validate Externally) +- **External Validation** - Set up make quick/test/all pipelines for automated checks (Factor VII: Validate Externally) - **Pre-Commit Hooks** - Automate validation before every commit - **Security Scans** - Integrate security checks into your workflow @@ -31,7 +31,7 @@ Choose the task you want to accomplish: ## Progress & Session Management -- **Lock Progress Forward** - Commit incrementally so work is never lost (Factor VI: Lock Progress Forward) +- **Lock Progress Forward** - Commit incrementally so work is never lost (Factor VIII: Lock Progress Forward) - **Session Notes** - Capture context for multi-day project continuity - **Git Workflow** - Use git as your institutional memory (Factor II: Track Everything in Git) @@ -39,17 +39,17 @@ Choose the task you want to accomplish: ## Knowledge & Learning -- **Extract Learnings** - Turn session outcomes into reusable knowledge (Factor VII: Extract Learnings) -- **Build a Knowledge Base** - Compound knowledge across sessions using HERO pattern (Factor VIII: Compound Knowledge) -- **Track What Matters** - Measure success rates, speedup, and operational health (Factor IX: Measure What 
Matters) +- **Extract Learnings** - Turn session outcomes into reusable knowledge (Factor IX: Extract Learnings) +- **Build a Knowledge Base** - Compound knowledge across sessions using HERO pattern (Factor X: Compound Knowledge) +- **Track What Matters** - Measure success rates, speedup, and operational health (Factor XII: Measure Outcomes) --- ## Scale (Multi-Agent) -- **Isolate Workers** - Give each agent its own worktree and environment (Factor X: Isolate Workers) +- **Isolate Workers** - Give each agent its own worktree and environment (Factor VI: Isolate Workers) - **Hierarchical Supervision** - Set up supervisors to manage agent fleets (Factor XI: Supervise Hierarchically) -- **Harvest Failures** - Turn failures into documented wisdom that prevents recurrence (Factor XII: Harvest Failures as Wisdom) +- **Harvest Failures** - Turn failures into documented wisdom that prevents recurrence (Factor X: Compound Knowledge) --- @@ -70,15 +70,15 @@ Find how-to guides organized by which factor they support: | **Foundation** | **I: Context Is Everything** | Context file setup, prevent collapse | | | **II: Track Everything in Git** | Git workflow, commit templates | | | **III: One Agent, One Job** | Scope agent work, single-task sessions | -| **Workflow** | **IV: Research Before You Build** | Structured research, multi-phase workflows | -| | **V: Validate Externally** | Validation gates, pre-commit hooks, security scans | -| | **VI: Lock Progress Forward** | Session notes, incremental commits | -| **Knowledge** | **VII: Extract Learnings** | Post-session extraction, pattern capture | -| | **VIII: Compound Knowledge** | HERO pattern, knowledge base setup | -| | **IX: Measure What Matters** | Success tracking, speedup measurement | -| **Scale** | **X: Isolate Workers** | Worker isolation, dedicated worktrees | -| | **XI: Supervise Hierarchically** | Supervisor setup, fleet management | -| | **XII: Harvest Failures as Wisdom** | Failure documentation, prevention patterns | 
+| | **IV: Enforce Least Privilege** | Scoped permissions, minimal agent access | +| **Workflow** | **V: Research Before You Build** | Structured research, multi-phase workflows | +| | **VI: Isolate Workers** | Worker isolation, dedicated worktrees | +| | **VII: Validate Externally** | Validation gates, pre-commit hooks, security scans | +| | **VIII: Lock Progress Forward** | Session notes, incremental commits | +| **Knowledge** | **IX: Extract Learnings** | Post-session extraction, pattern capture | +| | **X: Compound Knowledge** | HERO pattern, knowledge base setup, failure documentation, prevention patterns | +| **Scale** | **XI: Supervise Hierarchically** | Supervisor setup, fleet management | +| | **XII: Measure Outcomes** | Success tracking, speedup measurement | --- @@ -88,9 +88,9 @@ Find how-to guides organized by which factor they support: 1. **New to 12-Factor AgentOps?** -- Read [Getting Started](../getting-started/) first, then set up a context file 2. **Context collapse issues?** -- Focus on Factor I (context budget) and Factor III (agent focus) -3. **Low success rate?** -- Add external validation (Factor V) and research phasing (Factor IV) -4. **Knowledge keeps getting lost?** -- Extract learnings (Factor VII) and compound them (Factor VIII) -5. **Scaling to multiple agents?** -- Start with isolation (Factor X), then add supervision (Factor XI) +3. **Low success rate?** -- Add external validation (Factor VII) and research phasing (Factor V) +4. **Knowledge keeps getting lost?** -- Extract learnings (Factor IX) and compound them (Factor X) +5. 
**Scaling to multiple agents?** -- Start with isolation (Factor VI), then add supervision (Factor XI) --- diff --git a/docs/principles/README.md b/docs/principles/README.md index 054dd7b..4d90ca9 100644 --- a/docs/principles/README.md +++ b/docs/principles/README.md @@ -10,7 +10,7 @@ This directory contains deep dives into the core concepts that underpin the fram Twelve vendor-neutral principles organized in four tiers. Each tier builds on the previous one. You can stop at any tier and keep the value. -### Foundation (I-III) -- Non-negotiable basics +### Prepare (I-III) -- Non-negotiable basics | # | Factor | The Rule | |---|--------|----------| @@ -18,25 +18,23 @@ Twelve vendor-neutral principles organized in four tiers. Each tier builds on th | **[II](../../factors/02-track-everything-in-git.md)** | **Track Everything in Git** | If it's not in git, it didn't happen. | | **[III](../../factors/03-one-agent-one-job.md)** | **One Agent, One Job** | Each agent gets a scoped task and fresh context. Never reuse a saturated window. | -### Workflow (IV-VI) -- The discipline that separates prompting from operating +### Bound (IV-VI) -- The discipline that separates prompting from operating | # | Factor | The Rule | |---|--------|----------| -| **[IV](../../factors/04-research-before-you-build.md)** | **Research Before You Build** | Understand the problem space before generating a single line of code. | -| **[V](../../factors/05-validate-externally.md)** | **Validate Externally** | The worker reports evidence; an independent checker writes the binding verdict. No agent grades its own work. | -| **[VI](../../factors/06-lock-progress-forward.md)** | **Lock Progress Forward** | Once work passes validation, it ratchets -- it cannot regress. | +| **[IV](../../factors/04-enforce-least-privilege.md)** | **Enforce Least Privilege** | An agent acts inside a least-privilege envelope it cannot widen -- not even on untrusted input. 
| +| **[V](../../factors/05-research-before-you-build.md)** | **Research Before You Build** | Understand the problem space before generating a single line of code. | +| **[VI](../../factors/06-isolate-workers.md)** | **Isolate Workers** | Each worker gets its own workspace, its own context, and zero shared mutable state. | -### Knowledge (VII-IX) -- Where compounding kicks in +### Select (VII-IX) -- Where compounding kicks in | # | Factor | The Rule | |---|--------|----------| -| **[VII](../../factors/07-extract-learnings.md)** | **Extract Learnings** | Every session produces two outputs -- the work product and the lessons learned. | -| **[VIII](../../factors/08-compound-knowledge.md)** | **Compound Knowledge** | Learnings must flow back into future sessions automatically. | -| **[IX](../../factors/09-measure-what-matters.md)** | **Measure What Matters** | Track fitness toward goals, not activity metrics. | +| **[VII](../../factors/07-validate-externally.md)** | **Validate Externally** | The worker reports evidence; an independent checker writes the binding verdict. No agent grades its own work. | +| **[VIII](../../factors/08-lock-progress-forward.md)** | **Lock Progress Forward** | Once work passes validation, it ratchets -- it cannot regress. | +| **[IX](../../factors/09-extract-learnings.md)** | **Extract Learnings** | Every session produces two outputs -- the work product and the lessons learned. | -**Factor VIII is the hero.** It is the knowledge flywheel: extract learnings, gate for quality, inject into future sessions, measure retrieval, let stale knowledge decay. This is the differentiator that no amount of model improvement replaces -- better models with amnesia still repeat your mistakes. - -### Scale (X-XII) -- The Factory Altitude +### Govern (X-XII) -- The Factory Altitude The same factors at fleet scale. 
Working solo, you live them in miniature -- a git worktree is isolation, your own judgment is supervision, your `learnings.md` @@ -45,9 +43,11 @@ factors. | # | Factor | The Rule | |---|--------|----------| -| **[X](../../factors/10-isolate-workers.md)** | **Isolate Workers** | Each worker gets its own workspace, its own context, and zero shared mutable state. | +| **[X](../../factors/10-compound-knowledge.md)** | **Compound Knowledge** | Learnings must flow back into future sessions automatically; turn dead ends into routing hints that prune the next agent's search. | | **[XI](../../factors/11-supervise-hierarchically.md)** | **Supervise Hierarchically** | Escalation flows up, never sideways. | -| **[XII](../../factors/12-harvest-failures-as-wisdom.md)** | **Harvest Failures as Wisdom** | Turn dead ends into routing hints that prune the next agent's search. | +| **[XII](../../factors/12-measure-outcomes.md)** | **Measure Outcomes** | Track fitness toward goals, not activity metrics. | + +**Factor X is the hero.** It is the knowledge flywheel: extract learnings, gate for quality, inject into future sessions, measure retrieval, let stale knowledge decay. This is the differentiator that no amount of model improvement replaces -- better models with amnesia still repeat your mistakes. --- @@ -87,32 +87,32 @@ Both human cognition and AI context windows show catastrophic performance degrad **Read time:** 15 minutes **When to read:** Understanding git as institutional memory -Git is not just version control -- it is the operating system for institutional knowledge. Commits as memory writes, branches as process isolation, merges as knowledge integration, history as audit trail. Directly supports Factor II (Track Everything in Git) and Factor VIII (Compound Knowledge). +Git is not just version control -- it is the operating system for institutional knowledge. Commits as memory writes, branches as process isolation, merges as knowledge integration, history as audit trail. 
Directly supports Factor II (Track Everything in Git) and Factor X (Compound Knowledge). --- ## The Knowledge Flywheel -The central mechanism of 12-Factor AgentOps. Every factor contributes to it; Factor VIII (Compound Knowledge) is its beating heart. +The central mechanism of 12-Factor AgentOps. Every factor contributes to it; Factor X (Compound Knowledge) is its beating heart. ``` - Extract (Factor VII) + Extract (Factor IX) | v Session --> Learnings --> Quality Gate | v - Measure (Factor IX) <-- Knowledge Base --> Inject (Factor I) + Measure (Factor XII) <-- Knowledge Base --> Inject (Factor I) | | v v Decay / Prune Next Session (smarter) ``` **The cycle:** -1. **Extract** -- Every session produces learnings alongside work product (Factor VII) -2. **Gate** -- Learnings pass quality checks before entering the knowledge base (Factor V) +1. **Extract** -- Every session produces learnings alongside work product (Factor IX) +2. **Gate** -- Learnings pass quality checks before entering the knowledge base (Factor VII) 3. **Inject** -- Future sessions load relevant knowledge just-in-time (Factor I) -4. **Measure** -- Track whether injected knowledge improves outcomes (Factor IX) +4. **Measure** -- Track whether injected knowledge improves outcomes (Factor XII) 5. **Decay** -- Stale knowledge loses priority; wrong knowledge gets pruned This is the differentiator that cannot be commoditized. Better models do not replace institutional memory. diff --git a/docs/principles/comparison-table.md b/docs/principles/comparison-table.md index f609582..5adf0a2 100644 --- a/docs/principles/comparison-table.md +++ b/docs/principles/comparison-table.md @@ -4,26 +4,35 @@ Three frameworks, three layers of the stack. 
This document maps each original 12 - **12-Factor App** (Heroku, 2011): How to build cloud-native applications - **12-Factor Agents** (Dex Horthy, 2025): How to build reliable AI applications -- **12-Factor AgentOps** (v3, 2026): The operational discipline for working with AI agents +- **12-Factor AgentOps** (v4, 2026): The operational discipline for working with AI agents --- ## Quick Reference -| # | 12-Factor App (2011) | 12-Factor Agents (2025) | 12-Factor AgentOps v3 (2026) | -|---|----------------------|-------------------------|------------------------------| -| I | Codebase | Own your prompts | **[Context Is Everything](../../factors/01-context-is-everything.md)** | -| II | Dependencies | Own your context window | **[Track Everything in Git](../../factors/02-track-everything-in-git.md)** | -| III | Config | Tools as structured outputs | **[One Agent, One Job](../../factors/03-one-agent-one-job.md)** | -| IV | Backing Services | Small, focused agents | **[Research Before You Build](../../factors/04-research-before-you-build.md)** | -| V | Build/Release/Run | Launch/Pause/Resume APIs | **[Validate Externally](../../factors/05-validate-externally.md)** | -| VI | Processes | Stateless reducer | **[Lock Progress Forward](../../factors/06-lock-progress-forward.md)** | -| VII | Port Binding | Trigger from anywhere | **[Extract Learnings](../../factors/07-extract-learnings.md)** | -| VIII | Concurrency | Small, focused agents | **[Compound Knowledge](../../factors/08-compound-knowledge.md)** | -| IX | Disposability | Launch/Pause/Resume | **[Measure What Matters](../../factors/09-measure-what-matters.md)** | -| X | Dev/Prod Parity | Implicit | **[Isolate Workers](../../factors/10-isolate-workers.md)** | -| XI | Logs | Compact errors into context | **[Supervise Hierarchically](../../factors/11-supervise-hierarchically.md)** | -| XII | Admin Processes | Contact humans with tools | **[Harvest Failures as Wisdom](../../factors/12-harvest-failures-as-wisdom.md)** | +> 
**Each framework is numbered on its own terms.** Numerals do not carry across +> columns: each AgentOps factor is listed against its closest Heroku and Agents +> ancestors as a *conceptual lineage*, not a 1:1 position match. After the v4 +> renumbering (which added **IV: Enforce Least Privilege** and folded "Harvest +> Failures as Wisdom" into **X: Compound Knowledge**), the AgentOps numerals no +> longer line up with Heroku's -- e.g. AgentOps IX is now Extract Learnings, not +> a descendant of Heroku IX (Disposability). The rows below are ordered by the +> **AgentOps v4** numeral. + +| AgentOps # | 12-Factor AgentOps v4 (2026) | Closest Heroku ancestor (2011) | Closest Agents ancestor (2025) | +|---|------------------------------|--------------------------------|--------------------------------| +| I | **[Context Is Everything](../../factors/01-context-is-everything.md)** | I Codebase | Own your prompts | +| II | **[Track Everything in Git](../../factors/02-track-everything-in-git.md)** | II Dependencies | Own your context window | +| III | **[One Agent, One Job](../../factors/03-one-agent-one-job.md)** | III Config | Tools as structured outputs | +| IV | **[Enforce Least Privilege](../../factors/04-enforce-least-privilege.md)** | III Config / Zero-trust lineage | (new -- no direct Agents ancestor) | +| V | **[Research Before You Build](../../factors/05-research-before-you-build.md)** | IV Backing Services | Small, focused agents | +| VI | **[Isolate Workers](../../factors/06-isolate-workers.md)** | X Dev/Prod Parity | Implicit | +| VII | **[Validate Externally](../../factors/07-validate-externally.md)** | V Build/Release/Run | Launch/Pause/Resume APIs | +| VIII | **[Lock Progress Forward](../../factors/08-lock-progress-forward.md)** | VI Processes | Stateless reducer | +| IX | **[Extract Learnings](../../factors/09-extract-learnings.md)** | VII Port Binding | Trigger from anywhere | +| X | **[Compound Knowledge](../../factors/10-compound-knowledge.md)** | VIII
Concurrency + XII Admin Processes | Small, focused agents; Contact humans with tools | +| XI | **[Supervise Hierarchically](../../factors/11-supervise-hierarchically.md)** | XI Logs | Compact errors into context | +| XII | **[Measure Outcomes](../../factors/12-measure-outcomes.md)** | IX Disposability | Launch/Pause/Resume | --- @@ -47,6 +56,11 @@ This compression does not replace the factors. It explains the mechanism beneath ## Detailed Comparison +> The headings below are organized by the **Heroku** factor (its 2011 numeral), +> tracing each one forward to its closest AI-age descendant. The AgentOps numeral +> shown for each is the **v4** numeral, which does not match the Heroku numeral -- +> the mapping is conceptual lineage, not a position match. + ### Factor I: Codebase / Prompts / Context Is Everything **Original 12-Factor App (I: Codebase)** @@ -107,6 +121,22 @@ This compression does not replace the factors. It explains the mechanism beneath --- +### New in v4: AgentOps IV: Enforce Least Privilege + +This factor is **new in AgentOps v4** -- it has no Heroku numeral of its own and no direct 12-Factor Agents ancestor. Its closest lineage is Heroku III (Config: keep secrets and environment-specific settings out of the codebase) extended through the **Zero-Trust** architecture tradition that also underpins Factor VII (Validate Externally). 
+ +**Closest Heroku ancestor (III: Config / Zero-trust lineage)** +- *Principle*: Store config (and secrets) in the environment, not the codebase +- *Why it's the ancestor*: Both are about bounding what a component can reach -- Config externalizes credentials; Least Privilege bounds what those credentials can do + +**12-Factor AgentOps (IV: Enforce Least Privilege)** +- *Evolution*: Grant each agent the minimum permissions, scopes, and blast radius required for its task -- and no more +- *Why Different*: An agent with broad credentials can cause broad damage from a single bad inference; capability must be scoped to intent +- *Key Practice*: Scope tokens, file access, and tool permissions per task; default-deny destructive operations; make irreversible actions require explicit approval +- *Unique Aspect*: Applies zero-trust to agent *capability*, not just agent *output* -- the complement to Factor VII's zero-trust on verdicts + +--- + ### Factor IV: Backing Services / Small Agents / Research Before You Build **Original 12-Factor App (IV: Backing Services)** @@ -119,7 +149,7 @@ This compression does not replace the factors. It explains the mechanism beneath - *Why Changed*: Large agents get stuck at 70-80% quality - *Key Practice*: One agent, one well-defined responsibility -**12-Factor AgentOps (IV: Research Before You Build)** +**12-Factor AgentOps (V: Research Before You Build)** - *Evolution*: Understand the problem space before generating code - *Why Different*: Agents that skip research produce plausible but wrong solutions - *Key Practice*: Separate research phase from implementation phase; understand before generating @@ -139,7 +169,7 @@ This compression does not replace the factors. 
It explains the mechanism beneath - *Why Changed*: AI workflows need to pause/resume across sessions - *Key Practice*: Simple APIs for agent lifecycle management -**12-Factor AgentOps (V: Validate Externally)** +**12-Factor AgentOps (VII: Validate Externally)** - *Evolution*: The worker reports evidence; an independent checker writes the binding verdict. No agent grades its own work. - *Why Different*: Agents are confident but not reliable -- they cannot objectively evaluate their own output - *Key Practice*: The worker emits claims plus evidence; an independent checker -- tests, linters, a different agent, or a human reviewer -- is the sole writer of the binding verdict @@ -159,7 +189,7 @@ This compression does not replace the factors. It explains the mechanism beneath - *Why Changed*: Makes agents reproducible and testable - *Key Practice*: Agent takes state, produces new state, no hidden memory -**12-Factor AgentOps (VI: Lock Progress Forward)** +**12-Factor AgentOps (VIII: Lock Progress Forward)** - *Evolution*: Once work passes validation, it ratchets -- it cannot regress - *Why Different*: Without ratcheting, agents undo validated work during later iterations - *Key Practice*: Commit validated work to protected branches; checkpoint progress @@ -179,7 +209,7 @@ This compression does not replace the factors. It explains the mechanism beneath - *Why Changed*: Users interact through multiple interfaces - *Key Practice*: Meet users where they are -**12-Factor AgentOps (VII: Extract Learnings)** +**12-Factor AgentOps (IX: Extract Learnings)** - *Evolution*: Every session produces two outputs -- the work product and the lessons learned - *Why Different*: Without explicit extraction, hard-won knowledge dies with the session - *Key Practice*: End every session by capturing what worked, what failed, why it mattered, and where the learning came from @@ -199,7 +229,7 @@ This compression does not replace the factors. 
It explains the mechanism beneath - *Why Changed*: Scaling AI through composition, not monoliths - *Key Practice*: Parallelize via multiple agents -**12-Factor AgentOps (VIII: Compound Knowledge)** +**12-Factor AgentOps (X: Compound Knowledge)** - *Evolution*: Learnings must flow back into future sessions automatically - *Why Different*: Extraction without injection is a write-only journal nobody reads - *Key Practice*: Quality-gate extracted learnings, inject relevant knowledge at session start, measure retrieval effectiveness, let stale knowledge decay @@ -207,7 +237,7 @@ This compression does not replace the factors. It explains the mechanism beneath --- -### Factor IX: Disposability / Launch-Pause-Resume / Measure What Matters +### Factor IX: Disposability / Launch-Pause-Resume / Measure Outcomes **Original 12-Factor App (IX: Disposability)** - *Principle*: Fast startup, graceful shutdown @@ -219,7 +249,7 @@ This compression does not replace the factors. It explains the mechanism beneath - *Why Changed*: AI workflows need rapid start/stop - *Key Practice*: Same as Factor V -- agent lifecycle APIs -**12-Factor AgentOps (IX: Measure What Matters)** +**12-Factor AgentOps (XII: Measure Outcomes)** - *Evolution*: Track fitness toward goals, not activity metrics - *Why Different*: Without measurement, you cannot know if your operations are improving - *Key Practice*: Measure outcomes (validation pass rates, recurrence, knowledge reuse, cost per goal) not vanity metrics (tokens consumed, sessions run) @@ -238,7 +268,7 @@ This compression does not replace the factors. 
It explains the mechanism beneath - *Adaptation*: Not explicitly called out, but implied in all factors - *Note*: Incorporated into other factors -**12-Factor AgentOps (X: Isolate Workers)** +**12-Factor AgentOps (VI: Isolate Workers)** - *Evolution*: Each worker gets its own workspace, its own context, and zero shared mutable state - *Why Different*: Parallel agents sharing state create cascading conflicts - *Key Practice*: Git worktrees, separate context windows, independent validation @@ -266,7 +296,7 @@ This compression does not replace the factors. It explains the mechanism beneath --- -### Factor XII: Admin Processes / Contact Humans / Harvest Failures as Wisdom +### Factor XII: Admin Processes / Contact Humans / Compound Knowledge (folds in Harvest Failures) **Original 12-Factor App (XII: Admin Processes)** - *Principle*: Run admin/management tasks as one-off processes @@ -278,11 +308,12 @@ This compression does not replace the factors. It explains the mechanism beneath - *Why Changed*: AI needs human judgment for critical decisions - *Key Practice*: Human contact is a first-class operation, not exception -**12-Factor AgentOps (XII: Harvest Failures as Wisdom)** -- *Evolution*: Turn dead ends into routing hints that prune the next agent's search space +**12-Factor AgentOps (X: Compound Knowledge)** +- *Note*: In v4 the former standalone "Harvest Failures as Wisdom" factor was folded into **X: Compound Knowledge** -- failure harvesting is now one mechanism inside the knowledge flywheel, not a separate factor. 
+- *Evolution*: Turn dead ends into routing hints that prune the next agent's search space, alongside the positive learnings that compound across sessions - *Why Different*: Failures contain the highest-value learnings but are typically discarded - *Key Practice*: Index negative knowledge for retrieval at decision time, and hand a stuck worker's failure trace to a fresh agent rather than looping the saturated one -- *Unique Aspect*: Scale tier (factory altitude) -- negative knowledge that prunes the search, distinct from Factor VII's generic capture; feeds Factor VIII (Compound Knowledge) +- *Unique Aspect*: Negative knowledge that prunes the search space, now part of the same compounding loop as Factor IX (Extract Learnings) feeding Factor X (Compound Knowledge) --- diff --git a/docs/principles/constraint-based-engineering.md b/docs/principles/constraint-based-engineering.md index fc956ba..eee6ffb 100644 --- a/docs/principles/constraint-based-engineering.md +++ b/docs/principles/constraint-based-engineering.md @@ -128,8 +128,8 @@ AI systems operate under hard constraints that cannot be "fixed," only optimized **Examples:** - Context optimization → Factor I (Context Is Everything) -- Air-gap patterns → Factor X (Isolate Workers) -- Cost optimization → Factor IX (Measure What Matters) +- Air-gap patterns → Factor VI (Isolate Workers) +- Cost optimization → Factor XII (Measure Outcomes) - Latency requirements → Factor III (One Agent, One Job) **Key:** Successful constraint solutions become reusable factors. 
@@ -154,7 +154,7 @@ Each pillar represents a **class of constraints** and proven architectural respo - Observability as first-class concern (metrics, logs, traces) - Reliability engineering patterns (circuit breakers, retries, fallbacks) -**Maps to Factors:** II (Track Everything in Git), V (Validate Externally), IX (Measure What Matters), XII (Harvest Failures as Wisdom) +**Maps to Factors:** II (Track Everything in Git), VII (Validate Externally), X (Compound Knowledge), XII (Measure Outcomes) ### Pillar 2: Learning Science @@ -170,7 +170,7 @@ Each pillar represents a **class of constraints** and proven architectural respo - Pattern extraction (learn from experience, don't repeat) - Phase-based workflows (research → plan → implement) -**Maps to Factors:** III (One Agent, One Job), IV (Research Before You Build), VII (Extract Learnings), VIII (Compound Knowledge) +**Maps to Factors:** III (One Agent, One Job), V (Research Before You Build), IX (Extract Learnings), X (Compound Knowledge) ### Pillar 3: Context Engineering @@ -186,7 +186,7 @@ Each pillar represents a **class of constraints** and proven architectural respo - Progressive disclosure (bootstrap → workflow → details) - Sub-agent isolation (separate contexts, don't pollute) -**Maps to Factors:** I (Context Is Everything), III (One Agent, One Job), VI (Lock Progress Forward) +**Maps to Factors:** I (Context Is Everything), III (One Agent, One Job), VIII (Lock Progress Forward) ### Pillar 4: Knowledge OS @@ -202,7 +202,7 @@ Each pillar represents a **class of constraints** and proven architectural respo - History as audit trail (why decisions were made) - Patterns compound over time (organizational learning) -**Maps to Factors:** II (Track Everything in Git), VII (Extract Learnings), VIII (Compound Knowledge) +**Maps to Factors:** II (Track Everything in Git), IX (Extract Learnings), X (Compound Knowledge) --- @@ -215,15 +215,15 @@ Each factor is a **specific constraint-optimization pattern:** | **I.
Context Is Everything** | 200k token context window | JIT loading, <40% utilization | | **II. Track Everything in Git** | Human memory limitations | External memory via version control | | **III. One Agent, One Job** | Cognitive load per agent | Single-responsibility, composition | -| **IV. Research Before You Build** | Premature implementation risk | Research-plan-implement workflow | -| **V. Validate Externally** | Probabilistic AI outputs | External validation gates, zero-trust | -| **VI. Lock Progress Forward** | Multi-day work constraints | State persistence, session continuity | -| **VII. Extract Learnings** | Institutional learning rate | Pattern extraction from every session | -| **VIII. Compound Knowledge** | Knowledge decay over time | HERO cycle, compounding institutional memory | -| **IX. Measure What Matters** | System observability limits | Targeted telemetry, actionable metrics | -| **X. Isolate Workers** | Cross-contamination risk | Worker isolation, independent worktrees | +| **V. Research Before You Build** | Premature implementation risk | Research-plan-implement workflow | +| **VI. Isolate Workers** | Cross-contamination risk | Worker isolation, independent worktrees | +| **VII. Validate Externally** | Probabilistic AI outputs | External validation gates, zero-trust | +| **VIII. Lock Progress Forward** | Multi-day work constraints | State persistence, session continuity | +| **IX. Extract Learnings** | Institutional learning rate | Pattern extraction from every session | +| **X. Compound Knowledge** | Knowledge decay over time | HERO cycle, compounding institutional memory | +| **X. Compound Knowledge** (failure harvesting) | Repeated failure prevention | Failure analysis, pattern extraction from errors | | **XI. Supervise Hierarchically** | Coordination overhead at scale | Hierarchical supervision, escalation paths | -| **XII. Harvest Failures as Wisdom** | Repeated failure prevention | Failure analysis, pattern extraction from errors | +| **XII. Measure Outcomes** | System observability limits | Targeted telemetry, actionable metrics |
**The pattern:** Constraint → Factor (specific solution) → Pillar (solution class) @@ -291,10 +291,10 @@ Each factor is a **specific constraint-optimization pattern:** - Context optimization: <40% rule reduces token costs **Quality Constraint → Validation Infrastructure:** -- External validation checks (Factor V: Validate Externally) +- External validation checks (Factor VII: Validate Externally) - Hierarchical review for Tier 3 (Factor XI: Supervise Hierarchically) -- Pattern learning improves operations (Factor VII: Extract Learnings) -- Targeted measurement (Factor IX: Measure What Matters) +- Pattern learning improves operations (Factor IX: Extract Learnings) +- Targeted measurement (Factor XII: Measure Outcomes) **Result:** 10x user growth within budget through constraint-optimized routing. @@ -479,26 +479,26 @@ Understanding the full landscape of constraints helps identify which factors app **High context constraints (limited tokens):** - Factor I: Context Is Everything (primary) - Factor III: One Agent, One Job (supporting) -- Factor VI: Lock Progress Forward (supporting) +- Factor VIII: Lock Progress Forward (supporting) **High reliability constraints (zero tolerance):** -- Factor V: Validate Externally (primary) +- Factor VII: Validate Externally (primary) - Factor XI: Supervise Hierarchically (primary) -- Factor XII: Harvest Failures as Wisdom (supporting) +- Factor X: Compound Knowledge (supporting) **High cost constraints (budget limited):** -- Factor IX: Measure What Matters (primary) +- Factor XII: Measure Outcomes (primary) - Factor III: One Agent, One Job (supporting) -- Factor X: Isolate Workers (supporting) +- Factor VI: Isolate Workers (supporting) **High scale constraints (growth expected):** -- Factor IX: Measure What Matters (primary) -- Factor X: Isolate Workers (primary) -- Factor VII: Extract Learnings (supporting) +- Factor XII:
Measure Outcomes (primary) +- Factor VI: Isolate Workers (primary) +- Factor IX: Extract Learnings (supporting) **High security constraints (classified/regulated):** - Factor II: Track Everything in Git (primary) -- Factor V: Validate Externally (primary) +- Factor VII: Validate Externally (primary) - Factor XI: Supervise Hierarchically (supporting) --- @@ -577,7 +577,7 @@ Apply this to every constraint you encounter. The solutions become your factors. **Factors:** - [All 12 Factors](../../factors/) - Specific constraint-optimization patterns - [Factor I: Context Is Everything](../../factors/01-context-is-everything.md) - Context window constraints -- [Factor VII: Extract Learnings](../../factors/07-extract-learnings.md) - Pattern extraction +- [Factor IX: Extract Learnings](../../factors/09-extract-learnings.md) - Pattern extraction **Application:** - [Workflow Guide](../tutorials/workflow-guide.md) - Applying constraint-based thinking diff --git a/docs/principles/context-engineering.md b/docs/principles/context-engineering.md index 32080bf..b2d71af 100644 --- a/docs/principles/context-engineering.md +++ b/docs/principles/context-engineering.md @@ -633,7 +633,7 @@ Never exceed capacity in any session Context engineering directly supports several factors in the 12-Factor framework: - **I. Context Is Everything** - The 40% rule is the foundation of this factor - **III. One Agent, One Job** - Focused agents prevent context pollution -- **VI. Lock Progress Forward** - Session boundaries and state persistence prevent context loss +- **VIII. 
Lock Progress Forward** - Session boundaries and state persistence prevent context loss ### Relationship to Learning Science (Pillar 2) diff --git a/docs/principles/evolution-of-12-factor.md b/docs/principles/evolution-of-12-factor.md index 0928fdd..8156df9 100644 --- a/docs/principles/evolution-of-12-factor.md +++ b/docs/principles/evolution-of-12-factor.md @@ -2,7 +2,7 @@ ## Overview -The original 12-Factor App methodology (2011) transformed how we build cloud-native applications. As AI agents become critical infrastructure, two parallel adaptations have extended these proven principles into the AI age. 12-Factor AgentOps v3 represents the latest stage: a full operational discipline for working with AI agents. +The original 12-Factor App methodology (2011) transformed how we build cloud-native applications. As AI agents become critical infrastructure, two parallel adaptations have extended these proven principles into the AI age. 12-Factor AgentOps v4 represents the latest stage: a full operational discipline for working with AI agents. 
### Standing on the Shoulders of Giants @@ -35,7 +35,7 @@ Stage 2: 12-Factor Agents (Dex Horthy, 2025) Solution: Principled LLM application architecture | v -Stage 3: 12-Factor AgentOps v3 (Burkhart, 2026) +Stage 3: 12-Factor AgentOps v4 (Burkhart, 2026) Problem: How to operate with AI agents reliably Solution: Operational discipline with knowledge compounding ``` @@ -44,7 +44,7 @@ Stage 3: 12-Factor AgentOps v3 (Burkhart, 2026) - **[12-Factor App](https://12factor.net)** (Heroku, 2011): Foundation -- how to build cloud-native applications - **[12-Factor Agents](https://github.com/humanlayer/12-factor-agents)** (Dex Horthy, 2025): Application layer -- how to build reliable AI applications -- **12-Factor AgentOps** (This Framework, v3 2026): Operations layer -- the operational discipline for working with AI agents +- **12-Factor AgentOps** (This Framework, v4 2026): Operations layer -- the operational discipline for working with AI agents **They are complementary, not competitive.** Each addresses a different layer of the stack. @@ -68,15 +68,17 @@ The operational gap manifests as: 12-Factor AgentOps fills this gap with an operational discipline organized around one insight: **knowledge compounds**. -### What Changed in v3 +### What Changed Across v3 and v4 -v3 restructured the factors around operational reality rather than theoretical taxonomy: +v3 restructured the factors around operational reality rather than theoretical taxonomy. **v4** refined the set itself: it added **IV: Enforce Least Privilege** and folded the former standalone "Harvest Failures as Wisdom" factor into **X: Compound Knowledge**, which keeps the count at twelve, and renumbered the factors to match.
-| Aspect | Pre-v3 | v3 | +| Aspect | Pre-v3 | v3 / v4 | |--------|--------|-----| | **Organization** | Flat list of 12 | Four tiers: Foundation, Workflow, Knowledge, Scale | | **Adoption model** | All-or-nothing manifesto | Progressive -- stop at any tier, keep the value | -| **Hero concept** | Distributed | Factor VIII (Compound Knowledge) is the differentiator | +| **Hero concept** | Distributed | Factor X (Compound Knowledge) is the differentiator | +| **Security** | Implicit in validation | v4 promotes least privilege to its own factor (IV) | +| **Failure harvesting** | Standalone factor (XII) | v4 folds it into Compound Knowledge (X) | | **Scale factors** | Required | Factory altitude -- lived small solo, structural at fleet scale (never skipped) | | **Framing** | Framework for AI infrastructure | Operational discipline for working with agents | | **Entry point** | Read the theory first | Start with a `learnings.md` file and zero tooling | @@ -85,22 +87,29 @@ v3 restructured the factors around operational reality rather than theoretical t ## The Complete Mapping -### How Each Original Factor Evolved - -| # | Original (2011) | Agents (2025) | AgentOps v3 (2026) | Tier | -|---|-----------------|---------------|---------------------|------| -| **I** | Codebase | Own your prompts | **[Context Is Everything](../../factors/01-context-is-everything.md)** | Foundation | -| **II** | Dependencies | Own your context window | **[Track Everything in Git](../../factors/02-track-everything-in-git.md)** | Foundation | -| **III** | Config | Tools as structured outputs | **[One Agent, One Job](../../factors/03-one-agent-one-job.md)** | Foundation | -| **IV** | Backing Services | Small, focused agents | **[Research Before You Build](../../factors/04-research-before-you-build.md)** | Workflow | -| **V** | Build/Release/Run | Launch/Pause/Resume APIs | **[Validate Externally](../../factors/05-validate-externally.md)** | Workflow | -| **VI** | Processes | Stateless reducer | 
**[Lock Progress Forward](../../factors/06-lock-progress-forward.md)** | Workflow | -| **VII** | Port Binding | Trigger from anywhere | **[Extract Learnings](../../factors/07-extract-learnings.md)** | Knowledge | -| **VIII** | Concurrency | Small, focused agents | **[Compound Knowledge](../../factors/08-compound-knowledge.md)** | Knowledge | -| **IX** | Disposability | Launch/Pause/Resume | **[Measure What Matters](../../factors/09-measure-what-matters.md)** | Knowledge | -| **X** | Dev/Prod Parity | Implicit | **[Isolate Workers](../../factors/10-isolate-workers.md)** | Scale | -| **XI** | Logs | Compact errors into context | **[Supervise Hierarchically](../../factors/11-supervise-hierarchically.md)** | Scale | -| **XII** | Admin Processes | Contact humans with tools | **[Harvest Failures as Wisdom](../../factors/12-harvest-failures-as-wisdom.md)** | Scale | +### How Each AgentOps Factor Traces Back + +This table is ordered by the **AgentOps v4** numeral and lists each factor's *closest +ancestor* in the older frameworks. After the v4 renumbering (which added **IV: Enforce +Least Privilege** and folded "Harvest Failures as Wisdom" into **X: Compound +Knowledge**), the AgentOps numerals no longer line up with Heroku's -- the mapping is a +**conceptual lineage, not a 1:1 position match**. For example, AgentOps IX (Extract +Learnings) descends from Heroku VII (Port Binding), not Heroku IX (Disposability). 
+ +| AgentOps # | AgentOps v4 (2026) | Closest Heroku ancestor (2011) | Closest Agents ancestor (2025) | Tier | +|---|---------------------|--------------------------------|--------------------------------|------| +| **I** | **[Context Is Everything](../../factors/01-context-is-everything.md)** | I Codebase | Own your prompts | Foundation | +| **II** | **[Track Everything in Git](../../factors/02-track-everything-in-git.md)** | II Dependencies | Own your context window | Foundation | +| **III** | **[One Agent, One Job](../../factors/03-one-agent-one-job.md)** | III Config | Tools as structured outputs | Foundation | +| **IV** | **[Enforce Least Privilege](../../factors/04-enforce-least-privilege.md)** | III Config / Zero-trust lineage | (new -- no direct ancestor) | Foundation | +| **V** | **[Research Before You Build](../../factors/05-research-before-you-build.md)** | IV Backing Services | Small, focused agents | Workflow | +| **VI** | **[Isolate Workers](../../factors/06-isolate-workers.md)** | X Dev/Prod Parity | Implicit | Workflow | +| **VII** | **[Validate Externally](../../factors/07-validate-externally.md)** | V Build/Release/Run | Launch/Pause/Resume APIs | Workflow | +| **VIII** | **[Lock Progress Forward](../../factors/08-lock-progress-forward.md)** | VI Processes | Stateless reducer | Workflow | +| **IX** | **[Extract Learnings](../../factors/09-extract-learnings.md)** | VII Port Binding | Trigger from anywhere | Knowledge | +| **X** | **[Compound Knowledge](../../factors/10-compound-knowledge.md)** | VIII Concurrency + XII Admin Processes | Small, focused agents; Contact humans with tools | Knowledge | +| **XI** | **[Supervise Hierarchically](../../factors/11-supervise-hierarchically.md)** | XI Logs | Compact errors into context | Scale | +| **XII** | **[Measure Outcomes](../../factors/12-measure-outcomes.md)** | IX Disposability | Launch/Pause/Resume | Knowledge | **See also:** [Comparison Table](./comparison-table.md) for detailed factor-by-factor 
analysis. @@ -137,18 +146,19 @@ AI operations broke every assumption: - Human-in-the-loop as first-class pattern - Error compaction into learning context -**12-Factor AgentOps v3** added: -- **Knowledge compounding** -- the flywheel that makes each session smarter (Factors VII, VIII) -- **External validation** -- the worker reports evidence; an independent checker writes the binding verdict (Factor V) -- **Progress ratcheting** -- validated work cannot regress (Factor VI) -- **Research-first workflow** -- understand before generating (Factor IV) -- **Outcome measurement** -- track what matters, not activity (Factor IX) -- **Fitness gradient** -- define better versus worse states through goals, metrics, and gates (Factor IX) -- **Provenance-backed learning** -- know where a learning came from before trusting or promoting it (Factors II, VII) -- **Failure harvesting** -- dead ends become routing hints that prune the next agent's search (Factor XII) +**12-Factor AgentOps** (v3, extended in v4) added: +- **Knowledge compounding** -- the flywheel that makes each session smarter (Factors IX, X) +- **Least privilege** -- scope each agent's permissions and blast radius to its task (Factor IV) +- **External validation** -- the worker reports evidence; an independent checker writes the binding verdict (Factor VII) +- **Progress ratcheting** -- validated work cannot regress (Factor VIII) +- **Research-first workflow** -- understand before generating (Factor V) +- **Outcome measurement** -- track what matters, not activity (Factor XII) +- **Fitness gradient** -- define better versus worse states through goals, metrics, and gates (Factor XII) +- **Provenance-backed learning** -- know where a learning came from before trusting or promoting it (Factors II, IX) +- **Failure harvesting** -- dead ends become routing hints that prune the next agent's search (now folded into Factor X) - **Tiered adoption** -- start with zero tooling, scale when needed -### The Compression Beneath v3 +### The
Compression Beneath v4 As the doctrine matured, the factors became easier to compress into one operating picture: @@ -166,13 +176,13 @@ As the doctrine matured, the factors became easier to compress into one operatin The defining contribution of 12-Factor AgentOps is the knowledge flywheel -- a system where operational knowledge compounds automatically across sessions. ``` - Extract (Factor VII) + Extract (Factor IX) | v - Session --> Learnings --> Quality Gate (Factor V) + Session --> Learnings --> Quality Gate (Factor VII) | v - Measure (Factor IX) <-- Knowledge Base --> Inject (Factor I) + Measure (Factor XII) <-- Knowledge Base --> Inject (Factor I) | | v v Decay / Prune Next Session (smarter) @@ -180,29 +190,29 @@ The defining contribution of 12-Factor AgentOps is the knowledge flywheel -- a s **Why this matters:** Better models do not replace institutional memory. A frontier model with amnesia still repeats your mistakes. A weaker model with your documented patterns, pitfalls, and conventions will outperform it in your specific context. -This is the one thing no amount of model improvement commoditizes. It is the HERO of the framework (Factor VIII: Compound Knowledge). +This is the one thing no amount of model improvement commoditizes. It is the HERO of the framework (Factor X: Compound Knowledge). --- ## The Four Tiers -v3 organizes the 12 factors into progressive tiers. Each tier builds on the previous one. You can stop at any tier and keep the value. +v4 organizes the 12 factors into progressive tiers. Each tier builds on the previous one. You can stop at any tier and keep the value. (Numerals do not always follow tier order: Measure Outcomes is XII but belongs to the Knowledge tier. Tier membership, not the numeral, is what groups the factors.) -### Foundation (I-III): Non-negotiable basics +### Foundation (I-IV): Non-negotiable basics -Context discipline, git tracking, scoped sessions. Works with zero tooling. Get these wrong and nothing else matters.
+Context discipline, git tracking, scoped sessions, least privilege. Works with zero tooling. Get these wrong and nothing else matters. -### Workflow (IV-VI): The operating discipline +### Workflow (V-VIII): The operating discipline -Research before building. Validate externally. Lock progress forward. The discipline that separates "prompting and hoping" from a reliable operating model. +Research before building. Isolate workers. Validate externally. Lock progress forward. The discipline that separates "prompting and hoping" from a reliable operating model. -### Knowledge (VII-IX): Where compounding kicks in +### Knowledge (IX, X, XII): Where compounding kicks in -Extract learnings. Compound knowledge. Measure outcomes. This is where sessions start getting measurably smarter over time. +Extract learnings. Compound knowledge (which now also harvests failures as wisdom). Measure outcomes. This is where sessions start getting measurably smarter over time. -### Scale (X-XII): The factory altitude +### Scale (XI): The factory altitude -Isolate workers. Supervise hierarchically. Harvest failures. These are the same factors at fleet scale -- you grow into the altitude, you don't skip the factors. Working solo you live them in miniature: a worktree is isolation, your own judgment is the supervisor, your `learnings.md` is failure-harvesting. The machinery becomes structural when one head can no longer hold the whole thing. +Supervise hierarchically, and lean on Isolate Workers (VI) and the failure-harvesting now inside Compound Knowledge (X). These are the same factors at fleet scale -- you grow into the altitude, you don't skip the factors. Working solo, you live them in miniature: a worktree is isolation, your own judgment is the supervisor, your `learnings.md` is failure-harvesting. The machinery becomes structural when one head can no longer hold the whole thing. --- @@ -210,23 +220,24 @@ Isolate workers. Supervise hierarchically. Harvest failures.
These are the same ### From Zero-Trust to Operational Discipline -v3 reframes the core insight. The original framing was "zero-trust cognitive infrastructure" -- treat AI output like untrusted network traffic. That framing is technically accurate but misses the broader point. +v4 carries forward the v3 reframe of the core insight. The original framing was "zero-trust cognitive infrastructure" -- treat AI output like untrusted network traffic. That framing is technically accurate but misses the broader point. -The v3 framing: **operational discipline for working with AI agents.** The same way DevOps transformed ad-hoc deployment into a reliable practice, 12-Factor AgentOps transforms ad-hoc agent usage into a reliable, compounding practice. +The current framing: **operational discipline for working with AI agents.** The same way DevOps transformed ad-hoc deployment into a reliable practice, 12-Factor AgentOps transforms ad-hoc agent usage into a reliable, compounding practice. -The zero-trust principle survives as Factor V (Validate Externally): the worker reports evidence, an independent checker writes the binding verdict, and no agent grades its own work. But the framework is bigger than validation. It is about: +The zero-trust principle survives in two factors: Factor IV (Enforce Least Privilege) bounds what an agent *can do*, and Factor VII (Validate Externally) bounds what an agent's claims *are worth* -- the worker reports evidence, an independent checker writes the binding verdict, and no agent grades its own work. But the framework is bigger than validation. It is about: 1. **Managing context** so agents get good input (Factor I) 2. **Persisting knowledge** so nothing is lost between sessions (Factor II) 3. **Scoping work** so agents operate in their effective range (Factor III) -4. **Understanding before building** so agents solve the right problem (Factor IV) -5. 
**Validating externally** so quality is objective -- claims from the worker, the binding verdict from an independent checker (Factor V) -6. **Ratcheting progress** so validated work is protected (Factor VI) -7. **Extracting learnings** so every session produces knowledge (Factor VII) -8. **Compounding knowledge** so each session is smarter than the last (Factor VIII) -9. **Measuring outcomes** so improvement is demonstrable (Factor IX) - -And at scale: isolating workers (X), supervising hierarchically (XI), and harvesting failures as wisdom (XII). +4. **Enforcing least privilege** so a bad inference has a bounded blast radius (Factor IV) +5. **Understanding before building** so agents solve the right problem (Factor V) +6. **Validating externally** so quality is objective -- claims from the worker, the binding verdict from an independent checker (Factor VII) +7. **Ratcheting progress** so validated work is protected (Factor VIII) +8. **Extracting learnings** so every session produces knowledge (Factor IX) +9. **Compounding knowledge** -- including harvesting failures as wisdom -- so each session is smarter than the last (Factor X) +10. **Measuring outcomes** so improvement is demonstrable (Factor XII) + +And at scale: isolating workers (VI) and supervising hierarchically (XI). The latest refinement is that these practices can now be compressed more cleanly into one operator picture: fitness gradient, stateful environment, replaceable actors, durable traces, selection gates, promotion loops, and governance. 
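A minimal Python sketch of how Factors IV and VII split the zero-trust work, assuming a harness that launches the agent and owns its permissions. `Envelope`, `Report`, `act`, and `verdict` are hypothetical names for illustration, not part of any AgentOps tooling. The envelope is fixed before the agent runs, and the binding verdict comes from an independent check of the evidence rather than from the worker's own claim.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Envelope:
    """Hypothetical least-privilege envelope (Factor IV), set by the launcher."""
    allowed: FrozenSet[str]

    def permits(self, action: str) -> bool:
        return action in self.allowed


@dataclass(frozen=True)
class Report:
    """What the worker hands back: a claim plus evidence, never a verdict."""
    claims_success: bool
    evidence: str


def act(envelope: Envelope, action: str) -> str:
    # Factor IV: refuse anything outside the envelope, no matter what the
    # (possibly untrusted) input asked for.
    if not envelope.permits(action):
        raise PermissionError(f"{action!r} is outside this agent's envelope")
    return f"performed {action}"


def verdict(report: Report, independent_check: Callable[[str], bool]) -> bool:
    # Factor VII: the binding verdict comes from an independent checker that
    # inspects the evidence; report.claims_success is deliberately ignored.
    return independent_check(report.evidence)


if __name__ == "__main__":
    env = Envelope(frozenset({"read_repo", "run_tests"}))
    print(act(env, "run_tests"))  # allowed
    try:
        act(env, "push_to_main")  # not in the envelope
    except PermissionError as err:
        print("blocked:", err)
    # The worker claims success, but the evidence shows a failure.
    report = Report(claims_success=True, evidence="3 passed, 1 failed")
    print("verdict:", verdict(report, lambda ev: "failed" not in ev))
```

Python cannot truly stop code from constructing a new `Envelope`; in a real harness the envelope would live outside the agent's reach (sandbox, scoped credentials), and the frozen dataclass only models that boundary.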
diff --git a/docs/reference/README.md b/docs/reference/README.md index fd9aaff..6bd548a 100644 --- a/docs/reference/README.md +++ b/docs/reference/README.md @@ -6,7 +6,7 @@ ## The 12 Factors -### Foundation (I-III) +### Prepare (I-III) | # | Factor | Purpose | |---|--------|---------| @@ -14,29 +14,29 @@ | **II** | [Track Everything in Git](../../factors/02-track-everything-in-git.md) | Git as institutional memory for decisions, patterns, and history | | **III** | [One Agent, One Job](../../factors/03-one-agent-one-job.md) | Each agent gets a single, well-scoped task | -### Workflow (IV-VI) +### Bound (IV-VI) | # | Factor | Purpose | |---|--------|---------| -| **IV** | [Research Before You Build](../../factors/04-research-before-you-build.md) | Understand the problem space before writing code | -| **V** | [Validate Externally](../../factors/05-validate-externally.md) | Automated checks that catch errors the agent cannot see | -| **VI** | [Lock Progress Forward](../../factors/06-lock-progress-forward.md) | Commit incrementally so work is never lost | +| **IV** | [Enforce Least Privilege](../../factors/04-enforce-least-privilege.md) | An agent acts inside a least-privilege envelope it cannot widen -- not even on untrusted input | +| **V** | [Research Before You Build](../../factors/05-research-before-you-build.md) | Understand the problem space before writing code | +| **VI** | [Isolate Workers](../../factors/06-isolate-workers.md) | Each agent gets its own worktree and environment | -### Knowledge (VII-IX) +### Select (VII-IX) | # | Factor | Purpose | |---|--------|---------| -| **VII** | [Extract Learnings](../../factors/07-extract-learnings.md) | Turn session outcomes into reusable knowledge | -| **VIII** | [Compound Knowledge](../../factors/08-compound-knowledge.md) | HERO pattern: knowledge grows across sessions | -| **IX** | [Measure What Matters](../../factors/09-measure-what-matters.md) | Track the metrics that drive improvement | +| **VII** | [Validate 
Externally](../../factors/07-validate-externally.md) | Automated checks that catch errors the agent cannot see | +| **VIII** | [Lock Progress Forward](../../factors/08-lock-progress-forward.md) | Commit incrementally so work is never lost | +| **IX** | [Extract Learnings](../../factors/09-extract-learnings.md) | Turn session outcomes into reusable knowledge | -### Scale (X-XII) — the factory altitude +### Govern (X-XII) — the factory altitude | # | Factor | Purpose | |---|--------|---------| -| **X** | [Isolate Workers](../../factors/10-isolate-workers.md) | Each agent gets its own worktree and environment | +| **X** | [Compound Knowledge](../../factors/10-compound-knowledge.md) | HERO pattern: knowledge grows across sessions; failures become documented prevention patterns | | **XI** | [Supervise Hierarchically](../../factors/11-supervise-hierarchically.md) | Supervisors manage agent fleets, not humans directly | -| **XII** | [Harvest Failures as Wisdom](../../factors/12-harvest-failures-as-wisdom.md) | Failures become documented prevention patterns | +| **XII** | [Measure Outcomes](../../factors/12-measure-outcomes.md) | Track the metrics that drive improvement | --- @@ -44,13 +44,13 @@ The 12 factors are organized into four tiers of increasing sophistication: -**Foundation (I-III)** -- Get these right first. Context management, git discipline, and focused agents form the base that everything else builds on. +**Prepare (I-III)** -- Get these right first. Context management, git discipline, and focused agents form the base that everything else builds on. -**Workflow (IV-VI)** -- The operational loop. Research before building, validate with external tools, and lock progress forward through incremental commits. +**Bound (IV-VI)** -- Bound the work before agents run. Enforce least privilege, research before building, and isolate workers so they cannot collide. -**Knowledge (VII-IX)** -- The compounding engine. 
Extract learnings, compound them across sessions, and measure the metrics that actually matter. +**Select (VII-IX)** -- Select the work that holds. Validate with external tools, lock progress forward through incremental commits, and extract learnings from every session. -**Scale (X-XII)** -- The factory altitude: the same factors at fleet scale. Solo, you live them in miniature (a worktree is isolation, your judgment is supervision); running multiple agents, they become structural — worker isolation, hierarchical supervision, and systematic failure harvesting. You grow into the altitude, you don't skip the factors. +**Govern (X-XII)** -- The factory altitude: the same factors at fleet scale. Solo, you live them in miniature (your `learnings.md` is the knowledge base, your judgment is supervision); running multiple agents, they become structural: knowledge compounding, hierarchical supervision, and outcome measurement. You grow into the altitude, you don't skip the factors. --- @@ -78,7 +78,7 @@ Example: Database deployment 4 hours -> 90 seconds = 27x ## Common Commands -### Validation (Factor V) +### Validation (Factor VII) ```bash make quick # 5s syntax check make test # 30s unit tests @@ -94,7 +94,7 @@ git commit # Commit template captures decisions git log # Review history for patterns ``` -### Session Management (Factor VI) +### Session Management (Factor VIII) ```bash ls .sessions/ # List sessions cat .sessions/[date].md # Load session context @@ -107,10 +107,10 @@ cat .sessions/[date].md # Load session context ``` project/ ├── CLAUDE.md # Context file (Factor I) -├── Makefile # Validation gates (Factor V) -├── learnings.md # Extracted knowledge (Factor VII) +├── Makefile # Validation gates (Factor VII) +├── learnings.md # Extracted knowledge (Factor IX) ├── .gitmessage # Commit template (Factor II) -├── .sessions/ # Session notes (Factor VI) +├── .sessions/ # Session notes (Factor VIII) │ └── YYYY-MM-DD-[task].md └── src/ # Your code ``` @@ -121,10 +121,10 @@ project/ |
Problem | Likely Cause | Factor to Review | |---------|--------------|------------------| -| Low success rate (<70%) | Context overload or missing validation | I: Context Is Everything, V: Validate Externally | -| Agent generates wrong code | Unclear scope or missing research | III: One Agent One Job, IV: Research Before You Build | -| Same mistakes repeated | No learning extraction | VII: Extract Learnings, XII: Harvest Failures as Wisdom | -| Can't resume work | Missing session notes | VI: Lock Progress Forward | +| Low success rate (<70%) | Context overload or missing validation | I: Context Is Everything, VII: Validate Externally | +| Agent generates wrong code | Unclear scope or missing research | III: One Agent One Job, V: Research Before You Build | +| Same mistakes repeated | No learning extraction | IX: Extract Learnings, X: Compound Knowledge | +| Can't resume work | Missing session notes | VIII: Lock Progress Forward | | Validation takes too long | Over-scoped checks | Start with `make quick` only | --- diff --git a/docs/reference/anthropic-long-running-agents.md b/docs/reference/anthropic-long-running-agents.md index cf06345..813f520 100644 --- a/docs/reference/anthropic-long-running-agents.md +++ b/docs/reference/anthropic-long-running-agents.md @@ -2,7 +2,7 @@ **Source:** [Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents) - Anthropic Engineering, November 2025 -**Relation to 12-Factor AgentOps:** This pattern directly implements [Factor VI: Lock Progress Forward](../../factors/06-lock-progress-forward.md) and validates our git-based memory approach. +**Relation to 12-Factor AgentOps:** This pattern directly implements [Factor VIII: Lock Progress Forward](../../factors/08-lock-progress-forward.md) and validates our git-based memory approach. 
--- diff --git a/docs/reference/failure-patterns.md b/docs/reference/failure-patterns.md index c2dc33d..6d75f98 100644 --- a/docs/reference/failure-patterns.md +++ b/docs/reference/failure-patterns.md @@ -46,7 +46,7 @@ - Confabulation of success based on code structure, not execution - No actual test harness invoked -**Violated Factor:** **Factor V (Validate Externally)** +**Violated Factor:** **Factor VII (Validate Externally)** **Prevention:** - Always run tests independently (don't trust AI claims) @@ -155,7 +155,7 @@ AI: "What caching solution should we use?" - Guessing rather than analyzing - Each iteration adds more logging, no fix -**Violated Factor:** **Factor IV (Research Before You Build)** +**Violated Factor:** **Factor V (Research Before You Build)** **Prevention:** - Use debugger first (breakpoint at error location) @@ -214,7 +214,7 @@ Error: Still occurs - AI loses track of change intent partway through - Result: Incomplete refactor, broken state -**Violated Factor:** **Factor X (Isolate Workers)** + **Factor I (Context Is Everything)** +**Violated Factor:** **Factor VI (Isolate Workers)** + **Factor I (Context Is Everything)** **Prevention:** - Keep files <500 lines (modularity constraint) @@ -255,7 +255,7 @@ If file too large: - Each fix breaks something else - No holistic understanding of codebase -**Violated Factor:** **Factor V (Validate Externally)** + **Factor X (Isolate Workers)** +**Violated Factor:** **Factor VII (Validate Externally)** + **Factor VI (Isolate Workers)** **Prevention:** - Validate code structure before committing @@ -297,7 +297,7 @@ After AI edit: - No modularity constraints enforced - Extended session without architectural oversight -**Violated Factor:** **Factor X (Isolate Workers)** +**Violated Factor:** **Factor VI (Isolate Workers)** **Prevention:** - Set explicit modularity constraints upfront: @@ -358,7 +358,7 @@ def process_everything(data, config, db, cache, logger, metrics, ...): - Agents overlap in scope - No 
explicit handoff protocols -**Violated Factor:** **Factor III (One Agent, One Job)** + **Factor IV (Research Before You Build)** +**Violated Factor:** **Factor III (One Agent, One Job)** + **Factor V (Research Before You Build)** **Prevention:** - Assign agents to specific domains (Agent A = frontend, B = backend, C = DB) @@ -411,7 +411,7 @@ Git: CONFLICT (content): Merge conflict in src/api/routes.py - Circular dependencies (A needs B, B needs A) - No tracer bullet to break cycle -**Violated Factor:** **Factor X (Isolate Workers)** + poor task decomposition +**Violated Factor:** **Factor VI (Isolate Workers)** + poor task decomposition **Prevention:** - Implement tracer bullet first (vertical slice end-to-end) @@ -484,7 +484,7 @@ Before parallel agents: - No backward compatibility validation - Missing contract testing -**Violated Factor:** **Factor V (Validate Externally)** + **Factor XI (Supervise Hierarchically)** +**Violated Factor:** **Factor VII (Validate Externally)** + **Factor XI (Supervise Hierarchically)** **Prevention:** - API compatibility tests in CI/CD pipeline @@ -593,7 +593,7 @@ Git hook: - Every change requires committee approval - Manual review slower than AI generation -**Violated Factor:** **Factor V (Validate Externally)** + **Factor IX (Measure What Matters)** +**Violated Factor:** **Factor VII (Validate Externally)** + **Factor XII (Measure Outcomes)** **Prevention:** - Implement fast lane for low-risk changes @@ -640,7 +640,7 @@ Else: - Changes deployed directly to production - Missing integration tests -**Violated Factor:** **Factor V (Validate Externally)** + **Factor XI (Supervise Hierarchically)** +**Violated Factor:** **Factor VII (Validate Externally)** + **Factor XI (Supervise Hierarchically)** **Prevention:** - Staging environment matching production @@ -676,15 +676,15 @@ Deployment flow: | Symptom | Pattern | Loop | Violated Factors | Page | |---------|---------|------|-----------------|------| -| AI claims tests pass, code broken 
| "Tests Passing" Lie | Inner | V (Validate Externally) | ↑ | +| AI claims tests pass, code broken | "Tests Passing" Lie | Inner | VII (Validate Externally) | ↑ | | AI forgets recent instructions | Context Amnesia | Inner | I (Context Is Everything) | ↑ | -| AI adds logging instead of fixing bug | Debug Loop Spiral | Inner | IV (Research Before You Build) | ↑ | -| 3,000-line unmaintainable function | Eldritch Code Horror | Middle | X (Isolate Workers) | ↑ | +| AI adds logging instead of fixing bug | Debug Loop Spiral | Inner | V (Research Before You Build) | ↑ | +| 3,000-line unmaintainable function | Eldritch Code Horror | Middle | VI (Isolate Workers) | ↑ | | Multiple agents modify same file | Agent Workspace Collision | Middle | III (One Agent, One Job) | ↑ | -| Agents waiting for each other | Multi-Agent Deadlock | Middle | X (Isolate Workers) | ↑ | -| Production API breaks after deployment | Bridge Torching | Outer | V (Validate Externally) | ↑ | +| Agents waiting for each other | Multi-Agent Deadlock | Middle | VI (Isolate Workers) | ↑ | +| Production API breaks after deployment | Bridge Torching | Outer | VII (Validate Externally) | ↑ | | Git branch with work deleted | Repository Deletion | Outer | II (Track Everything in Git) | ↑ | -| AI code waits weeks for approval | Process Gridlock | Outer | IX (Measure What Matters) | ↑ | +| AI code waits weeks for approval | Process Gridlock | Outer | XII (Measure Outcomes) | ↑ | -| Production deployment breaks system | Cascading Failures | Outer | V, XI (Validate, Supervise) | ↑ | +| Production deployment breaks system | Cascading Failures | Outer | VII, XI (Validate, Supervise) | ↑ | --- @@ -704,12 +704,12 @@ Deployment flow: ## Prevention Hierarchy **Best (Factor-based design):** -- Factor X (Isolate Workers) prevents eldritch horrors before they form +- Factor VI (Isolate Workers) prevents eldritch horrors before they form - Factor III (One Agent, One Job) prevents workspace collisions with domain boundaries **Good (Automated detection):** -- Factor V (Validate Externally) catches "tests passing" lies immediately -
Factor VI (Lock Progress Forward) catches regressions before they compound +- Factor VII (Validate Externally) catches "tests passing" lies immediately +- Factor VIII (Lock Progress Forward) catches regressions before they compound **Acceptable (Human review):** - Factor XI (Supervise Hierarchically) identifies issues through oversight @@ -746,7 +746,7 @@ Deployment flow: **After any failure:** 1. **Find the pattern** in this catalog -2. **Run blameless postmortem** (Factor VII: Extract Learnings) +2. **Run blameless postmortem** (Factor IX: Extract Learnings) 3. **Add to institutional memory** (document in team runbook) 4. **Improve factor implementation** (strengthen prevention) diff --git a/docs/reference/jobspec-openapi-v0-rfc.md b/docs/reference/jobspec-openapi-v0-rfc.md index cfcbc87..102930d 100644 --- a/docs/reference/jobspec-openapi-v0-rfc.md +++ b/docs/reference/jobspec-openapi-v0-rfc.md @@ -135,9 +135,9 @@ JobSpec v0 maps directly onto the doctrine: | --- | --- | | II. Track Everything in Git | Durable, inspectable ledger events | | III. One Agent, One Job | Stable ids, bounded payloads, and status | -| V. Validate Externally | Independent status and event inspection | -| VI. Lock Progress Forward | Accepted work survives restart and lost acks | -| VIII. Compound Knowledge | Projections feed future sessions and tools | +| VII. Validate Externally | Independent status and event inspection | +| VIII. Lock Progress Forward | Accepted work survives restart and lost acks | +| X. Compound Knowledge | Projections feed future sessions and tools | | XI. Supervise Hierarchically | Workers claim jobs through leases | The schema is not the product by itself. 
The conformance program is the diff --git a/docs/tutorials/validate-before-you-ship.md b/docs/tutorials/validate-before-you-ship.md index 22d3fb6..aab65bc 100644 --- a/docs/tutorials/validate-before-you-ship.md +++ b/docs/tutorials/validate-before-you-ship.md @@ -110,9 +110,9 @@ Grow --- -## Step 1: Research Before You Build (Factor IV) +## Step 1: Research Before You Build (Factor V) -**Purpose:** Understand the codebase before making changes. This is Factor IV: Research Before You Build. +**Purpose:** Understand the codebase before making changes. This is Factor V: Research Before You Build. @@ -153,7 +153,7 @@ Research bundle saved: .agents/ao/bundles/research-auth-001.md
-**Factor IV in action:** +**Factor V in action:** Understanding before acting prevents working on the wrong thing entirely. @@ -170,7 +170,7 @@ Understanding before acting prevents working on the wrong thing entirely. --- -## Step 2: Plan and Simulate Failures (Factor IV continued) +## Step 2: Plan and Simulate Failures (Factor V continued) **Purpose:** Simulate failures before you build. Find the problems before they find you. The /pre-mortem skill extends research into risk anticipation. @@ -247,14 +247,14 @@ Pre-mortem: "What COULD go wrong?" 1. **Run BEFORE implementation** - Research first, then anticipate failures 2. **Take the risks seriously** - If it identifies a HIGH risk, address it in your plan -3. **Create checkpoints** - These become your validation gates for Factor V +3. **Create checkpoints** - These become your validation gates for Factor VII 4. **Review with the team** - Pre-mortems surface assumptions worth discussing --- -## Step 3: Validate Externally Before Every Commit (Factor V) +## Step 3: Validate Externally Before Every Commit (Factor VII) -**Purpose:** Validate that your implementation does what you intended using external checks, not just the agent's self-assessment. This is Factor V: Validate Externally. +**Purpose:** Validate that your implementation does what you intended using external checks, not just the agent's self-assessment. This is Factor VII: Validate Externally.
@@ -350,9 +350,9 @@ Safe to commit. --- -## Step 4: Lock Progress Forward (Factor VI) +## Step 4: Lock Progress Forward (Factor VIII) -**Purpose:** Once validation passes, commit immediately. Each validated commit becomes a safe checkpoint you can return to. This is Factor VI: Lock Progress Forward. +**Purpose:** Once validation passes, commit immediately. Each validated commit becomes a safe checkpoint you can return to. This is Factor VIII: Lock Progress Forward. Every passing /vibe check should result in an immediate commit. Small, validated commits create a trail of known-good states. If something breaks later, you know exactly where to roll back to. @@ -365,9 +365,9 @@ Every passing /vibe check should result in an immediate commit. Small, validated --- -## Step 5: Extract Learnings (Factor VII) +## Step 5: Extract Learnings (Factor IX) -**Purpose:** Close the loop. Extract what worked so future sessions benefit. This is Factor VII: Extract Learnings. +**Purpose:** Close the loop. Extract what worked so future sessions benefit. This is Factor IX: Extract Learnings.
@@ -447,9 +447,9 @@ Without /retro, every session starts from zero. With /retro, knowledge compounds --- -## Step 6: Compound Knowledge (Factor VIII) +## Step 6: Compound Knowledge (Factor X) -**Purpose:** Make every session build on the last. This is Factor VIII: Compound Knowledge (HERO) -- Harvest, Extract, Refine, Operationalize. +**Purpose:** Make every session build on the last. This is Factor X: Compound Knowledge (HERO) -- Harvest, Extract, Refine, Operationalize.
@@ -566,42 +566,42 @@ Pending Candidates (3): - + - + - + - + - + - + @@ -627,7 +627,7 @@ Simple changes with no risks are fast to implement with confidence. ### "How do I know /vibe is working?" -Check your metrics over time (Factor IX: Measure What Matters): +Check your metrics over time (Factor XII: Measure Outcomes): - Trust pass rate should increase - Rework ratio should decrease - Fewer "oops" commits fixing previous commits @@ -651,8 +651,8 @@ The flywheel benefit is compound. Session 1 feels the same. Session 50 feels lik ### Before You Build ``` -/research # Understand first (Factor IV) -/pre-mortem # Simulate failures (Factor IV) +/research # Understand first (Factor V) +/pre-mortem # Simulate failures (Factor V) ``` @@ -660,10 +660,10 @@ The flywheel benefit is compound. Session 1 feels the same. Session 50 feels lik ### Before You Commit ``` -/vibe # Validate externally (Factor V) +/vibe # Validate externally (Factor VII) # Fix any warnings /vibe # Confirm fix -git commit # Lock progress (Factor VI) +git commit # Lock progress (Factor VIII) ``` @@ -673,8 +673,8 @@ git commit # Lock progress (Factor VI) ### After You Ship ``` -/retro # Extract learnings (Factor VII) -/flywheel promote # Compound knowledge (Factor VIII) +/retro # Extract learnings (Factor IX) +/flywheel promote # Compound knowledge (Factor X) ``` @@ -684,7 +684,7 @@ git commit # Lock progress (Factor VI) ``` /flywheel status # Check health /flywheel pool list # Clear backlog -# Review metrics # Factor IX +# Review metrics # Factor XII ``` diff --git a/docs/tutorials/workflow-guide.md b/docs/tutorials/workflow-guide.md index 841e9da..c0bcf6e 100644 --- a/docs/tutorials/workflow-guide.md +++ b/docs/tutorials/workflow-guide.md @@ -134,7 +134,7 @@ Lock Progress
-## The 5 Metrics (Factor IX: Measure What Matters) +## The 5 Metrics (Factor XII: Measure Outcomes)
@@ -259,7 +259,7 @@ Ready to proceed.
@@ -324,7 +324,7 @@ Research bundle (5:1 compression ratio). Future sessions load this instead of re @@ -384,7 +384,7 @@ Plan with 6 validated steps. Every step has a concrete way to verify it worked - @@ -434,8 +434,8 @@ Context: 52% (above 40% threshold) **What's happening:** - Execute each step -- Validate externally after each (Factor V) -- Commit immediately on pass (Factor VI) +- Validate externally after each (Factor VII) +- Commit immediately on pass (Factor VIII) - Human approval at checkpoints - Monitor context usage @@ -455,7 +455,7 @@ Time to save and resume fresh. Quality degrades as context fills. @@ -583,7 +583,7 @@ Yesterday's 52% becomes today's 8%. All context preserved in bundles. This is Fa @@ -619,7 +619,7 @@ AUTH-001 Complete! - Finish remaining step - Validate behavior externally -- Commit with context (Factor VI) +- Commit with context (Factor VIII)
@@ -637,7 +637,7 @@ Full context preserved across the boundary. @@ -681,7 +681,7 @@ Run /retro? > Yes
-**Factor IX:** Concrete metrics prove the discipline works. Not vibes about vibes -- numbers. +**Factor XII:** Concrete metrics prove the discipline works. Not vibes about vibes -- numbers. @@ -693,7 +693,7 @@ Run /retro? > Yes @@ -763,7 +763,7 @@ Patterns extracted here become shortcuts later. The vibe-check proves the discip @@ -890,16 +890,16 @@ Next auth feature loads these patterns automatically. The HERO cycle (Harvest, E - - - - - - + + + + + + - - - + + +
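-| 1 | /research | IV. Research Before You Build | Working on the wrong thing | Before planning |
+| 1 | /research | V. Research Before You Build | Working on the wrong thing | Before planning |
-| 2 | /pre-mortem | IV. Research Before You Build | Problems before they exist | Before implementing |
+| 2 | /pre-mortem | V. Research Before You Build | Problems before they exist | Before implementing |
-| 3 | /vibe | V. Validate Externally | Implementation doesn't match intent | Before every commit |
+| 3 | /vibe | VII. Validate Externally | Implementation doesn't match intent | Before every commit |
-| 4 | /retro | VII. Extract Learnings | Lost learnings | After significant work |
+| 4 | /retro | IX. Extract Learnings | Lost learnings | After significant work |
-| 5 | /post-mortem | XII. Harvest Failures as Wisdom | Repeated mistakes | After failures |
+| 5 | /post-mortem | X. Compound Knowledge | Repeated mistakes | After failures |
-| 6 | /flywheel | VIII. Compound Knowledge | Knowledge decay | Weekly check-in |
+| 6 | /flywheel | X. Compound Knowledge | Knowledge decay | Weekly check-in |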
-### Research (Factor IV: Research Before You Build) +### Research (Factor V: Research Before You Build)
-### Plan (Factor IV continued + Factor V: Validate Externally) +### Plan (Factor V continued + Factor VII: Validate Externally)
-### Implement (Factor V: Validate Externally + Factor VI: Lock Progress Forward) +### Implement (Factor VII: Validate Externally + Factor VIII: Lock Progress Forward)
-### Session End -- Mid-Feature (Factor II: Track Everything in Git) +### Session End -- Mid-Feature (Factor VIII: Lock Progress Forward)
-### Implement Continued (Factor V + Factor VI) +### Implement Continued (Factor VII + Factor VIII)
-### Session End -- Feature Complete (Factor IX: Measure What Matters) +### Session End -- Feature Complete (Factor XII: Measure Outcomes)
-### Retro (Factor VII: Extract Learnings) +### Retro (Factor IX: Extract Learnings)
-### Compound Knowledge (Factor VIII: Compound Knowledge -- HERO) +### Compound Knowledge (Factor X: Compound Knowledge -- HERO)
What
| /session-start | Beginning | I. Context Is Everything | Load state, capture baseline |
-| /research | New problem | IV. Research Before You Build | Explore, compress to bundle |
-| /plan | After research | IV. Research Before You Build | Design steps with validation gates |
-| /pre-mortem | Before implementing | IV. Research Before You Build | Simulate failures, define checkpoints |
-| /implement | After approval | V. Validate Externally | Execute + validate each step |
-| /vibe | Before every commit | V. Validate Externally | Semantic validation of changes |
-| /session-end | Context high or done | VI. Lock Progress Forward | Save state, capture delta |
+| /research | New problem | V. Research Before You Build | Explore, compress to bundle |
+| /plan | After research | V. Research Before You Build | Design steps with validation gates |
+| /pre-mortem | Before implementing | V. Research Before You Build | Simulate failures, define checkpoints |
+| /implement | After approval | VII. Validate Externally | Execute + validate each step |
+| /vibe | Before every commit | VII. Validate Externally | Semantic validation of changes |
+| /session-end | Context high or done | VIII. Lock Progress Forward | Save state, capture delta |
| /session-resume | Continuing work | I. Context Is Everything | Load bundles, resume fresh |
-| /retro | Feature done | VII. Extract Learnings | Review + extract patterns |
-| /post-mortem | After failures | XII. Harvest Failures as Wisdom | Turn failures into prevention |
-| /flywheel | After retro | VIII. Compound Knowledge | HERO cycle: promote learnings |
+| /retro | Feature done | IX. Extract Learnings | Review + extract patterns |
+| /post-mortem | After failures | X. Compound Knowledge | Turn failures into prevention |
+| /flywheel | After retro | X. Compound Knowledge | HERO cycle: promote learnings |
---
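Every change in this diff applies one fixed v3-to-v4 mapping, and that mapping contains chains and even a cycle (VI becomes VIII, VIII becomes X, X becomes VI), so chained find-and-replace would shift some references twice. A minimal Python sketch, assuming references are written as `Factor <numeral>`; `V3_TO_V4`, `V4_NAMES`, `renumber`, and `stale` are hypothetical helpers, not repo tooling. `renumber` rewrites all references in a single pass, and `stale` flags `Factor <numeral>: Name` or `Factor <numeral> (Name` references whose name does not match the v4 numeral.

```python
import re

# v3 numeral -> v4 numeral, as applied throughout this diff.
V3_TO_V4 = {
    "I": "I", "II": "II", "III": "III",
    "IV": "V",     # Research Before You Build
    "V": "VII",    # Validate Externally
    "VI": "VIII",  # Lock Progress Forward
    "VII": "IX",   # Extract Learnings
    "VIII": "X",   # Compound Knowledge
    "IX": "XII",   # Measure What Matters -> Measure Outcomes
    "X": "VI",     # Isolate Workers
    "XI": "XI",    # Supervise Hierarchically
    "XII": "X",    # Harvest Failures as Wisdom, folded into Compound Knowledge
}

V4_NAMES = {
    "I": "Context Is Everything", "II": "Track Everything in Git",
    "III": "One Agent, One Job", "IV": "Enforce Least Privilege",
    "V": "Research Before You Build", "VI": "Isolate Workers",
    "VII": "Validate Externally", "VIII": "Lock Progress Forward",
    "IX": "Extract Learnings", "X": "Compound Knowledge",
    "XI": "Supervise Hierarchically", "XII": "Measure Outcomes",
}

# Longest numerals first; the trailing \b stops "Factor VI" matching as "V".
_NUMERAL = "|".join(sorted(V3_TO_V4, key=len, reverse=True))
_REF = re.compile(rf"\bFactor ({_NUMERAL})\b")
# "Factor <numeral>: Name" or "Factor <numeral> (Name", capturing the name.
_NAMED_REF = re.compile(rf"\bFactor ({_NUMERAL})(?::\s*| \()([A-Z][A-Za-z ,]*)")


def renumber(text: str) -> str:
    """Map every v3 'Factor <numeral>' to v4 in one pass (no double shifts)."""
    return _REF.sub(lambda m: f"Factor {V3_TO_V4[m.group(1)]}", text)


def stale(text: str) -> list[tuple[str, str]]:
    """Return (numeral, name) pairs whose name does not match the v4 numeral."""
    return [
        (num, name.strip())
        for num, name in _NAMED_REF.findall(text)
        if not name.startswith(V4_NAMES[num])
    ]
```

Two limits of this sketch: `renumber` must run exactly once on v3 text, since a second run would shift numerals again, and neither helper sees references that drop the word "Factor", such as a quick-reference cell written as `V, XI (Validate, Supervise)`, so those still need a manual pass.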