feat: add opt-in evidence-driven self-evolution loop to Phase 7 #45

Open
epoko77-ai wants to merge 1 commit into revfactory:main from epoko77-ai:feat/self-evolution-loop

@epoko77-ai

Summary

Phase 7 (Harness Evolution) is currently feedback-driven only: a human must start it, its basis is a subjective sentence, and edits pass through no regression gate. This PR adds an opt-in autonomous mode (7-6) for harnesses that already have a deterministic verifier and a splittable, repeatable task set. The mode mines failures from execution traces, proposes bounded, surface-tied edits, and promotes an edit only if it passes a conservative non-regression gate. The change is fully backward-compatible: harnesses without a verifier keep using 7-1 to 7-4 unchanged.
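The three stages described above can be sketched as a single loop iteration. This is a minimal sketch under stated assumptions, not code from SKILL.md (the PR ships prose only): the harness config is modeled as a dict of named surfaces, `verify`, `mine_failures`, and `propose_edit` are hypothetical callables the harness would supply, and mining runs on held-in cases only so that held-out stays an independent check.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical model: a harness config maps surface names (e.g. prompt
# sections) to their text; a verifier deterministically scores a config
# on a list of task IDs.
Config = dict[str, str]
Verifier = Callable[[Config, list[str]], float]
Miner = Callable[[Config, list[str]], list[str]]
Proposer = Callable[[Config, list[str]], Optional[tuple[str, str]]]


@dataclass
class EvolutionLog:
    entries: list[dict] = field(default_factory=list)


def accept(delta_in: float, delta_ho: float) -> bool:
    # Non-regression gate: neither split regresses, at least one improves.
    return delta_in >= 0 and delta_ho >= 0 and max(delta_in, delta_ho) > 0


def evolve_once(config: Config, verify: Verifier, held_in: list[str],
                held_out: list[str], mine_failures: Miner,
                propose_edit: Proposer, log: EvolutionLog) -> Config:
    """One iteration: weakness mining -> bounded proposal -> validation."""
    snapshot = dict(config)                    # kept for rollback
    failures = mine_failures(config, held_in)  # stage 1 (held-in only, assumed)
    proposal = propose_edit(config, failures)  # stage 2: one surface-tied edit
    if proposal is None:
        return snapshot
    surface, new_text = proposal
    candidate = {**config, surface: new_text}
    # Stage 3: score candidate vs. snapshot on both splits.
    d_in = verify(candidate, held_in) - verify(snapshot, held_in)
    d_ho = verify(candidate, held_out) - verify(snapshot, held_out)
    promoted = accept(d_in, d_ho)
    log.entries.append({"surface": surface, "d_in": d_in,
                        "d_ho": d_ho, "promoted": promoted})
    return candidate if promoted else snapshot  # promote or roll back
```

Restricting a proposal to one `(surface, text)` pair is one way to keep edits minimal and tied to a single surface, and logging both deltas makes every promotion or rollback auditable after the fact.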

Motivation

  • Related to: adapting the Self-Harness paradigm (arXiv:2606.09498 — "harnesses that improve themselves") to the factory's evolution phase.
  • The paper's full autonomous loop (deterministic verifier → held-in/held-out split → non-regression gate) transfers cleanly only to verifier-equipped harnesses, so the addition sits behind an explicit eligibility check instead of applying to every harness. Subjective harnesses get a documented golden-sample fallback that keeps the discipline (evidence-grounded, minimal, surface-tied, reversible, logged) without faking a numeric gate.
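The eligibility check can be pictured as a small routing function. Everything here is hypothetical: the field names, the `min_tasks` threshold for "splittable", and the assumption that the golden-sample fallback applies only when golden samples exist. The actual A/B/C eligibility classes are defined in references/self-evolution-loop.md and are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "7-6 evidence-driven loop"
    GOLDEN_SAMPLE = "golden-sample fallback"
    FEEDBACK = "7-1..7-4 feedback-driven"


@dataclass
class HarnessProfile:
    # Hypothetical fields describing what a harness already has.
    has_deterministic_verifier: bool
    task_count: int
    tasks_repeatable: bool
    has_golden_samples: bool = False


def select_mode(p: HarnessProfile, min_tasks: int = 4) -> Mode:
    # "Splittable" assumed to mean enough tasks for both held-in and held-out.
    splittable = p.task_count >= min_tasks
    if p.has_deterministic_verifier and p.tasks_repeatable and splittable:
        return Mode.AUTONOMOUS
    if p.has_golden_samples:
        return Mode.GOLDEN_SAMPLE  # keeps the discipline, no numeric gate
    return Mode.FEEDBACK           # unchanged default path
```

Routing every ineligible harness to the existing feedback-driven path is what keeps the change opt-in and backward-compatible.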

Scope of change

  • Skill / meta-skill logic (skills/harness/SKILL.md: Phase 7 split into two modes + new 7-6)
  • Documentation (skills/harness/references/self-evolution-loop.md, new)
  • CHANGELOG.md
  • Agent template(s): not changed
  • Plugin manifest: not changed
  • CI / GitHub Actions: not changed
  • Tests: not changed

Tests

  • Manual repro: instantiated the loop on a real verifier-equipped harness (a deterministic prompt-quality KPI as the verifier, fixed simulation cases as held-in, and a fresh-sample red-team as held-out). A dry run confirmed that the non-regression gate rejects a candidate that improves held-out but regresses held-in, a regression that a held-out-only check would have promoted. Wiring the loop onto the existing QA components required no new infrastructure.
  • No leakage: the generic skill and reference files contain no project- or portfolio-specific content.
  • N/A for automated tests: this change is meta-skill prose, and the repo has no markdownlint config or CI workflow checked in.

CHANGELOG update

  • Yes — added to CHANGELOG.md under Unreleased

SemVer impact

  • Minor — additive, backward-compatible (feat:)

Additional notes

Diff is ~182 lines, under the 400-line "discuss first" threshold in CONTRIBUTING.md. Happy to convert this into a Discussion/RFC first if you'd prefer to align on the 7-6 design before reviewing the prose. Most of the detail lives in the conditionally loaded reference file, following the Progressive Disclosure guidance to keep SKILL.md under 500 lines.

Phase 7 evolution was feedback-driven only: a human had to start it, the
basis was a subjective sentence, and there was no regression gate on edits.
This adds an opt-in autonomous mode for harnesses that have a deterministic
verifier and a splittable repeatable task set.

- SKILL.md: split Phase 7 into two modes (feedback-driven default /
  evidence-driven autonomous opt-in); add 7-6 with an eligibility gate,
  the 3-stage loop (weakness mining -> bounded proposal -> validation),
  the conservative non-regression acceptance rule
  (Δ_in >= 0 AND Δ_ho >= 0 AND max(Δ_in, Δ_ho) > 0), and
  snapshot/rollback logging.
- references/self-evolution-loop.md: eligibility classes (A/B/C), failure
  signature schema (c, q, m), full procedure, and a golden-sample fallback
  for subjective harnesses.
- CHANGELOG.md: Unreleased entry.
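The acceptance rule from 7-6 fits in one function. The sketch below uses illustrative deltas, not measured results, and contrasts the gate with a naive check on the held-out delta alone: a candidate that gains on held-out while losing on held-in passes the naive check but fails the gate.

```python
def accept(delta_in: float, delta_ho: float) -> bool:
    """Conservative non-regression gate.

    Promote only if neither split regresses and at least one strictly improves.
    """
    return delta_in >= 0 and delta_ho >= 0 and max(delta_in, delta_ho) > 0


def held_out_only(delta_in: float, delta_ho: float) -> bool:
    # Naive single-split comparison, shown only for contrast.
    return delta_ho > 0


# Illustrative (delta_in, delta_ho) pairs, not measured results.
cases = {
    "improves both": (0.05, 0.02),
    "improves held-out, regresses held-in": (-0.03, 0.04),
    "no change": (0.0, 0.0),
    "improves held-in only": (0.02, 0.0),
}
for name, (d_in, d_ho) in cases.items():
    print(f"{name:40s} gate={accept(d_in, d_ho)!s:5s} "
          f"held_out_only={held_out_only(d_in, d_ho)}")
```

Because the verifier is deterministic, exact comparisons against zero are reasonable here; a noisy metric would need a tolerance or repeated runs before this rule could be trusted.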

Adapts the Self-Harness paradigm (arXiv:2606.09498) to the factory's
evolution phase. Backward-compatible: harnesses without a deterministic
verifier keep using 7-1..7-4 unchanged.