feat: add opt-in evidence-driven self-evolution loop to Phase 7 by epoko77-ai · Pull Request #45 · revfactory/harness

epoko77-ai · 2026-06-18T12:19:10Z

Summary

Phase 7 (harness evolution) is currently feedback-driven only: a human must start it, the basis is a subjective sentence, and there is no regression gate on edits. This PR adds an opt-in autonomous mode (7-6) for harnesses that already have a deterministic verifier and a splittable, repeatable task set. The new mode mines failures from execution traces, proposes bounded, surface-tied edits, and promotes them only through a conservative non-regression gate. It is fully backward-compatible: harnesses without a verifier keep using 7-1~7-4 unchanged.

Motivation

Related to: adapting the Self-Harness paradigm (arXiv:2606.09498 — "harnesses that improve themselves") to the factory's evolution phase.
The paper's full autonomous loop (deterministic verifier → held-in/held-out → non-regression gate) only transfers cleanly to verifier-equipped harnesses, so the addition is gated behind an explicit eligibility check rather than applied to every harness. Subjective harnesses get a documented golden-sample fallback that keeps the discipline (evidence-grounded, minimal, surface-tied, reversible, logged) without faking a numeric gate.
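For reviewers who prefer to see the eligibility check as logic, here is a minimal sketch of how the routing between the two Phase 7 modes could look. The names `HarnessProfile` and `select_phase7_mode` and the boolean fields are hypothetical and are not taken from SKILL.md. The golden-sample fallback for subjective harnesses is deliberately not modeled, since its trigger conditions live in the reference file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessProfile:
    """Hypothetical summary of a harness, used only for this illustration."""
    has_deterministic_verifier: bool
    has_splittable_repeatable_tasks: bool
    opted_in: bool = False  # the autonomous mode is opt-in


def select_phase7_mode(profile: HarnessProfile) -> str:
    """Return which Phase 7 mode applies under the assumptions above."""
    eligible = (
        profile.has_deterministic_verifier
        and profile.has_splittable_repeatable_tasks
    )
    if eligible and profile.opted_in:
        return "evidence-driven (7-6)"
    # Default path: existing behaviour is left unchanged.
    return "feedback-driven (7-1~7-4)"
```

Because the default return value is the existing feedback-driven path, a harness that never opts in, or that lacks a verifier, behaves exactly as before.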

Scope of change

Skill / meta-skill logic (skills/harness/SKILL.md — Phase 7 split into two modes + new 7-6)
Documentation (skills/harness/references/self-evolution-loop.md — new)
CHANGELOG.md
Agent template(s)
Plugin manifest
CI / GitHub Actions
Tests

Tests

Manual repro: instantiated the loop on a real verifier-equipped harness (deterministic prompt-quality KPI + fixed simulation cases as held-in + fresh-sample red-team as held-out). A dry-run confirmed the non-regression gate rejects a candidate that improves held-out but regresses held-in — a regression the held-in-only process would have promoted. No new infrastructure was required to wire the loop onto existing QA components.
No leakage: the generic skill/reference files contain no project- or portfolio-specific content (kept fully generic).
N/A for automated tests — this is meta-skill prose; repo has no markdownlint config / CI workflow checked in.

CHANGELOG update

Yes — added to CHANGELOG.md under Unreleased

SemVer impact

Minor — additive, backward-compatible (feat:)

Additional notes

Diff is ~182 lines (under the 400-line "discuss first" threshold in CONTRIBUTING.md). Happy to convert this into a Discussion/RFC first if you'd prefer to align on the 7-6 design before reviewing the prose. The bulk of detail lives in the conditionally-loaded reference file to respect the <500-line SKILL.md / Progressive Disclosure guidance.

Phase 7 evolution was feedback-driven only: a human had to start it, the basis was a subjective sentence, and there was no regression gate on edits. This adds an opt-in autonomous mode for harnesses that have a deterministic verifier and a splittable repeatable task set.

- SKILL.md: split Phase 7 into two modes (feedback-driven default / evidence-driven autonomous opt-in); add 7-6 with an eligibility gate, the 3-stage loop (weakness mining -> bounded proposal -> validation), the conservative non-regression acceptance rule (Δ_in >= 0 AND Δ_ho >= 0 AND max > 0), and snapshot/rollback logging.
- references/self-evolution-loop.md: eligibility classes (A/B/C), failure signature schema (c, q, m), full procedure, and a golden-sample fallback for subjective harnesses.
- CHANGELOG.md: Unreleased entry.

Adapts the Self-Harness paradigm (arXiv:2606.09498) to the factory's evolution phase. Backward-compatible: harnesses without a deterministic verifier keep using 7-1..7-4 unchanged.
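The acceptance rule quoted in the commit message (Δ_in >= 0 AND Δ_ho >= 0 AND max > 0) can be sketched in a few lines. This is an illustration under stated assumptions, not the wording of SKILL.md: scores are modeled as verifier pass counts, "max > 0" is read as max(Δ_in, Δ_ho) > 0, and the names `Scores`, `accept`, and `apply_or_rollback` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Verifier pass counts on each split (hypothetical shape)."""
    held_in: int
    held_out: int


def accept(baseline: Scores, candidate: Scores) -> bool:
    """Conservative non-regression gate."""
    d_in = candidate.held_in - baseline.held_in
    d_ho = candidate.held_out - baseline.held_out
    # Assumption: "max > 0" means at least one split strictly improves.
    return d_in >= 0 and d_ho >= 0 and max(d_in, d_ho) > 0


def apply_or_rollback(current: str, proposed: str,
                      baseline: Scores, candidate: Scores,
                      log: list) -> str:
    """Keep a snapshot, log the decision, and return the surviving version."""
    snapshot = current  # the state we roll back to on rejection
    accepted = accept(baseline, candidate)
    log.append({"accepted": accepted,
                "baseline": baseline,
                "candidate": candidate})
    return proposed if accepted else snapshot


# Usage: a candidate that improves held-out but regresses held-in.
decision_log: list = []
surviving = apply_or_rollback(
    "v1", "v2",
    baseline=Scores(held_in=8, held_out=6),
    candidate=Scores(held_in=7, held_out=9),
    log=decision_log,
)
```

In this usage the candidate is rejected and `surviving` stays at the snapshot, because a gain on one split is not allowed to pay for a loss on the other. A candidate that ties the baseline on both splits is also rejected, which keeps no-op edits out of the harness.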

epoko77-ai wants to merge 1 commit into revfactory:main from epoko77-ai:feat/self-evolution-loop
