18 Apr 15:49

21d63c4

v12.1.0 — Codex CLI runtime contract + factual corrections Latest

Headline

Option D (Codex CLI runtime) is now a first-class Phase 3 path with a full runtime contract — not the five-bullet prose stub it had been. Native subagents via the [agents] config block (D1) and shell-level parallelism via codex exec backgrounding (D2) are both documented with runnable commands, verified against Codex 0.121.0 source.

A SKILL.md frontmatter description had silently exceeded Codex's 1024-character load limit, which caused Codex CLI to skip the entire skill. It was trimmed to 960 characters without losing any skill-selection trigger.

Docs-only release. No runtime code changes.

Codex CLI as a documented runtime (PR #43)

New file: references/runtime/codex-runtime.md — runtime contract paralleling the implicit ones used by Options A/B. Covers D1 vs D2 picker logic, prerequisites, invocation, artifact contract, concurrency caps, status-line patterns, retries, and known gotchas (#11435, #14866, #15177 — all cross-linked).

SKILL.md Option D rewritten:

D1 Native subagents — [features] multi_agent = true + [agents] max_threads = 6 + ~/.codex/agents/<name>.toml role definitions + prompt-driven spawn. Includes a working orchestrator example that fans out one rpt_audit_explorer per audit dimension and synthesizes the result.
D2 Shell-level codex exec — runnable bash block with --ephemeral, --sandbox workspace-write, --output-last-message, artifact verification, FS-polling status line, hang recovery, and a FIFO semaphore with failure propagation.
Picker table (D1 vs D2 vs "neither, use Option B for cross-agent messaging").

SKILL.md Settings now has a Codex CLI subsection documenting ~/.codex/config.toml with [features], [agents], and skill-defined [reprompter] keys. Claude Code is clarified as optional when Codex is the target runtime.

Compatibility claim rewritten from hedged "parallel sessions if available" to naming the actual mechanism. README compatibility table aligned — removed the asterisk on Codex parallel and added a clarifier pointing to Option D.

Factual corrections (PR #44, 8 commits over 7 bot-review rounds)

Every correction verified against openai/codex rust-v0.121.0 source, codex exec --help, and the current status of each cited GitHub issue as of 2026-04-18.

--full-auto semantics in codex exec. Source: codex-rs/exec/src/cli.rs:50–52 defines it as "Convenience alias for low-friction sandboxed automatic execution (--sandbox workspace-write)". lib.rs:263 only selects the sandbox when full_auto is true; lib.rs:374–376 sets approval policy unconditionally to AskForApproval::Never for headless mode. Docs now recommend --sandbox workspace-write for readability and explain that both options work.
--sandbox read-only artifact-write bug. D2 workers write their own /tmp/rpt-*.md artifacts, so a read-only sandbox breaks the artifact contract. The read-only recommendation was reverted; read-only is now documented only for pure-analysis workers that use --output-last-message as the artifact path.
report_agent_job_result scope. Registered only for spawn_agents_on_csv batch workers, not ordinary prompt-spawned subagents. Removed from the D1 custom-agent template.
[agents] max_threads semantics. reserve_spawn_slot returns AgentLimitReached when the cap is reached — normal spawn_agent calls past the cap fail, they do not queue. Replaced the "queues 2 and runs 6" line with the correct failure mode and pointed readers at spawn_agents_on_csv for true fan-out.
Issue #11435 framing. Closed as not-reproducible after exec was reimplemented on the app server. --ephemeral reframed from "required to avoid corruption" to a historical motivation for the flag.
Issue #15177 fix claim. Still open with no linked fix. Removed the "Fixed in 0.122.0-alpha" claim; documented the actual current-state workaround.
codex exec approval default. Hardcoded Never in headless mode. The approval_policy key in config.toml applies to the interactive TUI only.
features.multi_agent default. Default-enabled in 0.121.0+. Docs no longer imply users must set this explicitly.
Native subagents ship date. "Shipped 2026-03-16" → "multi_agent feature flag stabilized in 0.115.0 on 2026-03-16" (matches rust-v0.115.0 release: #14622 Stabilize multi-agent feature flag).
Bash portability. POSIX-compatible artifact-count loop using [ -e "$f" ] and case (not Bash-only [[ ]]), zero-match safe under set -euo pipefail. Runs under dash too.
FIFO semaphore failure propagation. Explicit PID collection, per-PID wait, status aggregation, fd close after the loop, and exit "$status" so downstream synthesis does not run on missing artifacts. trap 'echo >&9' EXIT guarantees the semaphore token is returned even on non-zero worker exit.
Picker-table drift. Added the missing Cross-agent messaging required mid-run → use Option B row to the lower SKILL.md picker table.
macOS CPU-count. Added sysctl -n hw.ncpu alongside nproc.
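
The corrected [agents] max_threads behaviour can be modelled in a few lines. This is an illustrative JavaScript model under stated assumptions, not Codex's Rust implementation: SpawnSlots, spawnAll, and the error class are hypothetical names that only mirror the documented outcome, where a spawn past the cap fails instead of waiting in a queue.

```javascript
// Illustrative model only. Codex implements this in Rust (reserve_spawn_slot);
// every name below is a hypothetical stand-in.
class AgentLimitReached extends Error {
  constructor(maxThreads) {
    super(`agent limit reached (max_threads = ${maxThreads})`);
    this.name = "AgentLimitReached";
  }
}

class SpawnSlots {
  constructor(maxThreads) {
    this.maxThreads = maxThreads;
    this.active = 0;
  }

  // Reserve one slot or throw immediately. Nothing is queued.
  reserve() {
    if (this.active >= this.maxThreads) {
      throw new AgentLimitReached(this.maxThreads);
    }
    this.active += 1;
  }
}

// Try to spawn every named agent and report which ones were refused.
function spawnAll(slots, names) {
  const spawned = [];
  const rejected = [];
  for (const name of names) {
    try {
      slots.reserve();
      spawned.push(name);
    } catch (err) {
      if (!(err instanceof AgentLimitReached)) throw err;
      rejected.push(name);
    }
  }
  return { spawned, rejected };
}

const names = Array.from({ length: 8 }, (_, i) => `explorer_${i + 1}`);
const outcome = spawnAll(new SpawnSlots(6), names);
console.log(outcome.spawned.length, outcome.rejected.length); // 6 2
```

With eight spawn requests and a cap of six, the model refuses the last two, which is the failure mode the corrected docs describe in place of the old "queues 2 and runs 6" line.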

SKILL.md description under Codex load limit (PR #45)

Codex 0.121.0 enforces a 1024-character limit on the SKILL.md description field via validate_len(&description, MAX_DESCRIPTION_LEN, "description"). The description was 1217 characters, so Codex silently skipped the skill with:

Skipped loading 1 skill(s) due to invalid SKILL.md files.
~/.codex/skills/reprompter/SKILL.md: invalid description: exceeds maximum length of 1024 characters

Claude Code did not enforce the limit, so the bug was Codex-only and easy to miss.

Trimmed the description to 960 characters (64-character safety margin). Every Single / Repromptverse / Reverse-mode trigger keyword preserved; only verbose phrasing and redundant aliases removed.
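
A pre-release check for this limit could look like the sketch below. It is a minimal sketch, not part of the release: checkDescription is a hypothetical helper, and it assumes the limit counts Unicode code points. Codex's validate_len may count differently (for example bytes), so results can diverge for non-ASCII descriptions.

```javascript
// Assumption: the 1024 limit is measured in Unicode code points.
// Codex's own validate_len may measure something else for non-ASCII text.
const MAX_DESCRIPTION_LEN = 1024;

// Hypothetical helper: report the length, whether it fits, and the margin left.
function checkDescription(description, max = MAX_DESCRIPTION_LEN) {
  const length = Array.from(description).length; // code points, not UTF-16 units
  return { length, ok: length <= max, margin: max - length };
}

console.log(checkDescription("x".repeat(1217))); // { length: 1217, ok: false, margin: -193 }
console.log(checkDescription("x".repeat(960)));  // { length: 960, ok: true, margin: 64 }
```

The two calls reproduce the numbers in this section: the old 1217-character description fails, and the trimmed 960-character one passes with a 64-character margin.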

Review notes

PR #44 went through 7 rounds of automated Codex bot review plus a source-level cross-check at the rust-v0.121.0 tag. Each round replaced a broader, sloppier claim with a narrower, more accurate one, so the final wording is grounded in cited source lines rather than memory-from-spec.

Lesson captured in the commit messages: source-verify contested claims before prose lands in a docs-only PR.

What's next (deliberately out of scope)

TESTING.md scenarios for D1 (native subagent fan-out) and D2 (shell-level codex exec fan-out).
Codex-specific install one-liner in README (alongside the existing Claude Code curl | tar recipe).

Both fit better as small follow-up PRs so the review surface stays focused.

Contributors

Thanks to @dorukardahan for the full Codex CLI runtime write-up, factual corrections, and release prep across PRs #43, #44, #45, #46.

Full diff: v12.0.0...v12.1.0


17 Apr 09:15

AytuncYildizli

v12.0.0

3053354

v12.0.0 — Closed-loop Flywheel

Headline

Reprompter is no longer an open-loop prompt rewriter. Every generated prompt emits testable success criteria, every run can be recorded and scored, every outcome feeds a local flywheel that biases future generations toward historical winners, and npm run flywheel:ab proves whether the bias is actually helping. All data local. No telemetry.

This release also recovers Repromptverse under Opus 4.7 (which enforces tool schemas strictly where 4.6 was lenient), ships a tool-drift linter as long-term regression insurance, and hardens the Repromptverse runtime selection path.

The closed loop, end-to-end

User rough prompt
  → Mode 1 / Mode 2 / Mode 3 interview
  → [opt-in] flywheel:query consults past outcomes
  → [if confidence ≥ medium] template + patterns biased
  → generated prompt with <success_criteria schema_version="1"> block
  → user runs it downstream
  → outcome-record.js stamps the run (with applied_recommendation if biased)
  → evaluate-outcome.js scores against criteria
  → flywheel:ingest bridges into NDJSON store
  → strategy-learner aggregates by recipe
  → flywheel:ab compares bias-on vs bias-off effectiveness

Major additions

Closed-loop flywheel (v2 + v3 rollout)

v1 outcome-record schema at .reprompter/outcomes/*.json (structured success_criteria, verification_results, score, optional role + applied_recommendation)
scripts/outcome-record.js — write records (with collision-safe filenames, role attribution, optional applied_recommendation stamping)
scripts/evaluate-outcome.js — score records against criteria (rule/regex, rule/predicate, llm_judge via user --judge-cmd, manual)
scripts/outcome-collector.js ingest bridge — idempotent, deterministic sort, role-domain routing, applied_recommendation preservation
scripts/strategy-learner.js::getRecommendation — read-only query API
scripts/strategy-learner.js::buildAbReport — bias-on vs bias-off effectiveness delta with low-sample warnings
Bias injection — REPROMPTER_FLYWHEEL_BIAS=0|1 env flag (default off). Mode 1 step 5 + Mode 2 Phase 2 consult the flywheel when set.
<success_criteria> emission across all three modes
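
To make the rule/regex scoring path concrete, here is a minimal sketch. The record shape is an assumption: the field names id, method, pattern, and flags are hypothetical, and the real v1 schema in .reprompter/outcomes/*.json and the logic in scripts/evaluate-outcome.js may differ. Only the rule/regex method is modelled; the other methods are reported as skipped.

```javascript
// Hypothetical criterion shape: { id, method, pattern, flags }.
// This is not the schema or code shipped in scripts/evaluate-outcome.js.
function scoreRegexCriteria(criteria, output) {
  const results = criteria.map((c) => {
    if (c.method !== "rule/regex") {
      // rule/predicate, llm_judge, and manual need other evaluators.
      return { id: c.id, status: "skipped" };
    }
    let passed;
    try {
      passed = new RegExp(c.pattern, c.flags || "").test(output);
    } catch (err) {
      // Reject malformed patterns instead of crashing the whole run.
      return { id: c.id, status: "invalid_pattern" };
    }
    return { id: c.id, status: passed ? "pass" : "fail" };
  });

  // Score is the pass ratio over criteria that were actually evaluated.
  const scored = results.filter((r) => r.status === "pass" || r.status === "fail");
  const passedCount = scored.filter((r) => r.status === "pass").length;
  const score = scored.length === 0 ? null : passedCount / scored.length;
  return { results, score };
}

const criteria = [
  { id: "c1", method: "rule/regex", pattern: "^## Summary", flags: "m" },
  { id: "c2", method: "rule/regex", pattern: "TODO" },
  { id: "c3", method: "manual" },
];
const output = "# Report\n## Summary\nAll checks green.\n";
const report = scoreRegexCriteria(criteria, output);
console.log(report.score); // 0.5
```

One regex criterion passes, one fails, and the manual criterion is left out of the ratio, so the sketch scores the run at 0.5.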

Infrastructure + opus-4.7 compatibility

Tool-drift linter (scripts/validate-tool-refs.js) — catches every obsolete tool shape we've shipped a fix for. Multi-line regex support.
Auto-pick runtime (Repromptverse Phase 3) — detects capability and picks Options A–E automatically
Tool-schema guard — canonical signatures + pitfall list captured from the 4.6→4.7 drift

Opus 4.7 recovery

Task(subagent_type=...) → Agent(...)
SendMessage(type=, recipient=) → SendMessage(to=, message=)
Broadcast shutdown → per-agent
A Codex review round also addressed filename collisions, shell quoting, regex validation, idempotent ingest, deterministic sort, agent-identity routing, partial-promptShape wildcards, filter-before-limit, and the Mode-3 checklist.

New npm scripts

npm run validate:tool-refs
npm run flywheel:query
npm run flywheel:ingest
npm run flywheel:ab

New env flag

REPROMPTER_FLYWHEEL_BIAS=0|1 (default off — read-path consultation; complements existing REPROMPTER_FLYWHEEL=0|1 which controls outcome writing)
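
The two flags can be read independently, as in the sketch below. It is a hedged illustration, not the project's code: readFlag and flywheelSettings are hypothetical names, and the default-on value for REPROMPTER_FLYWHEEL is an assumption carried over from the v9.0.0 notes.

```javascript
// Parse a 0|1 flag; any other value (or a missing variable) yields the default.
function readFlag(env, name, defaultValue) {
  const raw = env[name];
  if (raw === "1") return true;
  if (raw === "0") return false;
  return defaultValue;
}

// Hypothetical settings object combining the write path and the read path.
function flywheelSettings(env = process.env) {
  return {
    // Assumed default: the v9.0.0 notes describe this flag as enabled by default.
    writeOutcomes: readFlag(env, "REPROMPTER_FLYWHEEL", true),
    // Default off, as stated for v12.0.0.
    biasReads: readFlag(env, "REPROMPTER_FLYWHEEL_BIAS", false),
  };
}

console.log(flywheelSettings({})); // { writeOutcomes: true, biasReads: false }
console.log(flywheelSettings({ REPROMPTER_FLYWHEEL_BIAS: "1" })); // { writeOutcomes: true, biasReads: true }
```

Keeping the two flags separate means outcome recording can stay on while bias consultation remains opt-in.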

Tests

205 tests pass (was 169). outcome-collector: 30 → 43. strategy-learner: 24 → 36. Plus new --self-test modes on outcome-record.js and evaluate-outcome.js.

What's deliberately still ahead

Default-on flip of REPROMPTER_FLYWHEEL_BIAS. Waiting for flywheel:ab to show a consistent positive delta_mean_effectiveness across multiple task types with ≥5 samples per group.
Per-role bias queries for Repromptverse teams once role-stamped records accumulate.
Visualizations / dashboards on top of flywheel:ab output.
Community / telemetry pooling — the loop stays local-first.
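
The default-on gate described in the first item can be sketched as follows. This is a minimal sketch for a single task type, not buildAbReport itself: abDelta, lowSample, and readyForDefaultOn are hypothetical names, and only delta_mean_effectiveness and the 5-sample floor come from the notes above.

```javascript
const MIN_SAMPLES = 5; // the ">=5 samples per group" floor from the release notes

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical helper: compare effectiveness scores with bias on vs. bias off.
function abDelta(biasOn, biasOff, minSamples = MIN_SAMPLES) {
  if (biasOn.length === 0 || biasOff.length === 0) {
    return { delta_mean_effectiveness: null, lowSample: true, readyForDefaultOn: false };
  }
  const lowSample = biasOn.length < minSamples || biasOff.length < minSamples;
  const delta = mean(biasOn) - mean(biasOff);
  return {
    delta_mean_effectiveness: delta,
    lowSample,
    // One task type only; the release asks for consistency across several.
    readyForDefaultOn: !lowSample && delta > 0,
  };
}

const on = [0.5, 0.75, 0.75, 1, 0.75];
const off = [0.5, 0.5, 0.5, 0.5, 0.5];
console.log(abDelta(on, off));
// { delta_mean_effectiveness: 0.25, lowSample: false, readyForDefaultOn: true }
```

A real gate would repeat this per task type and require every group to clear the sample floor with a positive delta before flipping the default.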

Full detail

See CHANGELOG.md for the complete v12.0.0 entry, including the PR-by-PR breakdown of the 20 PRs (#23 through #42) that shipped this release.

Credit

Thanks to Codex for two rounds of review that caught P1/P2 issues before they shipped.

19 Mar 18:07

AytuncYildizli

v10.0.0

8d2070a

v10.0.0 — Repromptmania

Repromptmania

Agents now ask before they act, and show what they found.

Dimension Interview

Repromptverse Phase 1 scores your raw prompt on 4 dimensions (Clarity, Specificity, Constraints, Decomposition). Low-scoring dimensions become targeted questions (0-4 max). No more vague prompts spawning expensive agents.
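
The score-to-question step can be pictured with the sketch below. It rests on assumptions the notes do not state: a 0 to 10 scale, a threshold of 6, and the question wording are all hypothetical. Only the four dimension names come from the text, and since Repromptmania is a behavioral spec, no such function necessarily exists in the repository.

```javascript
const DIMENSIONS = ["clarity", "specificity", "constraints", "decomposition"];

// Hypothetical question templates, one per dimension.
const QUESTIONS = {
  clarity: "What outcome should the agents produce?",
  specificity: "Which files, services, or modules are in scope?",
  constraints: "Are there limits on time, cost, or tools?",
  decomposition: "Can the work be split into independent parts?",
};

// Assumed scale 0-10; dimensions scoring below the threshold become questions.
function planQuestions(scores, threshold = 6) {
  return DIMENSIONS
    .filter((d) => (scores[d] ?? 0) < threshold) // a missing score counts as 0
    .map((d) => ({ dimension: d, question: QUESTIONS[d] }));
}

const questions = planQuestions({ clarity: 8, specificity: 3, constraints: 5, decomposition: 9 });
console.log(questions.map((q) => q.dimension)); // [ 'specificity', 'constraints' ]
```

Because there are four dimensions, the sketch naturally yields between zero and four questions, matching the range given above.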

Agent Cards

Plan Cards (Phase 1): see every agent's role, scope, exclusions, and output path before execution
Status Line (Phase 3): compact emoji-based polling during execution
Result Cards (Phase 4): per-agent score, finding count, and key insight before synthesis

User Confirmation Gate

Team plan shown before execution. You approve, adjust, or cancel before any agent runs.

Details

42 test scenarios, 9 anti-patterns
All 141 unit tests + 4 benchmarks pass
No runtime code changes — behavioral spec only
Full changelog: CHANGELOG.md

15 Mar 15:04

github-actions

v9.2.2

9eeab19

v9.2.2 — Production Polish

Final polish pass for production readiness.

Version strings aligned across all files
CHANGELOG cleaned (no more semantic-release duplicates)
Template selection bias: the flywheel now recommends the historically best template, which is the most impactful single decision
REPROMPTER_FLYWHEEL_MAX_OUTCOMES env var for configurable ledger size (default 500)
125 tests + 188 benchmarks, 0 failures

Full changelog: v9.2.1...v9.2.2

15 Mar 14:44

github-actions

v9.2.1

fe147a8

v9.2.1 — 7 Flywheel Gaps Fixed

All 7 critical flywheel gaps resolved

Fixed by a 3-agent parallel team (RuntimeEngineer, OutcomeEngineer, DocsEngineer) using reprompter's own Repromptverse mode.

Fixes

| # | Gap | Resolution |
|---|-----|------------|
| 1 | flywheelPreferredTier dead code | capability-policy.js now reads tier, applies +2 score boost to matching models |
| 2 | postCorrectionEdits phantom | collectGitSignals() counts recent file edits via git log |
| 3 | .reprompter/ not in gitignore | Added to .gitignore |
| 4 | Pattern merge incomplete | getPatternById() helper + full pattern object sync after bias |
| 5 | Ledger unbounded | trimOutcomes(500) with atomic write, auto-trim on every write |
| 6 | No E2E integration test | flywheel-e2e.test.js: 5 tests covering the full cycle |
| 7 | SKILL.md no user guidance | Flywheel user guidance subsection (when/how to surface) |
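
The ledger trim in fix 5 can be sketched as below. This is a hypothetical reimplementation rather than the shipped trimOutcomes: it assumes an append-only NDJSON file whose newest records are at the end, and it gets atomicity by writing a temp file and renaming it over the ledger.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Keep only the newest `max` NDJSON lines. Returns the number of lines kept.
// Hypothetical sketch; the project's own trimOutcomes may behave differently.
function trimOutcomes(ledgerPath, max = 500) {
  const lines = fs
    .readFileSync(ledgerPath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "");
  if (lines.length <= max) return lines.length;

  const kept = lines.slice(lines.length - max); // newest entries are last
  const tmpPath = `${ledgerPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, kept.join("\n") + "\n");
  // rename replaces the ledger in one step when both paths share a filesystem,
  // so readers never see a half-written file.
  fs.renameSync(tmpPath, ledgerPath);
  return kept.length;
}

// Usage against a throwaway ledger with 503 records.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "flywheel-"));
const ledger = path.join(dir, "outcomes.ndjson");
const rows = Array.from({ length: 503 }, (_, i) => JSON.stringify({ seq: i }));
fs.writeFileSync(ledger, rows.join("\n") + "\n");
console.log(trimOutcomes(ledger, 500)); // 500
```

After the call the three oldest records are gone and the first remaining record has seq 3.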

Test results

124 unit tests + 188 benchmark fixtures — 100% pass
5 new E2E tests for full flywheel cycle
Zero regressions

Full changelog: v9.2.0...v9.2.1

14 Mar 23:48

github-actions

v9.2.0

1274ca6

v9.2.0 — Version Alignment

Version alignment release. Cleaned up semantic-release auto-generated changelog duplicates and aligned version strings across all files (package.json, SKILL.md, README.md).

No functional changes from v9.1.0.

Full changelog: v9.1.0...v9.2.0

14 Mar 23:42

github-actions

v9.1.0

2f7c824

v9.1.0 — Closed-Loop Flywheel

The loop is closed.

v9.0 introduced the Prompt Flywheel. v9.1 closes the loop: historical outcomes now automatically change future execution behavior.

bestRecipeForDomain() — domain-only lookup before decisions
applyFlywheelBias() — confidence-gated pattern merge + tier override
8 new unit tests (118 total)
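
A confidence-gated merge of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the function name applyBias, the low/medium/high confidence labels, the medium gate, and the plan and recommendation shapes are hypothetical, and the real applyFlywheelBias() may take different arguments.

```javascript
const CONFIDENCE_RANK = { low: 0, medium: 1, high: 2 };

// Hypothetical stand-in for a confidence-gated pattern merge + tier override.
function applyBias(plan, recommendation, minConfidence = "medium") {
  if (!recommendation) return { ...plan, biased: false };

  const rank = CONFIDENCE_RANK[recommendation.confidence];
  if (rank === undefined || rank < CONFIDENCE_RANK[minConfidence]) {
    // Below the gate: leave the plan untouched.
    return { ...plan, biased: false };
  }

  // Merge patterns without duplicates, keeping the plan's own order first.
  const patterns = [...new Set([...plan.patterns, ...recommendation.patterns])];
  return {
    ...plan,
    patterns,
    tier: recommendation.tier ?? plan.tier, // override only when a tier is given
    biased: true,
  };
}

const plan = { template: "audit", patterns: ["a", "b"], tier: "standard" };
const strong = { confidence: "high", patterns: ["b", "c"], tier: "premium" };
const weak = { confidence: "low", patterns: ["z"], tier: "premium" };

console.log(applyBias(plan, strong).patterns); // [ 'a', 'b', 'c' ]
console.log(applyBias(plan, weak).biased); // false
```

The gate keeps low-confidence history from changing execution, so behaviour only shifts once enough outcomes back the recommendation.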

Full changelog: v9.0.0...v9.1.0

14 Mar 23:19

AytuncYildizli

v9.0.0

93c959a

v9.0.0 — Prompt Flywheel

Prompt Flywheel — closed-loop outcome learning

The prompt engineer that gets smarter every time you use it.

New

Recipe fingerprinting — deterministic SHA-256 hash of prompt strategy vectors (template + patterns + tier + domain + layers + quality bucket)
Passive outcome collection — captures artifact scores, retry counts, execution time at finalize_run. All data stored locally in .reprompter/flywheel/outcomes.ndjson
Adaptive strategy learning — queries outcome ledger for similar past tasks, scores recipe groups with time-decay weighting (7-day half-life), recommends best-performing strategy with confidence levels
Runtime integration — flywheel hooks at plan_ready and finalize_run in repromptverse-runtime.js
Feature flag — REPROMPTER_FLYWHEEL=0|1 (enabled by default)
3 new telemetry stages — fingerprint_recipe, collect_outcome, learn_strategy
Flywheel benchmark harness — 13 fixtures (fingerprint 4, effectiveness 6, strategy 3)
48 new unit tests — recipe-fingerprint (14), outcome-collector (19), strategy-learner (15)
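
Two of the ideas above, the deterministic fingerprint and the 7-day half-life, can be sketched together. This is a minimal sketch under assumptions: the field names, the sorting of patterns and layers, and the weighted-mean scoring are hypothetical choices, and the shipped recipe-fingerprint and strategy-learner modules may normalise and score differently.

```javascript
const crypto = require("node:crypto");

// Serialise the strategy vector in a fixed key order with sorted lists, so the
// same recipe always produces the same SHA-256 hex digest.
function fingerprintRecipe(recipe) {
  const canonical = JSON.stringify({
    template: recipe.template,
    patterns: [...recipe.patterns].sort(),
    tier: recipe.tier,
    domain: recipe.domain,
    layers: [...recipe.layers].sort(),
    qualityBucket: recipe.qualityBucket,
  });
  return crypto.createHash("sha256").update(canonical).digest("hex");
}

const HALF_LIFE_DAYS = 7;
const MS_PER_DAY = 86400000;

// Exponential decay: the weight halves every `halfLifeDays`.
function decayWeight(ageDays, halfLifeDays = HALF_LIFE_DAYS) {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Time-decayed mean score for one recipe group (hypothetical outcome shape).
function weightedScore(outcomes, nowMs) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const o of outcomes) {
    const ageDays = Math.max(0, (nowMs - o.timestampMs) / MS_PER_DAY);
    const w = decayWeight(ageDays);
    weightedSum += w * o.score;
    totalWeight += w;
  }
  return totalWeight === 0 ? null : weightedSum / totalWeight;
}

const recipeA = { template: "audit", patterns: ["x", "y"], tier: "standard",
                  domain: "security", layers: ["l1"], qualityBucket: "high" };
const recipeB = { ...recipeA, patterns: ["y", "x"] }; // same set, different order
console.log(fingerprintRecipe(recipeA) === fingerprintRecipe(recipeB)); // true
console.log(decayWeight(7)); // 0.5
```

With this weighting an outcome from a week ago counts half as much as one recorded today, so recent results dominate the recommendation.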

Privacy

All flywheel data is stored locally. Nothing is transmitted anywhere.

Test results

110 unit tests — 100% pass
188 benchmark fixtures — 100% pass
Zero regressions on v8.3 tests and benchmarks

Full changelog: v8.3.0...v9.0.0

14 Mar 23:12

github-actions

v8.3.0

93c959a

v8.3.0

8.3.0 (2026-03-14)

Features

milestone 1 telemetry and observability pipeline (45d05cb)
milestone 2 real-world benchmarks and routing calibration (540fd9a)
release v8.3.0 runtime optimization stack (d9420fb)
release v9.0.0 prompt flywheel engine (a87c9f1)

24 Feb 13:48

AytuncYildizli

v8.2.0

4fae923

v8.2.0

Added

Deterministic intent router — scripts/intent-router.js with explicit profile triggers + weighted keyword routing
Router unit tests — scripts/intent-router.test.js (8 passing tests)
Benchmark harness — scripts/run-swarm-benchmark.js + fixture set under benchmarks/fixtures/
Benchmark reports — generated markdown/json artifacts for pre-release checks
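
Routing by explicit triggers first and weighted keywords second can be sketched as below. The profiles, trigger strings, and weights are hypothetical and chosen only for illustration; the actual tables and tie-breaking in scripts/intent-router.js are not shown in these notes.

```javascript
// Hypothetical profiles; real trigger strings and weights are not documented here.
const PROFILES = [
  { name: "security-audit", triggers: ["/audit"],
    keywords: { vulnerability: 3, auth: 2, audit: 2 } },
  { name: "refactor", triggers: ["/refactor"],
    keywords: { refactor: 3, cleanup: 2, rename: 1 } },
];

function routeIntent(prompt, profiles = PROFILES) {
  const text = prompt.toLowerCase();

  // 1. An explicit trigger wins outright.
  for (const p of profiles) {
    if (p.triggers.some((t) => text.includes(t))) {
      return { profile: p.name, reason: "trigger", score: null };
    }
  }

  // 2. Otherwise sum keyword weights. Only a strictly higher score replaces
  //    the current best, so ties resolve to the earlier profile (deterministic).
  const words = text.split(/[^a-z0-9]+/).filter(Boolean);
  let best = { profile: null, reason: "no-match", score: 0 };
  for (const p of profiles) {
    let score = 0;
    for (const w of words) {
      if (Object.prototype.hasOwnProperty.call(p.keywords, w)) score += p.keywords[w];
    }
    if (score > best.score) best = { profile: p.name, reason: "keywords", score };
  }
  return best;
}

console.log(routeIntent("Please /refactor the auth module").profile); // refactor
console.log(routeIntent("Check auth for any vulnerability").score); // 5
```

The first prompt routes on its trigger even though it mentions auth, and the second falls through to keyword scoring (auth 2 plus vulnerability 3).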

Changed

Codex/Claude operational parity hardened with runnable npm run check pipeline (templates + router tests + benchmark)
Packaging scope tightened — benchmark artifacts and router test file excluded from skill zip
Version alignment across docs and skill metadata to v8.2.0

Releases: AytuncYildizli/reprompter
