feat(engine): add verification, cost tracking, and smarter completion #331

highimpact-dev wants to merge 7 commits into subsy:main from
Conversation
- Verification gates between iterations
- Auto-commit defaults (enable by default)
- Model escalation strategy (sonnet → opus on failure)
- Cross-iteration context (structured diff summaries)
- Completion detection hardening (strategy pattern)
- Acceptance criteria validation (auto-extract executable checks)
- Cost tracking (model-aware pricing, TUI display, threshold alerts)
- First-class parallel execution (auto-detect independent tasks)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add configurable verification commands that run after agent signals completion but before task is marked done. Includes retry with error injection, output truncation, and proper cleanup on skip/abort. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
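The verification gate described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the helper names (`runVerification`, `truncateOutput`) and the injected `runCommand` abstraction are assumptions; the real code lives in `src/engine/verification.ts`.

```typescript
interface VerificationResult {
  passed: boolean;
  errors: string[];
}

const MAX_OUTPUT_CHARS = 2000;

// Keep only the tail of long command output so retry prompts stay small.
function truncateOutput(output: string, limit = MAX_OUTPUT_CHARS): string {
  return output.length <= limit ? output : '…' + output.slice(-limit);
}

// runCommand is injected so the gate itself is testable without spawning a shell.
async function runVerification(
  commands: string[],
  runCommand: (cmd: string) => Promise<{ code: number; output: string }>,
): Promise<VerificationResult> {
  const errors: string[] = [];
  for (const cmd of commands) {
    const { code, output } = await runCommand(cmd);
    // On failure, collect a truncated error for injection into the retry prompt.
    if (code !== 0) errors.push(`${cmd} exited ${code}: ${truncateOutput(output)}`);
  }
  return { passed: errors.length === 0, errors };
}
```

On failure, the collected `errors` strings are what would feed the retry-with-error-injection path the commit message mentions.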
…ation Step 2: autoCommit defaults to true, improved commit messages with iteration number, --no-auto-commit CLI flag for opt-out. Step 3: Model escalation starts with cheaper model, escalates on verification failure. Config properly flows through buildConfig pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
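The escalation strategy in Step 3 might look roughly like this sketch. The method names `recordTaskAttempt` and `getModelForTask` come from the sequence diagram below; the class shape and the threshold default are assumptions, not the PR's actual API.

```typescript
class ModelEscalation {
  private attempts = new Map<string, number>();

  constructor(
    private baseModel = 'sonnet',
    private escalatedModel = 'opus',
    private threshold = 2, // escalate after this many failed verifications (assumed default)
  ) {}

  // Called when a task's verification fails and a retry is queued.
  recordTaskAttempt(taskId: string): void {
    this.attempts.set(taskId, (this.attempts.get(taskId) ?? 0) + 1);
  }

  // Start cheap; switch to the stronger model once a task keeps failing.
  getModelForTask(taskId: string): string {
    return (this.attempts.get(taskId) ?? 0) >= this.threshold
      ? this.escalatedModel
      : this.baseModel;
  }
}
```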
…on detection Step 4: Generate structured diff summaries after each iteration, feed as rolling context to subsequent prompts instead of raw output history. Step 5: Pluggable completion detection strategies - promise-tag (default), relaxed-tag, and heuristic. Configurable per project. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
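The pluggable completion detection in Step 5 can be illustrated with a small strategy-pattern sketch. The strategy names (`promise-tag`, `relaxed-tag`) and the `detectCompletion` return shape match the commit message and sequence diagram; the tag content and interfaces here are illustrative assumptions.

```typescript
interface CompletionStrategy {
  name: string;
  isComplete(agentOutput: string): boolean;
}

const promiseTag: CompletionStrategy = {
  name: 'promise-tag',
  // Strict: requires the exact completion tag (tag text is a hypothetical example).
  isComplete: (out) => out.includes('<promise>COMPLETE</promise>'),
};

const relaxedTag: CompletionStrategy = {
  name: 'relaxed-tag',
  // Tolerates whitespace and case variations of the same tag.
  isComplete: (out) => /<promise>\s*complete\s*<\/promise>/i.test(out),
};

function detectCompletion(
  output: string,
  strategies: CompletionStrategy[],
): { completed: boolean; matchedStrategy?: string } {
  for (const s of strategies) {
    if (s.isComplete(output)) return { completed: true, matchedStrategy: s.name };
  }
  return { completed: false };
}
```

Making the strategy list configurable per project lets strict repos demand the exact tag while looser setups accept variants.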
…llel first-class Step 6: Parse acceptance criteria for executable assertions (shell commands, file existence checks). Run as verification commands with proper shell escaping. Step 7: Track cumulative cost per session with model-aware pricing. Display in TUI dashboard. Configurable alert threshold pauses engine. Step 8: Default parallel mode to 'auto' (detect independent tasks). Add --conflict-timeout CLI flag. Improved help text. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
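The model-aware cost tracking in Step 7 reduces to accumulating per-iteration cost from a pricing table and firing an alert at a threshold. A minimal sketch, with placeholder pricing figures and a hypothetical callback in place of the real engine-pause wiring:

```typescript
interface Pricing {
  inputPerMTok: number;  // USD per 1M input tokens (placeholder rates)
  outputPerMTok: number; // USD per 1M output tokens (placeholder rates)
}

const PRICING: Record<string, Pricing> = {
  sonnet: { inputPerMTok: 3, outputPerMTok: 15 },
  opus: { inputPerMTok: 15, outputPerMTok: 75 },
};

class CostTracker {
  totalCost = 0;

  constructor(
    private alertThreshold: number,
    private onThreshold: () => void, // in the real engine this would pause execution
  ) {}

  addIteration(inputTokens: number, outputTokens: number, model = 'sonnet'): number {
    const p = PRICING[model] ?? PRICING.sonnet;
    const cost =
      (inputTokens / 1_000_000) * p.inputPerMTok +
      (outputTokens / 1_000_000) * p.outputPerMTok;
    this.totalCost += cost;
    if (this.totalCost >= this.alertThreshold) this.onThreshold();
    return cost;
  }
}
```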
@highimpact-dev is attempting to deploy a commit to the plgeek Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough

Adds an eight-step engine improvement suite: post-completion verification gate, model escalation, cost tracking, pluggable completion detection, acceptance-criteria validation, cross-iteration diff context, auto-commit defaults, and first-class parallel execution; extends config, templates, the engine loop, and the TUI, and adds many tests.
Sequence Diagram(s)

sequenceDiagram
participant Client
participant Engine as ExecutionEngine
participant Agent as AgentProcess
participant CompDetect as CompletionDetection
participant Verification as VerificationGate
participant Escalation as ModelEscalation
participant Cost as CostTracker
participant Templates
participant Session
Client->>Engine: runIteration()
Engine->>Agent: executeAgent()
Agent-->>Engine: AgentExecutionResult
Engine->>CompDetect: detectCompletion(result, strategies)
CompDetect-->>Engine: { completed, matchedStrategy }
alt Completed
Engine->>Verification: runVerification(cwd, commands)
Verification-->>Engine: VerificationResult
alt Verification passed
Engine->>Cost: addIteration(tokens, model)
Cost-->>Engine: iterationCost, snapshot
Engine->>Engine: generateDiffSummary()
Engine->>Session: updateSessionIteration(iteration, diffSummary, snapshot.totalCost)
Engine->>Templates: buildPrompt(... diffContext)
Engine-->>Client: IterationResult { diffSummary, costSnapshot }
else Verification failed and retries remain
Engine->>Templates: buildPrompt(verificationErrors)
Engine->>Escalation: recordTaskAttempt(taskId)
Engine-->>Client: IterationResult (retry queued)
else Verification failed, no retries
Engine->>Engine: markTaskDone()
Engine-->>Client: IterationResult (completed with verification failures)
end
else Not completed
Engine->>Escalation: getModelForTask(taskId)
Escalation-->>Engine: selectedModel
Engine->>Engine: emit model:escalated? / set model
Engine-->>Client: IterationResult (continue)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/tui/components/ProgressDashboard.tsx (1)
254-276: ⚠️ Potential issue | 🟡 Minor

Row 2 can overflow the 45-character fixed-width column when both `aggregateUsage` and `totalCost` are populated.

With a short tracker name the row can reach ~57+ characters: "Tracker: json · Σ I/O/T: 1.2M/500K/1.7M · Cost: $1.2345". The fixed `width: 45` on the right column will silently clip the cost portion. Consider either widening the column, moving `Cost:` to its own row (Row 3 or 4), or only showing cost when `aggregateUsage` is absent.

♻️ Option — give cost its own row
```diff
-      {/* Row 2: Tracker + cost */}
+      {/* Row 2: Tracker + token totals */}
       <box style={{ flexDirection: 'row' }}>
         <text fg={colors.fg.secondary}>Tracker: </text>
         <text fg={colors.accent.tertiary}>{trackerName}</text>
         {aggregateUsage && (
           <>
             <text fg={colors.fg.muted}> · </text>
             <text fg={colors.fg.secondary}>Σ I/O/T: </text>
             <text fg={colors.accent.secondary}>{formatTokenCount(aggregateUsage.inputTokens)}</text>
             <text fg={colors.fg.muted}>/</text>
             <text fg={colors.accent.primary}>{formatTokenCount(aggregateUsage.outputTokens)}</text>
             <text fg={colors.fg.muted}>/</text>
             <text fg={colors.status.info}>{formatTokenCount(aggregateUsage.totalTokens)}</text>
           </>
         )}
-        {totalCost !== undefined && totalCost > 0 && (
-          <>
-            <text fg={colors.fg.muted}> · </text>
-            <text fg={colors.fg.secondary}>Cost: </text>
-            <text fg={colors.accent.primary}>${totalCost.toFixed(4)}</text>
-          </>
-        )}
       </box>
+      {/* Row 2b: Cost (own row to avoid overflow) */}
+      {totalCost !== undefined && totalCost > 0 && (
+        <box style={{ flexDirection: 'row' }}>
+          <text fg={colors.fg.secondary}>Cost: </text>
+          <text fg={colors.accent.primary}>${totalCost.toFixed(4)}</text>
+        </box>
+      )}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tui/components/ProgressDashboard.tsx` around lines 254 - 276, The Row 2 layout in ProgressDashboard (the <box> rendering trackerName, aggregateUsage and totalCost) can overflow the fixed-width column; update the component so Cost is rendered on its own row instead of inline: remove the totalCost block from the existing <box> that contains trackerName/aggregateUsage and add a new separate <box> (e.g., "Row 3: Cost") that conditionally renders the Cost label and the ${totalCost.toFixed(4)} when totalCost is defined and > 0; keep existing helpers like formatTokenCount and the same color styles so only the JSX structure changes (move the totalCost conditional rendering out of the tracker/aggregateUsage <box> into its own <box>).

src/tui/components/RunApp.tsx (1)
1840-1885: ⚠️ Potential issue | 🟡 Minor

Initialise `totalCost` from engine state on mount to avoid displaying $0.00 after session resume.

The `useEffect` at lines 1840–1885 initialises `taskUsageMap`, `activeAgent`, `rateLimitState`, and `detectedModel` from `engine.getState()`, but does not read `state.costSnapshot` to initialise `totalCost`. After a session resume or component remount mid-run, the cost widget will display $0.00 until the next `cost:updated` event fires, despite the engine already having the accumulated cost in state.

Suggested fix — seed totalCost from state.costSnapshot
```diff
+      if (state.costSnapshot?.totalCost) {
+        setTotalCost(state.costSnapshot.totalCost);
+      }
       if (state.currentModel) {
         setDetectedModel(state.currentModel);
       }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tui/components/RunApp.tsx` around lines 1840 - 1885, The mount effect that reads engine.getState() should also seed the total cost from the engine state: when engine.getState() returns a state with a costSnapshot, call the existing setter (setTotalCost) with the appropriate value (e.g., state.costSnapshot.total or the field used to represent accumulated cost) so the UI shows the correct cost immediately after resume; update the useEffect that references engine and agentName to check for state.costSnapshot and call setTotalCost(state.costSnapshot.<totalField>) alongside the other initializations (e.g., setTaskUsageMap, setActiveAgentState, setRateLimitState, setDetectedModel).

src/engine/index.ts (1)
470-475: ⚠️ Potential issue | 🟡 Minor

`generatePromptPreview` does not include `verificationErrors` or `diffContext` in the extended context.

`buildPrompt` (Line 155) now accepts `verificationErrors` and `diffContext`, and these are wired into the template context. However, `generatePromptPreview` builds its own `extendedContext` (Lines 471–475) without these fields. This means prompt previews will never show verification errors or diff context blocks, which could confuse users trying to debug why an agent received certain context.

Consider adding the missing fields (even as empty strings) to keep parity with the real prompt path.
Proposed fix
```diff
 const extendedContext = {
   recentProgress,
   codebasePatterns,
   prd: prdContext ?? undefined,
+  verificationErrors: '',
+  diffContext: formatDiffContext(this.recentDiffSummaries),
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/index.ts` around lines 470 - 475, generatePromptPreview builds an extendedContext object that omits verificationErrors and diffContext, so previews never show those blocks; update the extendedContext inside generatePromptPreview to include verificationErrors and diffContext (e.g., verificationErrors: verificationErrors ?? "" and diffContext: diffContext ?? "") so it matches buildPrompt's template context and defaults to empty strings when absent; ensure you reference the same variable names used elsewhere (verificationErrors, diffContext, prdContext) so the preview and real prompt paths remain consistent.
🟡 Minor comments (26)
spec/20260222-engine-improvements/06-acceptance-criteria-validation/TODO.md-3-3 (1)
3-3: ⚠️ Potential issue | 🟡 Minor

Status appears stale.

The AI summary confirms `src/engine/ac-validator.ts`, its tests, and engine integration are all implemented. If Step 6 is indeed complete, update the status to ✅ Complete and add a corresponding `COMPLETED.md` (as done for other steps).
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/06-acceptance-criteria-validation/TODO.md` at line 3, Update the TODO step status from "⬜ Not Started" to "✅ Complete" in spec/20260222-engine-improvements/06-acceptance-criteria-validation/TODO.md and create a corresponding COMPLETED.md for this step mirroring the pattern used by other steps; reference the implemented artifacts (src/engine/ac-validator.ts, its tests, and the engine integration) in the COMPLETED.md so reviewers can verify completion.

spec/20260222-engine-improvements/02-auto-commit-defaults/TASK.md-20-20 (1)
20-20: ⚠️ Potential issue | 🟡 Minor

Wrong file extension — `run.ts` should be `run.tsx`.

The CLI command file is `src/commands/run.tsx`, not `run.ts`.
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/02-auto-commit-defaults/TASK.md` at line 20, The comment references the wrong file extension: update any mention of `src/commands/run.ts` to the correct `src/commands/run.tsx` (e.g., in the TASK.md line "CLI options parsed in `src/commands/run.ts`") so documentation and diffs point to the actual CLI command file `run.tsx`; ensure any other references to `run.ts` (README, comments, or task descriptions) are also corrected to `run.tsx`.

.claude/spec-loop.local.md-7-10 (1)
7-10: ⚠️ Potential issue | 🟡 Minor

Circuit breaker state contradicts its own trigger conditions.

`no_progress_count: 2` is below the 3-iteration open threshold, and `error_count: 0` is far below the 5-iteration error threshold, yet `circuit_breaker: open` is set. If the spec-loop agent reads this field to decide whether to proceed, the premature `open` state will stall it unnecessarily. Either the counters or the breaker state need to be reconciled.
Verify each finding against the current code and only fix it if needed. In @.claude/spec-loop.local.md around lines 7 - 10, The circuit_breaker state is inconsistent with threshold fields: with no_progress_count = 2 (<3) and error_count = 0 (<5) the breaker should not be "open"; update the spec so the circuit_breaker field matches the counters (e.g., set circuit_breaker: closed) or else increment no_progress_count/error_count to meet the open thresholds; ensure the change is applied to the keys no_progress_count, error_count, and circuit_breaker so the spec-loop agent reads a consistent state.

spec/20260222-engine-improvements/08-parallel-first-class/TODO.md-3-3 (1)
3-3: ⚠️ Potential issue | 🟡 Minor

Stale status — step is already complete.

The `COMPLETED.md` for this same step ships in the same PR and confirms all items are done. Update the status to reflect completion, e.g. ✅ Complete.
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/08-parallel-first-class/TODO.md` at line 3, Update the stale status line in TODO.md: replace the existing header "## Status: ⬜ Not Started" with a completed state such as "## Status: ✅ Complete" (since the corresponding COMPLETED.md shows the step is done) so the TODO reflects the real status; keep the rest of the file intact.

src/engine/cost-tracker.ts-40-54 (1)
40-54: ⚠️ Potential issue | 🟡 Minor

No guard against negative token counts.

`addIteration(-1000, 0)` would silently decrement `totalCost`, `inputCost`, and `totalInputTokens`. A caller passing inverted or invalid usage data could corrupt the running session total.

🛡️ Proposed fix — validate inputs
```diff
 addIteration(inputTokens: number, outputTokens: number, model?: string): number {
+  if (inputTokens < 0 || outputTokens < 0) {
+    throw new Error(`Token counts must be non-negative (got input=${inputTokens}, output=${outputTokens})`);
+  }
   const pricing = this.getPricing(model);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/cost-tracker.ts` around lines 40 - 54, The addIteration method currently allows negative inputTokens/outputTokens which can corrupt snapshot totals; add validation at the start of addIteration (before calling getPricing) to ensure inputTokens and outputTokens are numbers, finite, and >= 0, and throw a descriptive RangeError (or TypeError) on invalid values; keep the rest of the logic (getPricing, computing inputCost/outputCost, updating this.snapshot.totalCost, inputCost, outputCost, totalInputTokens, totalOutputTokens, and iterationCosts) unchanged so only valid non-negative token counts are recorded.

.claude/spec-loop.local.md-31-34 (1)
31-34: ⚠️ Potential issue | 🟡 Minor

Corrupted content in document body — machine-state leaked outside YAML front-matter.

Lines 31–34 contain raw state fragments (`last_completed_step: 5`, `0`) that appear to have been accidentally appended outside the front-matter block. This could cause the spec-loop agent to fail when parsing the document.

🛠️ Proposed fix — remove orphaned lines
```diff
 When circuit breaker opens, analyze and fix before continuing.
-last_completed_step: 5
-0
-last_completed_step: 5
-0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/spec-loop.local.md around lines 31 - 34, Remove the stray machine-state fragments `last_completed_step: 5` and `0` that were appended outside the YAML front-matter; open the file, locate the orphaned occurrences of the `last_completed_step` and `0` tokens outside the front-matter block and delete them so the document contains only a properly closed YAML front-matter (ensure `---` start/end markers remain intact) and no leaked machine state that could break the spec-loop parser.

src/engine/ac-validator.ts-7-11 (1)
7-11: ⚠️ Potential issue | 🟡 Minor

`file-contains` is declared but unreachable, and its `grep` invocation is broken.

The `ExecutableAC` union (line 9) and the `acToVerificationCommands` switch (line 76) both reference `'file-contains'`, and the JSDoc at lines 19–21 promises detection of "X contains Y" / "X includes Y" patterns. However, `parseExecutableCriteria` has no code branch that produces a `'file-contains'` entry — so the case is permanently unreachable.

Additionally, the `grep` command at line 77 has no file argument; `grep -q 'pattern'` with no file reads from stdin and would block until the verification timeout fires (even though `|| true` prevents a non-zero exit afterwards).

Either implement the detection logic and fix the command to include a target file, or remove the `'file-contains'` variant from the union and the switch until it is properly implemented.

🐛 Proposed fix: remove the dead variant for now
```diff
 export interface ExecutableAC {
   original: string;
-  type: 'command' | 'file-exists' | 'file-contains';
+  type: 'command' | 'file-exists';
   assertion: string;
 }
```

```diff
 export function acToVerificationCommands(acs: ExecutableAC[]): string[] {
   return acs.map(ac => {
     switch (ac.type) {
       case 'command':
         return ac.assertion;
       case 'file-exists':
         return `test -e '${shellEscape(ac.assertion)}'`;
-      case 'file-contains':
-        return `grep -q '${shellEscape(ac.assertion)}' || true`; // soft check
       default:
         return '';
     }
   }).filter(Boolean);
 }
```

Also applies to: 69-82
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/ac-validator.ts` around lines 7 - 11, The AC variant 'file-contains' is unreachable and its grep invocation lacks a file argument; remove this dead variant and its handling until implemented: update the ExecutableAC union to drop 'file-contains', remove the 'file-contains' case in acToVerificationCommands, and clean up any JSDoc/comments in parseExecutableCriteria referencing "X contains Y"; ensure parseExecutableCriteria only emits the remaining types (e.g., 'file-exists' and 'command') so acToVerificationCommands and tests remain consistent with ExecutableAC.

src/config/types.ts-411-412 (1)
411-412: ⚠️ Potential issue | 🟡 Minor

`autoCommit` documented as `default: true` but absent from `DEFAULT_CONFIG`.

The updated JSDoc on line 411 states `(default: true)`, but `DEFAULT_CONFIG` (line 532) has no `autoCommit` field. Consumers that derive defaults from `DEFAULT_CONFIG` will receive `undefined`, not `true`. Either add `autoCommit: true` to `DEFAULT_CONFIG`, or remove the default claim from the comment and document it in the consuming engine code where the actual fallback lives.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/types.ts` around lines 411 - 412, The JSDoc for the autoCommit field claims a default of true but DEFAULT_CONFIG does not include autoCommit, so consumers using DEFAULT_CONFIG get undefined; fix by adding autoCommit: true to DEFAULT_CONFIG (in the same module where DEFAULT_CONFIG is defined) or remove/adjust the JSDoc default and instead document the fallback where the consuming engine applies the true default; reference the autoCommit property and DEFAULT_CONFIG constant to locate and update the code.

src/engine/ac-validator.ts-56-59 (1)
56-59: ⚠️ Potential issue | 🟡 Minor

`looksLikeCommand`: `s.includes(' run ')` is overly broad.

The substring ` run ` is common in plain English acceptance-criteria text (e.g. "the tests should run without errors", "make sure CI can run the suite"). This risks misclassifying non-executable criteria as shell commands.

🐛 Proposed fix
```diff
 function looksLikeCommand(s: string): boolean {
   const cmdPrefixes = ['bun ', 'npm ', 'npx ', 'node ', 'git ', 'curl ', 'test '];
-  return cmdPrefixes.some(p => s.startsWith(p)) || s.includes(' run ');
+  return cmdPrefixes.some(p => s.startsWith(p));
 }
```

If you want to keep `run`, tighten it to patterns like `bun run` or `npm run`, which are already covered by the prefix list anyway.
Verify each finding against the current code and only fix it if needed. In `@src/engine/ac-validator.ts` around lines 56 - 59, The looksLikeCommand function is misclassifying plain text because of the broad s.includes(' run ') check; update it to remove that generic check and instead match only explicit package-run patterns (e.g., "npm run", "bun run", "yarn run", "pnpm run") or rely on the existing cmdPrefixes array; modify looksLikeCommand to drop s.includes(' run ') and, if needed, replace it with a targeted pattern check for those specific run commands so ordinary phrases like "tests should run" are not treated as shell commands.

src/engine/diff-summarizer.ts-29-35 (1)
29-35: ⚠️ Potential issue | 🟡 Minor

Renamed and copied files produce malformed paths in the diff summary.

In the short format (and porcelain v1), the `→ PATH2` part is shown when a file corresponds to a different path, i.e. the file is renamed. Concretely, a staged rename emits `R old.ts -> new.ts`; `line.substring(3)` therefore yields `"old.ts -> new.ts"`, which is neither a valid path nor either of the actual filenames. That string then lands in `filesChanged` and is injected verbatim into agent prompts.

In the `-z` format the `->` separator is omitted and the field order is reversed, which makes it machine-friendly. Switching to `--porcelain -z` (NUL-delimited) and splitting on `\0` is the recommended approach for machine parsing, or at minimum detect and strip the `-> …` suffix for R/C entries:

🐛 Proposed minimal fix for rename/copy entries
```diff
 for (const line of lines) {
   const status = line.substring(0, 2).trim();
-  const file = line.substring(3);
+  // Renames/copies: "R old.ts -> new.ts" — keep only the destination path
+  const rawFile = line.substring(3);
+  const file = rawFile.includes(' -> ') ? rawFile.split(' -> ').pop()! : rawFile;
   if (status === 'A' || status === '??') filesAdded.push(file);
   else if (status === 'D') filesDeleted.push(file);
   else filesChanged.push(file);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/diff-summarizer.ts` around lines 29 - 35, The current loop in diff-summarizer (iterating over lines, using status = line.substring(0,2) and file = line.substring(3)) mishandles rename/copy entries (e.g. "R old -> new") producing malformed paths; update the parsing so that when status indicates rename/copy (status startsWith 'R' or 'C' or the file string contains "->") you extract the destination path (the text after "->") and trim it before pushing into filesAdded/filesChanged/filesDeleted, and ideally switch the upstream parsing to use git --porcelain -z and split on '\0' to avoid ambiguous separators — adjust references to variables lines, status, file, filesAdded, filesDeleted, filesChanged and the loop that builds them.

src/tui/components/RunApp.tsx-1815-1818 (1)
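The `--porcelain -z` approach the review recommends can be sketched as a small parser. This is a minimal illustration (hypothetical helper name); per the git-status documentation, in `-z` output a renamed or copied entry's target path comes first and the original path follows as a separate NUL-terminated field.

```typescript
interface StatusEntry {
  status: string;
  path: string;
  origPath?: string; // only present for renames/copies
}

// Parse the raw stdout of `git status --porcelain -z`.
function parsePorcelainZ(raw: string): StatusEntry[] {
  const fields = raw.split('\0').filter(f => f.length > 0);
  const entries: StatusEntry[] = [];
  for (let i = 0; i < fields.length; i++) {
    const field = fields[i];
    const status = field.slice(0, 2).trim(); // two-letter XY status code
    const path = field.slice(3);             // path starts after "XY "
    const entry: StatusEntry = { status, path };
    // Renames/copies carry the original path in the following NUL-terminated field.
    if (status.startsWith('R') || status.startsWith('C')) {
      entry.origPath = fields[++i];
    }
    entries.push(entry);
  }
  return entries;
}
```

Because fields are NUL-delimited, paths containing spaces or even `->` parse unambiguously, which the space-delimited short format cannot guarantee.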
1815-1818: ⚠️ Potential issue | 🟡 Minor

`cost:threshold-exceeded` provides no distinct user notification.

The handler updates the cost display but relies entirely on the subsequent `engine:paused` event to communicate the pause to the user. There is no toast or banner indicating why execution stopped, which could confuse users who did not configure the threshold themselves (e.g., on a shared config).

💡 Suggested improvement — surface a threshold toast
```diff
 case 'cost:threshold-exceeded':
   // Engine will auto-pause; update cost display
   setTotalCost(event.snapshot.totalCost);
+  setInfoFeedback(`Cost threshold reached (${event.snapshot.totalCost.toFixed(4)}). Execution paused.`);
   break;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tui/components/RunApp.tsx` around lines 1815 - 1818, The 'cost:threshold-exceeded' handler in RunApp.tsx only calls setTotalCost and doesn't notify the user why execution stopped; update the case for 'cost:threshold-exceeded' to also surface a user-facing notification (e.g., call the existing toast/enqueueSnackbar/notify function used elsewhere in RunApp) with a clear message that execution was paused due to a cost threshold and include the current total (event.snapshot.totalCost); keep the cost update but add the toast/banner call so users immediately see why the run paused before the separate 'engine:paused' event arrives.

spec/20260222-engine-improvements/01-verification-gates/TODO.md-1-21 (1)
1-21: ⚠️ Potential issue | 🟡 Minor

Stale planning document — same issue as the Step 2 TODO.
Status shows "⬜ Not Started" with all tasks unchecked, while the verification gate implementation (verification.ts, event types, engine wiring, and full test suite) is demonstrably complete. Update or remove this file to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/01-verification-gates/TODO.md` around lines 1 - 21, The TODO planning doc is stale and conflicts with the implemented verification work; update or remove spec/20260222-engine-improvements/01-verification-gates/TODO.md to reflect the current state by either deleting the file or marking tasks done and adding notes linking to the real implementations (e.g., VerificationConfig and DEFAULT_VERIFICATION_CONFIG in src/config/types.ts, verification?: VerificationConfig on StoredConfig/RalphConfig, runVerification() and formatVerificationErrors() in src/engine/verification.ts, VerificationStartedEvent/VerificationPassedEvent/VerificationFailedEvent in src/engine/types.ts, engine integration in src/engine/index.ts, verificationErrors in TemplateVariables in src/templates/types.ts, and template changes in src/plugins/trackers/builtin/json/template.hbs) so the checklist and Status field accurately represent completed work.

tests/engine/ac-validator.test.ts-73-92 (1)
73-92: ⚠️ Potential issue | 🟡 Minor

No test coverage for the `file-contains` case in `acToVerificationCommands`.

`acToVerificationCommands` has a `file-contains` branch (`grep -q '…' || true`), but `parseExecutableCriteria` never produces that type — making the branch unreachable via the normal flow. The test suite has no test for this case, and `|| true` means it would always succeed even if the content is absent, which silently defeats the verification intent.

Consider either removing the dead branch from `acToVerificationCommands` or, if `file-contains` is intended for future use, adding an explicit test and removing the `|| true` safety valve before it becomes active.
Verify each finding against the current code and only fix it if needed. In `@tests/engine/ac-validator.test.ts` around lines 73 - 92, acToVerificationCommands contains a dead "file-contains" branch (the grep -q '…' || true form) that parseExecutableCriteria never produces; either remove that branch to avoid misleading behavior or, if you intend to support it, update parseExecutableCriteria to emit type: 'file-contains' and add a unit test in ac-validator.test.ts covering the expected output (e.g., assertion path and content => "grep -q '…' <file>" without the "|| true" safety) and remove the "|| true" so failures surface; refer to acToVerificationCommands and parseExecutableCriteria when making the change and add/modify the test similar to the existing cases in ac-validator.test.ts.

spec/20260222-engine-improvements/02-auto-commit-defaults/TODO.md-1-21 (1)
1-21: ⚠️ Potential issue | 🟡 Minor

Stale planning document — status and task boxes are out of date.

This file records "⬜ Not Started" with all tasks unchecked, yet the corresponding `COMPLETED.md` documents the full implementation. Leaving the TODO in this state alongside a COMPLETED counterpart is misleading for future contributors. Consider either:

- Updating the status and ticking all boxes to reflect completion, or
- Deleting the file, since `COMPLETED.md` is the canonical record.
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/02-auto-commit-defaults/TODO.md` around lines 1 - 21, The TODO.md planning document is stale (still shows "⬜ Not Started") while the COMPLETED.md records the work as done; either update this TODO to reflect completion by changing the status to "✅ Completed" and checking off all tasks (or adding a brief "moved to COMPLETED.md" note), or remove the TODO.md entirely so only the canonical COMPLETED.md remains; make this change in the spec/20260222-engine-improvements/02-auto-commit-defaults/TODO.md file.

spec/20260222-engine-improvements/02-auto-commit-defaults/COMPLETED.md-30-33 (1)
30-33: ⚠️ Potential issue | 🟡 Minor

Missing language specifier on fenced code block (markdownlint MD040).

Add a language tag to suppress the linter warning:

✏️ Proposed fix

````diff
-```
+```text
 bun run typecheck PASS
````
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/02-auto-commit-defaults/COMPLETED.md` around lines 30 - 33, The fenced code block in COMPLETED.md is missing a language specifier (triggering markdownlint MD040); update the triple-backtick that opens the block to include a language tag (e.g., change ``` to ```text or ```bash) so the block reads like "```text" (or your preferred language) to satisfy the linter and preserve the existing content (the block containing "bun run typecheck PASS", "bun run build PASS", "bun test PASS").

spec/20260222-engine-improvements/06-acceptance-criteria-validation/COMPLETED.md-39-42 (1)
39-42: ⚠️ Potential issue | 🟡 Minor

Missing language specifier on fenced code block (markdownlint MD040).

✏️ Proposed fix

````diff
-```
+```text
 bun run typecheck ✓ (clean)
````
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/06-acceptance-criteria-validation/COMPLETED.md` around lines 39 - 42, The fenced code block containing the lines "bun run typecheck ✓ (clean)" / "bun run build ✓ (clean)" / "bun test tests/engine/ac-validator.test.ts ✓ (17 pass, 0 fail)" is missing a language specifier; update the opening fence from ``` to ```text so the block becomes ```text and includes the three lines, which satisfies markdownlint MD040.

src/config/index.ts-704-704 (1)
704-704: ⚠️ Potential issue | 🟡 Minor

`autoCommit: true` default is a silent breaking change for existing users.

Before this PR, unset `autoCommit` meant no auto-commits. After this PR, every existing deployment without an explicit `autoCommit = false` in their config will start auto-committing after upgrading. This should be prominently highlighted in the changelog/release notes, and a `--no-auto-commit` deprecation notice or migration guide should accompany the change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/index.ts` at line 704, The change made autoCommit default to true (autoCommit: options.autoCommit ?? storedConfig.autoCommit ?? true) which silently enables auto-commits for existing users; revert to the prior implicit "off" behavior by changing the default to false (e.g., use options.autoCommit ?? storedConfig.autoCommit ?? false) and add a clear deprecation/migration path: log a one-time warning when autoCommit is undefined recommending explicit opt-in/opt-out, add a CLI/flag handler for --no-auto-commit/--auto-commit to make opt-in explicit, and document the change in the release notes/changelog. Ensure you update the code that reads autoCommit (the autoCommit assignment using options and storedConfig) and add the warning where configuration is initialized.

tests/engine/diff-summarizer.test.ts-90-142 (1)
90-142: ⚠️ Potential issue | 🟡 Minor
Missing test coverage for the `filesDeleted` path.
`generateDiffSummaryLocal` has an explicit `else if (status === 'D') filesDeleted.push(file)` branch (line 54), and `DiffSummary` exposes `filesDeleted` as a first-class field. There is currently no test that:
- verifies a deleted tracked file populates `result.filesDeleted`
- verifies the `"Deleted:"` prefix appears in `result.summary`
✅ Suggested test additions
```diff
+  test('deleted tracked files populate filesDeleted', async () => {
+    // Remove a file that was committed in initGitRepo
+    const { rm } = await import('node:fs/promises');
+    await rm(join(tempDir, 'README.md'));
+
+    const result = await generateDiffSummaryLocal(tempDir);
+    expect(result).not.toBeNull();
+    expect(result!.filesDeleted).toContain('README.md');
+    expect(result!.filesAdded).toHaveLength(0);
+    expect(result!.filesChanged).toHaveLength(0);
+  });
+
+  test('summary contains Deleted prefix for removed files', async () => {
+    const { rm } = await import('node:fs/promises');
+    await rm(join(tempDir, 'README.md'));
+
+    const result = await generateDiffSummaryLocal(tempDir);
+    expect(result).not.toBeNull();
+    expect(result!.summary).toContain('Deleted:');
+    expect(result!.summary).toContain('README.md');
+  });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/engine/diff-summarizer.test.ts` around lines 90 - 142, Add a test that removes a tracked file and asserts the deleted-path behavior: call unlink/remove on an already-committed file (e.g., 'README.md'), run generateDiffSummaryLocal(tempDir), then assert result is not null, result.filesDeleted contains 'README.md' and result.summary contains the "Deleted:" prefix; reference generateDiffSummaryLocal and the DiffSummary.filesDeleted field to locate where to add this test.
spec/20260222-engine-improvements/05-completion-detection/TASK.md-61-69 (1)
61-69: ⚠️ Potential issue | 🟡 Minor
Remove dead code and clarify misleading comment in spec and implementation.
The spec file contains the dead code as identified, but the actual implementation in `src/engine/completion-strategies.ts` (lines 27–34) has already been corrected and does not contain the unused `stripped` variable. However, both the spec and the actual implementation retain a misleading comment claiming "Match even inside markdown code blocks" when neither implementation actually strips fence markers before testing.
Spec (TASK.md lines 61–69):
- Remove the no-op `const stripped = ...` line.
- Replace the misleading comment with an accurate description: the strategy extends `promiseTagStrategy` with an alternative `promise: complete` pattern but does not handle code fences.
Actual implementation (`src/engine/completion-strategies.ts` line 30):
- Update the comment to clarify that the strategy does not detect tags inside markdown code fences, only the explicit tag or the informal pattern.
If stripping code fences is genuinely required, implement it correctly by testing against the stripped string instead of the original.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/05-completion-detection/TASK.md` around lines 61 - 69, Remove the dead no-op `const stripped = ...` line from the spec's `relaxedTagStrategy` and change the misleading comment "Match even inside markdown code blocks" to a truthful description that this strategy extends `promiseTagStrategy` by also matching the informal `promise: complete` pattern but does not strip markdown fences; also update the comment in the actual implementation's `relaxedTagStrategy` (same symbol) to the same clarified wording. If you instead want to support detection inside fenced code blocks, implement proper fence-stripping and run the regex tests against the stripped string (replace uses of `result.stdout` with the stripped variable) rather than leaving the current no-op.
src/config/index.ts-705-712 (1)
705-712: ⚠️ Potential issue | 🟡 Minor
Consider applying `DEFAULT_MODEL_ESCALATION` as base in config construction for clarity.
When `storedConfig.modelEscalation` is `undefined` and the user provides only `--start-model`, the constructed config object becomes `{ enabled: true, startModel: 'x' }` with `escalateModel` and `escalateAfter` absent. While the engine applies `DEFAULT_MODEL_ESCALATION` as a base (at line 1006 in `src/engine/index.ts`) before calling `getModelForTask`, applying defaults at config construction would be more explicit and defensive.
Additionally, the `?? undefined` patterns on lines 712–713 are redundant when the value is already `undefined | T` (the nullish coalescing does nothing).
Suggested improvement
```diff
-modelEscalation: (options.startModel || options.escalateModel)
-  ? {
-      ...storedConfig.modelEscalation,
-      enabled: true,
-      ...(options.startModel ? { startModel: options.startModel } : {}),
-      ...(options.escalateModel ? { escalateModel: options.escalateModel } : {}),
-    }
-  : storedConfig.modelEscalation ?? undefined,
+modelEscalation: (options.startModel || options.escalateModel)
+  ? {
+      ...DEFAULT_MODEL_ESCALATION,
+      ...storedConfig.modelEscalation,
+      enabled: true,
+      ...(options.startModel ? { startModel: options.startModel } : {}),
+      ...(options.escalateModel ? { escalateModel: options.escalateModel } : {}),
+    }
+  : storedConfig.modelEscalation,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/index.ts` around lines 705 - 712, The modelEscalation block should use DEFAULT_MODEL_ESCALATION as the base when constructing the config so partial CLI flags (like options.startModel) get the full default shape; change the logic that builds modelEscalation (symbol: modelEscalation) to merge DEFAULT_MODEL_ESCALATION, then overlay storedConfig.modelEscalation and finally overlay flags (options.startModel / options.escalateModel) so escalateModel and escalateAfter are present when appropriate; also remove the redundant "?? undefined" nullish coalescing on the fallback to storedConfig.modelEscalation since it's already undefined | T.
spec/20260222-engine-improvements/04-cross-iteration-context/TODO.md-1-3 (1)
1-3: ⚠️ Potential issue | 🟡 Minor
TODO status shows "Not Started" but Step 4 is marked complete in PLAN.md.
The PLAN.md completion log (line 151) records Step 4 as completed on 2026-02-22, and the AI summary confirms the implementation exists. This TODO file appears to have been left un-updated. Consider updating the status or removing this file if a separate `COMPLETED.md` already exists.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/04-cross-iteration-context/TODO.md` around lines 1 - 3, Update the TODO file "Step 4: Cross-Iteration Context - Todo" so its status matches PLAN.md (Step 4 marked complete on 2026-02-22): either change the "## Status:" line from "⬜ Not Started" to a completed state with the completion date, or remove this TODO.md if a separate COMPLETED.md already records the finished work; ensure the header "Step 4: Cross-Iteration Context - Todo" and status line are updated to keep documentation consistent with the PLAN.md entry.
spec/20260222-engine-improvements/06-acceptance-criteria-validation/TASK.md-58-70 (1)
58-70: ⚠️ Potential issue | 🟡 Minor
Backtick/single-quote regex may cause false-positive command extraction
The regex `` /[`']([^`']+)[`']/ `` matches content between mismatched delimiters (e.g., `` `foo' `` or `` 'foo` ``). This is likely intentional for flexibility with agent output variations, but it could extract non-command text from ACs that use single quotes for emphasis (e.g., "The 'login' page should work"), feeding it to `looksLikeCommand`. The `looksLikeCommand` guard mitigates this somewhat, but phrases like `'node modules should be installed'` would match via the `'node '` prefix.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/06-acceptance-criteria-validation/TASK.md` around lines 58 - 70, The current regex /[`']([^`']+)[`']/ can match mismatched delimiters and produce false-positive command captures; update parseExecutableCriteria so you only capture content enclosed by the same delimiter (prefer backticks for shell commands) — e.g., replace the single combined pattern with a strict backtick-only match (/`([^`]+)`/) and optionally a separate single-quote match (/\'([^']+)\'/) that is only used if looksLikeCommand returns true; adjust the cmdMatch handling in parseExecutableCriteria (the cmdMatch variable and the branch that pushes type:'command') accordingly so you no longer extract text from mismatched delimiters.
spec/20260222-engine-improvements/07-cost-tracking/TASK.md-93-95 (1)
93-95: ⚠️ Potential issue | 🟡 Minor
Shallow spread does not deep-copy the `iterationCosts` array
`getSnapshot()` returns `{ ...this.snapshot }`, which copies primitive fields but shares the `iterationCosts` array reference. A caller doing `snapshot.iterationCosts.push(…)` would mutate the tracker's internal state. The spec should note that a proper copy (e.g., spreading the array) is needed.
```diff
 getSnapshot(): CostSnapshot {
-  return { ...this.snapshot };
+  return { ...this.snapshot, iterationCosts: [...this.snapshot.iterationCosts] };
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/07-cost-tracking/TASK.md` around lines 93 - 95, getSnapshot() currently returns a shallow copy ({ ...this.snapshot }) which preserves the same iterationCosts array reference; change it to return a copy that clones the iterationCosts array as well (e.g., return { ...this.snapshot, iterationCosts: [...this.snapshot.iterationCosts] }) so callers cannot mutate the tracker's internal state via snapshot.iterationCosts; if iterationCosts contains mutable objects, also deep-copy those elements as needed to fully isolate the snapshot from internal state.
src/commands/run.tsx-534-543 (1)
534-543: ⚠️ Potential issue | 🟡 Minor
`i++` placement inconsistent with other numeric flag parsers — value arg may be re-parsed as a flag
When `nextArg` is present but fails the `parseInt` validation (e.g., `--conflict-timeout abc`), `i` is not incremented, so `"abc"` is re-processed as a flag name in the next loop iteration. Compare with `--iterations` (line 378), where `i++` is outside the inner `if`, correctly consuming the value argument regardless of parse success. The same inconsistency exists for `--listen-port`.
Proposed fix
```diff
 case '--conflict-timeout': {
   if (nextArg && !nextArg.startsWith('-')) {
     const parsed = parseInt(nextArg, 10);
     if (!isNaN(parsed) && parsed > 0) {
       options.conflictTimeout = parsed;
-      i++;
     }
+    i++;
   }
   break;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/commands/run.tsx` around lines 534 - 543, The --conflict-timeout case currently only increments i when parseInt succeeds, causing the provided value token (e.g., "abc") to be re-parsed as a flag; change the logic in the case '--conflict-timeout' to always increment i whenever nextArg exists and does not start with '-' (like the --iterations case) but only assign options.conflictTimeout when parsed value is a valid number > 0; apply the same fix pattern to the --listen-port case so both flags consume their value token even if validation fails.
spec/20260222-engine-improvements/PLAN.md-44-57 (1)
44-57: ⚠️ Potential issue | 🟡 Minor
Dependency graph has Steps 4 and 6 swapped.
The graph labels Step 4 as "(AC validation)" and Step 6 as "(cross-iteration context)", but the table below (lines 61–70) correctly shows Step 4 = Cross-iteration context and Step 6 = Acceptance criteria validation. The arrow `Step 1 → Step 4` should read `Step 1 → Step 6`, and the labels for 4 and 6 need swapping.
Proposed fix
```diff
-Step 1 (verification) ─► Step 4 (AC validation)
+Step 1 (verification) ─► Step 6 (AC validation)
 │
 Step 2 (auto-commit) ──── independent
 │
 Step 3 (model escalation) ─► Step 7 (cost tracking)
 │
-Step 5 (completion detection) ── independent
+Step 4 (cross-iteration context) ── independent
 │
-Step 6 (cross-iteration context) ── independent
+Step 5 (completion detection) ── independent
 │
 Step 8 (parallel first-class) ── independent
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/PLAN.md` around lines 44 - 57, The dependency graph mislabels Step 4 and Step 6 and the edge from Step 1 points to the wrong step; update the diagram so Step 4 is labeled "cross-iteration context" and Step 6 is labeled "AC validation" (swapping the two labels), and change the arrow "Step 1 (verification) ─► Step 4 (AC validation)" to "Step 1 (verification) ─► Step 6 (AC validation)"; ensure any other references in the same diagram to Step 4/6 reflect this swap so the diagram matches the table below.
spec/20260222-engine-improvements/07-cost-tracking/TASK.md-101-108 (1)
101-108: ⚠️ Potential issue | 🟡 Minor
Model matching with `includes()` may produce incorrect results for some model names
The `getPricing` method uses `model.includes(k)` as a fallback matcher. This is too permissive — for example, a model name such as `"opus-wrapper"` would match the short key `'opus'` and receive incorrect pricing. Whilst the current `MODEL_PRICING` keys are unlikely to cause false matches in typical Claude model names, the logic lacks precision.
Consider instead: (1) enforce `startsWith()` only and remove `includes()`, relying on full versioned keys like `'claude-opus-4-6'` for prefix matching, or (2) sort keys from longest to shortest before matching to prioritise more-specific entries.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/20260222-engine-improvements/07-cost-tracking/TASK.md` around lines 101 - 108, The getPricing method currently uses model.includes(k) which is too permissive and can produce false matches; update getPricing (referencing the getPricing function, MODEL_PRICING and the key variable) to remove the includes() fallback and instead either (A) only match exact equality or prefix with startsWith(k) (relying on full versioned keys in MODEL_PRICING), or (B) sort Object.keys(MODEL_PRICING) by descending key length first and then find the first key where model === k || model.startsWith(k) so more specific keys win; implement one of these options and return MODEL_PRICING['sonnet'] as the safe default when no key is found.
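For illustration, option (2) above — sort keys longest-first and keep only anchored matching — could be sketched as follows. The pricing table, field names, and dollar rates below are placeholders for the example, not the project's actual `MODEL_PRICING`:

```typescript
// Sketch of option (2): sorting keys longest-first lets versioned keys win
// before short tier names, and dropping includes() avoids unanchored
// substring hits. Pricing values here are made up for the example.
type ModelPricing = { inputPerMTok: number; outputPerMTok: number };

const MODEL_PRICING: Record<string, ModelPricing> = {
  sonnet: { inputPerMTok: 3, outputPerMTok: 15 },
  opus: { inputPerMTok: 15, outputPerMTok: 75 },
  'claude-opus-4-6': { inputPerMTok: 20, outputPerMTok: 100 },
};

function getPricing(model?: string): ModelPricing {
  if (!model) return MODEL_PRICING['sonnet']; // safe default
  const key = Object.keys(MODEL_PRICING)
    .sort((a, b) => b.length - a.length) // most specific key first
    .find(k => model === k || model.startsWith(k)); // no unanchored includes()
  return key ? MODEL_PRICING[key] : MODEL_PRICING['sonnet'];
}

console.log(getPricing('claude-opus-4-6').inputPerMTok); // 20 — versioned key wins
console.log(getPricing('new-opus-lite').inputPerMTok); // 3 — no substring hit, falls back to sonnet
```

With this ordering, a hypothetical `"new-opus-lite"` no longer inherits opus pricing by accident, since `startsWith` is anchored at the beginning of the name.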
```ts
export function acToVerificationCommands(acs: ExecutableAC[], cwd: string): string[] {
  return acs.map(ac => {
    switch (ac.type) {
      case 'command':
        return ac.assertion;
      case 'file-exists':
        return `test -e "${ac.assertion}"`;
      case 'file-contains':
        return `grep -q "${ac.assertion}" || true`; // soft check
      default:
        return '';
    }
  }).filter(Boolean);
}
```
Command injection risk: user-derived strings interpolated into shell commands without escaping
`ac.assertion` values are extracted from user-authored acceptance criteria via regex and placed directly into shell command strings with double-quote interpolation (lines 106, 108). A crafted AC like ``Running `bun test"; rm -rf /` passes`` would produce a dangerous verification command. The AI summary mentions a `shellEscape` internal helper — ensure the implementation applies it to all interpolated values.
Additionally, the `file-contains` case (line 108) generates `grep -q "${ac.assertion}" || true` without a target file path, which would read from stdin and hang.
Finally, the `cwd` parameter is declared but never used in the function body.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@spec/20260222-engine-improvements/06-acceptance-criteria-validation/TASK.md`
around lines 100 - 113, The acToVerificationCommands function is vulnerable
because ac.assertion and cwd are interpolated into shell strings without
escaping and the 'file-contains' branch emits a grep that reads from stdin;
update acToVerificationCommands to call the internal shellEscape helper on every
interpolated value (use shellEscape(ac.assertion) for file paths/patterns and
shellEscape(cwd) when prepending a working-directory change), change the
'file-exists' case to use test -e "shellEscape(ac.assertion)", change the
'file-contains' case to run grep safely (e.g. grep -qr --
"shellEscape(ac.assertion)" . || true to search under the given cwd) and prefix
each returned command with cd "shellEscape(cwd)" && ... so cwd is actually used;
keep 'command' behavior but ensure you only interpolate cwd (not raw
ac.assertion) into the wrapper command so user commands are not concatenated
into unescaped shell contexts.
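For illustration, a POSIX-style escaper of the kind the prompt describes might look like the sketch below. This is an assumption about shape only — the PR's internal `shellEscape` helper, if present, may differ:

```typescript
// Hypothetical shellEscape sketch: wrap the value in single quotes and replace
// each embedded single quote with the '\'' close-escape-reopen sequence, so
// the shell treats the entire value as one literal word.
function shellEscape(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// A crafted assertion can no longer terminate the quoted argument:
const assertion = 'bun test"; rm -rf /';
const cmd = `test -e ${shellEscape(assertion)}`;
console.log(cmd); // test -e 'bun test"; rm -rf /'
```

Inside single quotes, POSIX shells perform no expansion at all, which is why single-quote wrapping (plus the escape sequence for embedded quotes) is the usual safe baseline.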
src/engine/cost-tracker.ts
Outdated
```ts
private getPricing(model?: string): ModelPricing {
  if (!model) return MODEL_PRICING['sonnet']; // safe default
  // Try exact match, then prefix match
  const key = Object.keys(MODEL_PRICING).find(
    k => model === k || model.startsWith(k) || model.includes(k)
  );
  return key ? MODEL_PRICING[key] : MODEL_PRICING['sonnet'];
}
```
Version-specific pricing keys are unreachable dead code, and the matching strategy is fragile.
`Object.keys` iterates in insertion order. Because `'opus'` precedes `'claude-opus-4-6'` (and likewise for sonnet/haiku), the condition `model.includes('opus')` is satisfied for the short key before the version-specific key is ever checked. This makes every `'claude-*'` key permanently unreachable — if they were ever assigned different pricing, it would silently be ignored.
Additionally, `model.includes(k)` performs unanchored substring matching: any model whose name happens to contain a tier word (e.g. a hypothetical `"new-opus-lite"`) would silently receive the wrong price tier.
A safer approach separates match phases:
♻️ Proposed fix — prioritised, non-ambiguous match
```diff
-private getPricing(model?: string): ModelPricing {
-  if (!model) return MODEL_PRICING['sonnet']; // safe default
-  // Try exact match, then prefix match
-  const key = Object.keys(MODEL_PRICING).find(
-    k => model === k || model.startsWith(k) || model.includes(k)
-  );
-  return key ? MODEL_PRICING[key] : MODEL_PRICING['sonnet'];
-}
+private getPricing(model?: string): ModelPricing {
+  if (!model) return MODEL_PRICING['sonnet'];
+  // 1. Exact match
+  if (MODEL_PRICING[model]) return MODEL_PRICING[model];
+  // 2. Prefix match (model name starts with a known key)
+  const prefixKey = Object.keys(MODEL_PRICING).find(k => model.startsWith(k));
+  if (prefixKey) return MODEL_PRICING[prefixKey];
+  // 3. Substring match (model name contains a known key)
+  const subKey = Object.keys(MODEL_PRICING).find(k => model.includes(k));
+  return subKey ? MODEL_PRICING[subKey] : MODEL_PRICING['sonnet'];
+}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
private getPricing(model?: string): ModelPricing {
  if (!model) return MODEL_PRICING['sonnet'];
  // 1. Exact match
  if (MODEL_PRICING[model]) return MODEL_PRICING[model];
  // 2. Prefix match (model name starts with a known key)
  const prefixKey = Object.keys(MODEL_PRICING).find(k => model.startsWith(k));
  if (prefixKey) return MODEL_PRICING[prefixKey];
  // 3. Substring match (model name contains a known key)
  const subKey = Object.keys(MODEL_PRICING).find(k => model.includes(k));
  return subKey ? MODEL_PRICING[subKey] : MODEL_PRICING['sonnet'];
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engine/cost-tracker.ts` around lines 64 - 71, getPricing currently uses
unanchored includes and insertion-order key scanning so short keys like "opus"
shadow versioned keys like "claude-opus-4-6"; fix by changing the matching
strategy in getPricing: first check for exact match of model in MODEL_PRICING,
then perform a prefix match using keys sorted by descending length (so more
specific/versioned keys like "claude-opus-4-6" are matched before "opus"), and
do not use model.includes(k); fall back to MODEL_PRICING['sonnet'] if no match
is found. Ensure you reference getPricing and MODEL_PRICING when implementing
this change.
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/engine/index.ts (1)
1363-1374: `promiseComplete` is misleading now that completion uses pluggable strategies.
With strategies like `'relaxed-tag'` or `'heuristic'`, the variable no longer specifically reflects a `<promise>COMPLETE</promise>` signal. Renaming to `completionSignalled` (or `agentSignalledComplete`) would keep the semantics accurate as more strategies are added.
♻️ Suggested rename
```diff
-const completionResult = detectCompletion(
-  agentResult,
-  this.config.completion?.strategies ?? ['promise-tag'],
-);
-const promiseComplete = completionResult.completed;
+const completionResult = detectCompletion(
+  agentResult,
+  this.config.completion?.strategies ?? ['promise-tag'],
+);
+const completionSignalled = completionResult.completed;
 // ...
-let taskCompleted = promiseComplete;
+let taskCompleted = completionSignalled;
 // ...
-} else if (!promiseComplete) {
+} else if (!completionSignalled) {
 // ...
 promiseComplete, // in IterationResult
+// rename field to completionSignalled in IterationResult if types permit
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/index.ts` around lines 1363 - 1374, The variable name promiseComplete is misleading now that detectCompletion supports pluggable strategies; rename promiseComplete to completionSignalled (or agentSignalledComplete) and update the assignment and any uses (e.g., let completionSignalled = completionResult.completed) and the following taskCompleted initialization (let taskCompleted = completionSignalled) as well as the surrounding comment to reflect "completion signalled by strategy" rather than an explicit <promise>COMPLETE</promise>; update references to detectCompletion, completionResult, and this.config.completion?.strategies accordingly.
src/commands/run.tsx (1)
697-706: Default parallel mode changed to `'auto'` — existing projects without an explicit `parallel.mode` will now receive dependency-graph-based parallel scheduling on `ralph-tui run`.
Previously the fallback was sequential. Any project relying on implicit sequential behaviour will silently begin running tasks in parallel (when the graph analysis recommends it). Consider adding a release note or migration warning, or defaulting only for new projects that have never stored a config.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/commands/run.tsx` around lines 697 - 706, The resolveParallelMode function currently defaults to 'auto' even for projects with an existing storedConfig missing a parallel.mode; change it so 'auto' is returned only when there is no storedConfig at all (new projects), and when storedConfig exists but has no parallel.mode fall back to the previous sequential default ('never'). Update the final return to check storedConfig explicitly (e.g., if storedConfig == null return 'auto'; else return storedConfig.parallel?.mode ?? 'never'), keeping the CLI flag checks (options.serial/options.parallel) intact; references: resolveParallelMode, options.serial, options.parallel, storedConfig.parallel.mode.
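The fallback described in that prompt could be sketched like this. The type names and the `'always'` mode value are illustrative assumptions, not the project's actual config types:

```typescript
// Sketch: only brand-new projects (no stored config at all) opt in to 'auto';
// an existing config without parallel.mode keeps the old sequential default.
type ParallelMode = 'auto' | 'always' | 'never';
interface StoredConfig { parallel?: { mode?: ParallelMode } }

function resolveParallelMode(
  opts: { serial?: boolean; parallel?: boolean },
  storedConfig: StoredConfig | null,
): ParallelMode {
  if (opts.serial) return 'never';    // explicit CLI flags still win
  if (opts.parallel) return 'always'; // hypothetical mapping for --parallel
  if (storedConfig == null) return 'auto';       // new project: safe to opt in
  return storedConfig.parallel?.mode ?? 'never'; // existing project: keep prior behaviour
}
```

The design choice here is that only users with no stored config (who cannot be relying on prior behaviour) receive the new default, which avoids the silent behaviour change for upgraders.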
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/commands/run.tsx`:
- Around line 548-557: The --conflict-timeout branch fails to consume the next
token when parsing fails, leaving the invalid value as the next arg; update the
handler in run.tsx so that when nextArg exists and !nextArg.startsWith('-') you
always increment i (consume the token) regardless of whether parseInt(nextArg,
10) yields a valid positive number; keep assigning options.conflictTimeout only
on successful parse (using parseInt and the parsed > 0 check) but move or add
i++ so the token is consumed in both success and failure paths, matching the
behavior of the other integer flags (variables: i, nextArg, parseInt,
options.conflictTimeout).
In `@src/engine/index.ts`:
- Around line 1019-1022: Remove the redundant console output that corrupts the
TUI: in the escalation path where this.emit(event) is called (the model
escalation logic referencing previousModel, escalatedModel and attempts), delete
the console.log(`[model-escalation] Escalating model: ${previousModel} →
${escalatedModel} after ${attempts} failed attempt(s)`); line so only the
"model:escalated" event is emitted and no direct stdout writes occur.
- Around line 1553-1563: The cost threshold check currently fires every
iteration because snapshot.totalCost is monotonic; add a private boolean flag
(e.g., costThresholdTripped) to the class and change the logic around the
alertThreshold check in the method that contains
this.config.cost?.alertThreshold, snapshot.totalCost, this.emit and this.pause
so that you only emit `cost:threshold-exceeded` and call this.pause() when
alertThreshold > 0 && !this.costThresholdTripped && snapshot.totalCost >=
alertThreshold, and then set this.costThresholdTripped = true after
emitting/pausing (optionally provide a method to clear/reset the flag if you
ever allow decreasing cost or manual override).
---
Nitpick comments:
In `@src/commands/run.tsx`:
- Around line 697-706: The resolveParallelMode function currently defaults to
'auto' even for projects with an existing storedConfig missing a parallel.mode;
change it so 'auto' is returned only when there is no storedConfig at all (new
projects), and when storedConfig exists but has no parallel.mode fall back to
the previous sequential default ('never'). Update the final return to check
storedConfig explicitly (e.g., if storedConfig == null return 'auto'; else
return storedConfig.parallel?.mode ?? 'never'), keeping the CLI flag checks
(options.serial/options.parallel) intact; references: resolveParallelMode,
options.serial, options.parallel, storedConfig.parallel.mode.
In `@src/engine/index.ts`:
- Around line 1363-1374: The variable name promiseComplete is misleading now
that detectCompletion supports pluggable strategies; rename promiseComplete to
completionSignalled (or agentSignalledComplete) and update the assignment and
any uses (e.g., let completionSignalled = completionResult.completed) and the
following taskCompleted initialization (let taskCompleted = completionSignalled)
as well as the surrounding comment to reflect "completion signalled by strategy"
rather than an explicit <promise>COMPLETE</promise>; update references to
detectCompletion, completionResult, and this.config.completion?.strategies
accordingly.
```ts
case '--conflict-timeout': {
  if (nextArg && !nextArg.startsWith('-')) {
    const parsed = parseInt(nextArg, 10);
    if (!isNaN(parsed) && parsed > 0) {
      options.conflictTimeout = parsed;
      i++;
    }
  }
  break;
}
```
`--conflict-timeout` does not consume `nextArg` on parse failure, inconsistent with other parsers.
When an invalid value is provided (e.g., `--conflict-timeout abc`), `i` is not incremented, so `abc` becomes the next arg in the loop. For all other integer flags (`--iterations`, `--delay`, `--listen-port`) the outer `if` block always ends with `i++`, consuming the token regardless of parse validity.
🐛 Proposed fix
```diff
 case '--conflict-timeout': {
   if (nextArg && !nextArg.startsWith('-')) {
     const parsed = parseInt(nextArg, 10);
     if (!isNaN(parsed) && parsed > 0) {
       options.conflictTimeout = parsed;
-      i++;
     }
+    i++;
   }
   break;
 }
```
📝 Committable suggestion
```ts
case '--conflict-timeout': {
  if (nextArg && !nextArg.startsWith('-')) {
    const parsed = parseInt(nextArg, 10);
    if (!isNaN(parsed) && parsed > 0) {
      options.conflictTimeout = parsed;
    }
    i++;
  }
  break;
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/commands/run.tsx` around lines 548 - 557, The --conflict-timeout branch
fails to consume the next token when parsing fails, leaving the invalid value as
the next arg; update the handler in run.tsx so that when nextArg exists and
!nextArg.startsWith('-') you always increment i (consume the token) regardless
of whether parseInt(nextArg, 10) yields a valid positive number; keep assigning
options.conflictTimeout only on successful parse (using parseInt and the parsed
> 0 check) but move or add i++ so the token is consumed in both success and
failure paths, matching the behavior of the other integer flags (variables: i,
nextArg, parseInt, options.conflictTimeout).
```ts
    failedAttempts: attempts,
  };
  this.emit(event);
  console.log(`[model-escalation] Escalating model: ${previousModel} → ${escalatedModel} after ${attempts} failed attempt(s)`);
```
`console.log` in TUI mode will corrupt the display; the `model:escalated` event already surfaces this information.
The `model:escalated` event is emitted at line 1021 — the `console.log` at line 1022 is redundant and writes to the same stdout stream as the TUI renderer, causing visual corruption whenever escalation fires in interactive mode. The existing `console.log` calls for rate-limit handling (line 924) carry the same risk but are pre-existing; avoid adding more.
🐛 Proposed fix — remove the redundant log
```diff
 if (attempts >= escalationConfig.escalateAfter && previousModel) {
   const event: ModelEscalatedEvent = {
     type: 'model:escalated',
     timestamp: new Date().toISOString(),
     taskId: task.id,
     previousModel,
     newModel: escalatedModel,
     failedAttempts: attempts,
   };
   this.emit(event);
-  console.log(`[model-escalation] Escalating model: ${previousModel} → ${escalatedModel} after ${attempts} failed attempt(s)`);
 }
```
📝 Committable suggestion
```ts
    failedAttempts: attempts,
  };
  this.emit(event);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engine/index.ts` around lines 1019 - 1022, Remove the redundant console
output that corrupts the TUI: in the escalation path where this.emit(event) is
called (the model escalation logic referencing previousModel, escalatedModel and
attempts), delete the console.log(`[model-escalation] Escalating model:
${previousModel} → ${escalatedModel} after ${attempts} failed attempt(s)`); line
so only the "model:escalated" event is emitted and no direct stdout writes
occur.
const alertThreshold = this.config.cost?.alertThreshold ?? 0;
if (alertThreshold > 0 && snapshot.totalCost >= alertThreshold) {
  this.emit({
    type: 'cost:threshold-exceeded',
    timestamp: endedAt.toISOString(),
    snapshot,
    threshold: alertThreshold,
  });
  // Pause the engine so the user can decide whether to continue
  this.pause();
}
Cost threshold fires on every iteration once crossed, creating an infinite pause loop.
snapshot.totalCost is monotonically increasing. Once totalCost >= alertThreshold, every completed iteration will emit cost:threshold-exceeded and call this.pause(). The user resumes, the next iteration completes, the cost is still above the threshold, and the engine pauses again immediately.
A threshold crossing guard is needed — only pause when the threshold is first exceeded:
🐛 Proposed fix — track whether the threshold has already been tripped
Add a private flag to the class:
private costThresholdTripped = false;

Then guard the pause:
const alertThreshold = this.config.cost?.alertThreshold ?? 0;
- if (alertThreshold > 0 && snapshot.totalCost >= alertThreshold) {
+ if (!this.costThresholdTripped && alertThreshold > 0 && snapshot.totalCost >= alertThreshold) {
+ this.costThresholdTripped = true;
this.emit({
type: 'cost:threshold-exceeded',
timestamp: endedAt.toISOString(),
snapshot,
threshold: alertThreshold,
});
// Pause the engine so the user can decide whether to continue
this.pause();
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engine/index.ts` around lines 1553 - 1563, The cost threshold check
currently fires every iteration because snapshot.totalCost is monotonic; add a
private boolean flag (e.g., costThresholdTripped) to the class and change the
logic around the alertThreshold check in the method that contains
this.config.cost?.alertThreshold, snapshot.totalCost, this.emit and this.pause
so that you only emit `cost:threshold-exceeded` and call this.pause() when
alertThreshold > 0 && !this.costThresholdTripped && snapshot.totalCost >=
alertThreshold, and then set this.costThresholdTripped = true after
emitting/pausing (optionally provide a method to clear/reset the flag if you
ever allow decreasing cost or manual override).
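The one-shot latch behavior being requested can be illustrated in isolation. This is a minimal sketch, not code from the PR; the `CostGuard` name and shape are hypothetical:

```typescript
// A latching threshold guard: fires at most once, even though total cost
// keeps increasing on every subsequent check.
class CostGuard {
  private tripped = false;

  constructor(private readonly threshold: number) {}

  // Returns true only the first time totalCost crosses a positive threshold.
  shouldPause(totalCost: number): boolean {
    if (this.tripped || this.threshold <= 0 || totalCost < this.threshold) {
      return false;
    }
    this.tripped = true;
    return true;
  }
}
```

A threshold of 0 disables the guard, matching the `alertThreshold ?? 0` default above.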
@highimpact-dev Thanks for the PR - the feature set here is solid and the modular approach with per-feature tests is well done. I have two concerns before this can merge. First, the hardcoded model pricing table in cost-tracker.ts is already stale (e.g., Opus listed at $15/$75 vs current $5/$25) and will be a recurring maintenance burden. I'd prefer we either pull cost data from the API response, make it user-configurable, or remove the built-in price list entirely rather than shipping something that will silently give wrong numbers. Second, a PR of this scope (eight new features, new config options, new CLI flags, and a breaking default change to autoCommit) needs user-facing documentation. Could you add docs for the new configuration options and a migration note for the autoCommit default change? The CodeRabbit review also flagged a command injection issue in ac-validator.ts that should be addressed before merge.
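The user-configurable direction suggested here can be sketched as a pricing table injected from config rather than hardcoded. A minimal illustration, assuming a free-function form (`resolvePricing` and its signature are hypothetical, not the actual CostTracker API):

```typescript
interface ModelPricing {
  inputPer1M: number;
  outputPer1M: number;
}

// Resolve pricing for a concrete model id against a user-supplied table:
// exact key match first, then the longest configured key that appears as a
// substring of the model id (so "claude-opus-4-6" beats a bare "opus" entry).
function resolvePricing(
  model: string,
  pricing: Record<string, ModelPricing>,
): ModelPricing | null {
  if (pricing[model]) return pricing[model];
  const key = Object.keys(pricing)
    .filter(k => model.includes(k))
    .sort((a, b) => b.length - a.length)[0];
  return key ? pricing[key] : null;
}
```

Because the table comes from user config, stale numbers are the user's to fix rather than a shipped default that silently drifts.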
- cost-tracker: remove hardcoded MODEL_PRICING table (was stale at $15/$75 for Opus). CostTracker now accepts pricing via constructor from CostConfig. Token counts are always tracked; dollar costs require user-configured pricing. - config/types: add CostConfig.pricing field with TOML example showing current prices — users own their pricing data so it stays accurate. - engine/index: pass config.cost?.pricing to CostTracker constructor. - ac-validator: reject command assertions containing shell metacharacters (; | & > < ` $ etc.) to prevent command injection from crafted AC text. - tests: update cost-tracker tests to inject pricing via constructor (opus price corrected to $5/$25); fix ac-validator test quote mismatch (single quotes, matching implementation); add injection-rejection test. - README: document verification, model escalation, completion strategies, cost tracking, and autoCommit config options. Add migration note for autoCommit default change (false → true) and new CLI flags. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Actionable comments posted: 4
♻️ Duplicate comments (3)
src/engine/cost-tracker.ts (1)
65-72: Substring matching in `getPricing` is still order-dependent for user-supplied keys.

The previous review flagged that `Object.keys(...).find(k => model.includes(k))` can match a shorter key before a more specific one (e.g. a user configures both `"opus"` and `"claude-opus-4-6"` — `"opus"` wins if it appears first). The exact-match-first step is good, but the substring fallback should sort candidates by descending key length so longer, more specific keys are preferred.

♻️ Suggested improvement
private getPricing(model?: string): ModelPricing | null {
  if (!model || Object.keys(this.pricing).length === 0) return null;
  // Exact match first
  if (this.pricing[model]) return this.pricing[model];
- // Substring match: find a key whose name appears in the model string
- const key = Object.keys(this.pricing).find(k => model.includes(k));
+ // Substring match: prefer the longest (most specific) key
+ const key = Object.keys(this.pricing)
+   .filter(k => model.includes(k))
+   .sort((a, b) => b.length - a.length)[0];
  return key ? this.pricing[key] : null;
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/cost-tracker.ts` around lines 65 - 72, getPricing currently falls back to a substring lookup using Object.keys(this.pricing).find(k => model.includes(k)), which can choose a shorter user-supplied key before a more specific one; change the substring fallback so you sort candidate keys by descending length and then pick the first key whose name is included in model. Keep the existing exact-match check (this.pricing[model]) intact, and replace the find call with something like Object.keys(this.pricing).sort((a,b)=>b.length-a.length).find(...) so longer/more specific pricing keys win when resolving pricing in getPricing.

src/engine/index.ts (2)
1553-1564: Cost threshold fires repeatedly once crossed — still unaddressed.

This was flagged in a previous review. `snapshot.totalCost` is monotonically increasing, so once `totalCost >= alertThreshold`, every subsequent iteration will re-trigger the pause. The user resumes, the next iteration completes, and the engine pauses again immediately.

A one-shot guard flag is needed:
🐛 Proposed fix — track threshold trip
Add a private field:
private costThresholdTripped = false;

Then guard the check:
  const alertThreshold = this.config.cost?.alertThreshold ?? 0;
- if (alertThreshold > 0 && snapshot.totalCost >= alertThreshold) {
+ if (!this.costThresholdTripped && alertThreshold > 0 && snapshot.totalCost >= alertThreshold) {
+   this.costThresholdTripped = true;
    this.emit({
      type: 'cost:threshold-exceeded',
      timestamp: endedAt.toISOString(),
      snapshot,
      threshold: alertThreshold,
    });
    // Pause the engine so the user can decide whether to continue
    this.pause();
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/index.ts` around lines 1553 - 1564, Add a one-shot guard to avoid re-triggering the cost threshold pause: introduce a private boolean field (e.g., costThresholdTripped = false) on the class and change the check around snapshot.totalCost/alertThreshold so the emit + this.pause() only run when alertThreshold > 0, snapshot.totalCost >= alertThreshold, and costThresholdTripped is false; after emitting and pausing set costThresholdTripped = true. Use the existing symbols snapshot.totalCost, alertThreshold, this.emit(...) and this.pause() and set/reset the new costThresholdTripped flag accordingly (no reset needed if cost is monotonic).
1022-1023: `console.log` in the escalation path will corrupt TUI output — still unaddressed.

This was flagged in a previous review. The `model:escalated` event at line 1022 already surfaces this information to listeners; the `console.log` at line 1023 writes directly to stdout and will corrupt the TUI display in interactive mode.

🐛 Proposed fix — remove the redundant log
  this.emit(event);
- console.log(`[model-escalation] Escalating model: ${previousModel} → ${escalatedModel} after ${attempts} failed attempt(s)`);

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/index.ts` around lines 1022 - 1023, Remove the direct stdout write that corrupts the TUI by deleting the console.log line that prints model escalation (the line using previousModel, escalatedModel and attempts) immediately after this.emit(event); rely on the already-emitted "model:escalated" event for listeners to surface this info (or, if persistent logging is required, replace the console.log with a non-stdout logger call such as processLogger.debug instead of writing to stdout).
🧹 Nitpick comments (2)
src/engine/index.ts (2)
1376-1435: Verification gate logic is well-structured, with one concern about silent completion on retry exhaustion.

When `verificationRetries >= verifyConfig.maxRetries` (line 1417), the task is marked as complete despite failing verification. The inline comment says "skip the verification gate and mark done", which is a deliberate choice. However, this silently marks a task complete even though its verification never passed — consider emitting a distinct event (e.g. `verification:skipped` or a warning) so the user and TUI have visibility into this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/index.ts` around lines 1376 - 1435, The code currently clears verification state and marks the task done when verificationRetries >= verifyConfig.maxRetries, but emits no distinct event; modify the branch where verificationRetries >= verifyConfig.maxRetries (referencing verificationRetries and verifyConfig.maxRetries) to emit a new event like type: 'verification:skipped' (or 'verification:exhausted') with timestamp, task, failures (use verifyResult.results.filter(r => !r.passed).map(r => r.command)), totalRetries: verifyConfig.maxRetries, and a message explaining verification exhausted before clearing lastVerificationErrors and deleting the retry map so the TUI and users can see the skipped verification; keep the existing cleanup behavior afterward.
1376-1385: AC-derived commands are prepended without deduplication.

If a user also manually lists the same command in `verification.commands` (e.g. `"bun test"`), it will run twice — once from the AC extraction and once from the config. This is unlikely to cause breakage but wastes time. A simple `Set`-based dedup would help.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine/index.ts` around lines 1376 - 1385, The code prepends AC-derived commands (getAcVerificationCommands(task.metadata)) to configured verification.commands (from DEFAULT_VERIFICATION_CONFIG merged into baseVerifyConfig) without deduplication, causing duplicate commands to run; modify the verifyConfig construction to merge the two arrays into a deduplicated list while preserving order (AC commands first, then any remaining configured commands) — e.g., build a new commands array by iterating acCommands then baseVerifyConfig.commands and adding each command only if not already seen (use a Set to track seen commands) and assign that deduped array to verifyConfig.commands.
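The order-preserving merge described in this prompt can be sketched standalone (the `mergeCommands` helper name is illustrative, not from the PR):

```typescript
// Merge AC-derived commands ahead of configured verification commands,
// dropping duplicates while preserving first-seen order.
function mergeCommands(acCommands: string[], configured: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const cmd of [...acCommands, ...configured]) {
    if (!seen.has(cmd)) {
      seen.add(cmd);
      out.push(cmd);
    }
  }
  return out;
}
```

AC commands stay first so their failures surface before the broader configured checks, and each distinct command runs exactly once.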
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 229-233: Update the stale Haiku pricing entry: change the
"claude-haiku-4-5" cost.pricing values from inputPer1M = 0.80 and outputPer1M =
4.0 to the current values inputPer1M = 1.0 and outputPer1M = 5.0 in the README
example; also make the identical update for the matching entry in
src/config/types.ts (look for the "claude-haiku-4-5" key and the
inputPer1M/outputPer1M fields) so examples and typed config stay consistent.
In `@src/config/types.ts`:
- Around line 509-520: Update the JSDoc `@example` for the pricing field to use
the current Haiku 4.5 rates: change the "claude-haiku-4-5" example values from
inputPer1M = 0.80, outputPer1M = 4.0 to inputPer1M = 1.0, outputPer1M = 5.0 so
the example in the pricing?: Record<string, { inputPer1M: number; outputPer1M:
number }> block reflects the correct $1/$5 per 1M tokens pricing.
In `@src/engine/ac-validator.ts`:
- Around line 83-84: The 'file-contains' switch branch in ac-validator.ts
returns a grep command that lacks a filename and will hang; update the
'file-contains' case in the switch (in src/engine/ac-validator.ts) to either
remove this unreachable branch entirely or change the returned command to
include an explicit file argument (e.g., accept or derive a filename/path from
the assertion object such as ac.path/ac.target and return something like "grep
-q -- '${shellEscape(ac.assertion)}' ${shellEscape(filePath)} || true"), and if
you can't yet supply a path add a clear TODO comment explaining the needed file
argument and leave a well-formed fallback (or remove the case). Ensure the
unique symbol 'file-contains' is handled consistently with other cases.
- Around line 67-89: The SHELL_METACHAR_RE used in acToVerificationCommands()
doesn't match newline or carriage-return so assertions containing embedded
newlines can bypass the blocklist; update the SHELL_METACHAR_RE to include \n
and \r (e.g. add \n\r into the character class) or explicitly reject assertions
that contain \n or \r before returning them in acToVerificationCommands/command
case so any assertion with embedded newlines/carriage-returns returns '' and
cannot be passed to the shell.
---
Duplicate comments:
In `@src/engine/cost-tracker.ts`:
- Around line 65-72: getPricing currently falls back to a substring lookup using
Object.keys(this.pricing).find(k => model.includes(k)), which can choose a
shorter user-supplied key before a more specific one; change the substring
fallback so you sort candidate keys by descending length and then pick the first
key whose name is included in model. Keep the existing exact-match check
(this.pricing[model]) intact, and replace the find call with something like
Object.keys(this.pricing).sort((a,b)=>b.length-a.length).find(... ) so
longer/more specific pricing keys win when resolving pricing in getPricing.
In `@src/engine/index.ts`:
- Around line 1553-1564: Add a one-shot guard to avoid re-triggering the cost
threshold pause: introduce a private boolean field (e.g., costThresholdTripped =
false) on the class and change the check around
snapshot.totalCost/alertThreshold so the emit + this.pause() only run when
alertThreshold > 0, snapshot.totalCost >= alertThreshold, and
costThresholdTripped is false; after emitting and pausing set
costThresholdTripped = true. Use the existing symbols snapshot.totalCost,
alertThreshold, this.emit(...) and this.pause() and set/reset the new
costThresholdTripped flag accordingly (no reset needed if cost is monotonic).
- Around line 1022-1023: Remove the direct stdout write that corrupts the TUI by
deleting the console.log line that prints model escalation (the line using
previousModel, escalatedModel and attempts) immediately after this.emit(event);
rely on the already-emitted "model:escalated" event for listeners to surface
this info (or, if persistent logging is required, replace the console.log with a
non-stdout logger call such as processLogger.debug instead of writing to
stdout).
---
Nitpick comments:
In `@src/engine/index.ts`:
- Around line 1376-1435: The code currently clears verification state and marks
the task done when verificationRetries >= verifyConfig.maxRetries, but emits no
distinct event; modify the branch where verificationRetries >=
verifyConfig.maxRetries (referencing verificationRetries and
verifyConfig.maxRetries) to emit a new event like type: 'verification:skipped'
(or 'verification:exhausted') with timestamp, task, failures (use
verifyResult.results.filter(r => !r.passed).map(r => r.command)), totalRetries:
verifyConfig.maxRetries, and a message explaining verification exhausted before
clearing lastVerificationErrors and deleting the retry map so the TUI and users
can see the skipped verification; keep the existing cleanup behavior afterward.
- Around line 1376-1385: The code prepends AC-derived commands
(getAcVerificationCommands(task.metadata)) to configured verification.commands
(from DEFAULT_VERIFICATION_CONFIG merged into baseVerifyConfig) without
deduplication, causing duplicate commands to run; modify the verifyConfig
construction to merge the two arrays into a deduplicated list while preserving
order (AC commands first, then any remaining configured commands) — e.g., build
a new commands array by iterating acCommands then baseVerifyConfig.commands and
adding each command only if not already seen (use a Set to track seen commands)
and assign that deduped array to verifyConfig.commands.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- README.md
- src/config/types.ts
- src/engine/ac-validator.ts
- src/engine/cost-tracker.ts
- src/engine/index.ts
- tests/engine/ac-validator.test.ts
- tests/engine/cost-tracker.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/engine/ac-validator.test.ts
- tests/engine/cost-tracker.test.ts
[cost.pricing]
"claude-opus-4-6" = { inputPer1M = 5.0, outputPer1M = 25.0 }
"claude-sonnet-4-6" = { inputPer1M = 3.0, outputPer1M = 15.0 }
"claude-haiku-4-5" = { inputPer1M = 0.80, outputPer1M = 4.0 }
```
Haiku pricing example is stale — same issue as in src/config/types.ts.
Line 232 lists "claude-haiku-4-5" = { inputPer1M = 0.80, outputPer1M = 4.0 } but current Haiku 4.5 pricing is $1 / $5. Since users are likely to copy these examples verbatim, the values should be accurate.
📝 Suggested fix
-"claude-haiku-4-5" = { inputPer1M = 0.80, outputPer1M = 4.0 }
+"claude-haiku-4-5" = { inputPer1M = 1.0, outputPer1M = 5.0 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 229 - 233, Update the stale Haiku pricing entry:
change the "claude-haiku-4-5" cost.pricing values from inputPer1M = 0.80 and
outputPer1M = 4.0 to the current values inputPer1M = 1.0 and outputPer1M = 5.0
in the README example; also make the identical update for the matching entry in
src/config/types.ts (look for the "claude-haiku-4-5" key and the
inputPer1M/outputPer1M fields) so examples and typed config stay consistent.
/**
 * Model pricing in USD per 1M tokens. No built-in defaults are provided —
 * configure this to enable dollar-cost estimation. Token counts are always tracked.
 *
 * @example
 * [cost.pricing]
 * "claude-opus-4-6" = { inputPer1M = 5.0, outputPer1M = 25.0 }
 * "claude-sonnet-4-6" = { inputPer1M = 3.0, outputPer1M = 15.0 }
 * "claude-haiku-4-5" = { inputPer1M = 0.80, outputPer1M = 4.0 }
 */
pricing?: Record<string, { inputPer1M: number; outputPer1M: number }>;
}
Pricing example in JSDoc shows outdated Haiku rate.
The @example block lists Haiku 4.5 at inputPer1M = 0.80, outputPer1M = 4.0, but current Haiku 4.5 pricing is $1 / $5 per 1M tokens. Since the whole rationale for removing built-in pricing was to avoid shipping stale numbers, the documentation example should be accurate to avoid copy-paste errors.
📝 Suggested fix
- * "claude-haiku-4-5" = { inputPer1M = 0.80, outputPer1M = 4.0 }
+ * "claude-haiku-4-5" = { inputPer1M = 1.0, outputPer1M = 5.0 }

🤖 Prompt for AI Agents
+ * "claude-haiku-4-5" = { inputPer1M = 1.0, outputPer1M = 5.0 }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/config/types.ts` around lines 509 - 520, Update the JSDoc `@example` for
the pricing field to use the current Haiku 4.5 rates: change the
"claude-haiku-4-5" example values from inputPer1M = 0.80, outputPer1M = 4.0 to
inputPer1M = 1.0, outputPer1M = 5.0 so the example in the pricing?:
Record<string, { inputPer1M: number; outputPer1M: number }> block reflects the
correct $1/$5 per 1M tokens pricing.
const SHELL_METACHAR_RE = /[;&|><`$\\{}()!]/;

/**
 * Convert executable AC into verification commands.
 */
export function acToVerificationCommands(acs: ExecutableAC[]): string[] {
  return acs.map(ac => {
    switch (ac.type) {
      case 'command':
        // Reject commands containing shell metacharacters to prevent injection.
        // Acceptance criteria come from user-authored task files but may be
        // processed by AI agents, so we enforce a conservative allowlist.
        if (SHELL_METACHAR_RE.test(ac.assertion)) return '';
        return ac.assertion;
      case 'file-exists':
        return `test -e '${shellEscape(ac.assertion)}'`;
      case 'file-contains':
        return `grep -q '${shellEscape(ac.assertion)}' || true`; // soft check
      default:
        return '';
    }
  }).filter(Boolean);
}
Command injection via embedded newlines bypasses the metacharacter blocklist.
SHELL_METACHAR_RE does not match \n or \r. An acceptance criterion like Running `bun test\nrm -rf /` passes would extract the command, pass the metacharacter check, and execute both bun test and rm -rf / when the string is passed to a shell. Since acceptance criteria may be authored or modified by AI agents (as the comment on line 77–78 acknowledges), this is exploitable.
Add \n and \r to the blocklist:
🐛 Proposed fix
-const SHELL_METACHAR_RE = /[;&|><`$\\{}()!]/;
+const SHELL_METACHAR_RE = /[;&|><`$\\{}()!\n\r]/;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engine/ac-validator.ts` around lines 67 - 89, The SHELL_METACHAR_RE used
in acToVerificationCommands() doesn't match newline or carriage-return so
assertions containing embedded newlines can bypass the blocklist; update the
SHELL_METACHAR_RE to include \n and \r (e.g. add \n\r into the character class)
or explicitly reject assertions that contain \n or \r before returning them in
acToVerificationCommands/command case so any assertion with embedded
newlines/carriage-returns returns '' and cannot be passed to the shell.
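The bypass and the fix can be demonstrated in a few lines. A minimal sketch using the proposed widened blocklist (the `safeCommand` wrapper name is illustrative):

```typescript
// Blocklist including newline and carriage return, per the proposed fix.
const SHELL_METACHAR_RE = /[;&|><`$\\{}()!\n\r]/;

// Returns '' (rejected) for any assertion containing a shell metacharacter
// or an embedded line break; otherwise passes the assertion through.
function safeCommand(assertion: string): string {
  return SHELL_METACHAR_RE.test(assertion) ? '' : assertion;
}
```

With the original regex, `'bun test\nrm -rf /'` contains no listed metacharacter and slips through; adding `\n` and `\r` to the character class rejects it.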
case 'file-contains':
  return `grep -q '${shellEscape(ac.assertion)}' || true`; // soft check
file-contains branch produces a broken grep command (no filename argument).
grep -q '${pattern}' || true reads from stdin, which will hang or return immediately with no match. This branch is currently unreachable (no parse rule produces 'file-contains'), but if it's ever wired up it will silently malfunction. Either remove it or fix the command to include a file path.
♻️ Suggested fix — remove unreachable branch or add a TODO
case 'file-contains':
- return `grep -q '${shellEscape(ac.assertion)}' || true`; // soft check
+ // TODO: file-contains not yet produced by parseExecutableCriteria
+ return '';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engine/ac-validator.ts` around lines 83 - 84, The 'file-contains' switch
branch in ac-validator.ts returns a grep command that lacks a filename and will
hang; update the 'file-contains' case in the switch (in
src/engine/ac-validator.ts) to either remove this unreachable branch entirely or
change the returned command to include an explicit file argument (e.g., accept
or derive a filename/path from the assertion object such as ac.path/ac.target
and return something like "grep -q -- '${shellEscape(ac.assertion)}'
${shellEscape(filePath)} || true"), and if you can't yet supply a path add a
clear TODO comment explaining the needed file argument and leave a well-formed
fallback (or remove the case). Ensure the unique symbol 'file-contains' is
handled consistently with other cases.
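If the branch is ever wired up with a file path, a well-formed version could look like the sketch below. This is an assumption about the eventual shape, not the PR's code; `shellEscape` here uses a common single-quote escaping scheme and `fileContainsCommand` is a hypothetical helper:

```typescript
// Escape a string for safe embedding inside single quotes in a shell command:
// each ' becomes '\'' (close quote, literal quote, reopen quote).
function shellEscape(s: string): string {
  return s.replace(/'/g, "'\\''");
}

// A file-contains check that actually names its file, so grep never falls
// back to reading stdin. `--` ends option parsing in case the pattern
// starts with a dash; `|| true` keeps it a soft (non-failing) check.
function fileContainsCommand(pattern: string, filePath: string): string {
  return `grep -q -- '${shellEscape(pattern)}' '${shellEscape(filePath)}' || true`;
}
```

Both the pattern and the path are escaped, closing the same injection surface the `command` case guards against.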
Summary
- `autoCommit: true` default, improved commit messages with iteration context, `--no-auto-commit` opt-out
- `--conflict-timeout` CLI flag, improved help text

Test plan
- `bun run typecheck` passes
- `bun run build` passes
- `bun test` — 3350 pass, 4 pre-existing failures (unrelated beads tracker tests)
- `--start-model sonnet --escalate-model opus` to verify escalation
- `verification.commands` in config and confirm gate runs post-completion

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Configuration & CLI Enhancements
Documentation