feat(harness): stop identical-call loops and hand back stuck sub-agents as incomplete by sanil-23 · Pull Request #4230 · tinyhumansai/openhuman

sanil-23 · 2026-06-27T09:58:13Z

Summary

Add RepeatCallGuard ([Harness] Add circuit breaker for repeated identical tool calls #4088 / [Harness] Agents enter unproductive loops (repeated tool calls, no progress) #4095): a circuit breaker that halts an agent turn after 3 consecutive identical (tool, args) calls, independent of whether each call succeeds — closing the loop-detection gap the two existing guards miss.
Add a TurnStop discriminator + SubagentRunStatus::Incomplete ([Harness] Escalate to orchestrator (Core) after repeated sub-agent failures #4091 / [Harness] Agents don't escalate / hand back control when stuck #4096): a stuck or capped sub-agent now hands the orchestrator a structured [SUBAGENT_INCOMPLETE] envelope (partial progress + blocker) instead of a success-framed result.
All delegation tools (spawn_subagent, spawn_async_subagent, continue_subagent, dispatch) and status consumers updated; one orchestrator-prompt rule added to relay incomplete results rather than re-run unchanged.

Problem

Fuzz testing surfaced two agent-reliability failures:

Unproductive loops ([Harness] Agents enter unproductive loops (repeated tool calls, no progress) #4095): agents re-issue the same tool call repeatedly and never terminate. The existing guards don't catch this when the call succeeds: RepeatFailureGuard resets on every success (tool_loop.rs), and RepeatOutputGuard hashes the assistant narration alongside the call, so trivially varied prose around an identical call resets its streak.
No handback when stuck ([Harness] Agents don't escalate / hand back control when stuck #4096): when a sub-agent got stuck (circuit-breaker halt) or hit its iteration cap, the runner returned its summary as SubagentRunStatus::Completed and the delegation tool wrapped it in ToolResult::success(...). The partial-progress prose was delivered, but nothing marked it as incomplete — so a weak orchestrator could narrate it as done or silently re-spawn the same delegation.

Solution

Loop detection (#4095). New RepeatCallGuard in tool_loop.rs, wired into engine/core.rs before tool execution. It keys only on canonical (tool, args) (sorted-key JSON, so reordered keys can't evade it), is independent of success, and trips at REPEAT_CALL_THRESHOLD = 3. Any distinct intervening call resets the streak, so read→write→read and varied-arg enumeration never trip it.

Stuck handback (#4096). New TurnStop enum (Final / Halted / Cap / EarlyExit) replaces the hit_cap bool on TurnEngineOutcome, giving callers an explicit reason a turn ended. The sub-agent runner maps Halted/Cap to a new SubagentRunStatus::Incomplete { reason }, carrying the partial-progress summary as output. Delegation tools render a [SUBAGENT_INCOMPLETE] envelope (success + partial progress, not the "nothing happened" error envelope) and a GROUNDING_BODY prompt rule tells the orchestrator to relay it and not re-run unchanged.

Design notes:

The two fixes share the same stop-detection path; TurnStop also back-fills the new repeat-call halt site.
Adding the Incomplete variant is compiler-enforced — every exhaustive match on SubagentRunStatus was updated.
Reason/suggestion text is generated from data the harness already has (breaker counters, failure class); no extra LLM calls.

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) — repeat-call guard unit tests (trips / resets-on-distinct / varied-args / reordered-keys), engine Halted + Cap TurnStop tests, and an end-to-end stuck_subagent_returns_incomplete_status test.
Diff coverage ≥ 80% — targeted tests cover the new guard, the TurnStop mapping, and the end-to-end Incomplete path. cargo-llvm-cov + diff-cover was not run locally (heavy); the per-tool envelope arms are exercised indirectly and CI's rust-core-coverage/coverage-gate is the authoritative check.
Coverage matrix updated — N/A: behaviour change to existing harness loop; no feature row added/removed/renamed.
All affected feature IDs from the matrix are listed under ## Related — N/A: no matrix rows touched.
No new external network dependencies introduced — no dependencies added; all tests use scripted in-process providers.
Manual smoke checklist updated if this touches release-cut surfaces — N/A: core agent-loop logic, no release-cut UI surface.
Linked issue closed via Closes #NNN in the ## Related section.

Impact

Runtime: Rust core only (src/). No Tauri shell, frontend, or dependency changes.
Behaviour: an agent spinning on an identical call now halts after 3 repeats with a no-progress summary; a stuck/capped sub-agent now reports Incomplete with its partial progress instead of a success-framed result. Clean finishes and the iteration-cap checkpoint prose are unchanged.
No migration, security, or compatibility implications.

Closes [Harness] Add circuit breaker for repeated identical tool calls #4088
Closes [Harness] Escalate to orchestrator (Core) after repeated sub-agent failures #4091
Sub-issues of tracking issues [Harness] Agents enter unproductive loops (repeated tool calls, no progress) #4095 (loop detection) and [Harness] Agents don't escalate / hand back control when stuck #4096 (escalation / handback). [Harness] Self-suspend long-running monitor/poll loops on diminishing returns #4090 (monitor self-suspend) and [Harness] Escalate to the user after N unproductive iterations #4092 (escalate-to-user) are intentionally out of scope; [Harness] Runs terminate without resolution or a final summary #4097 is already covered by the orchestrator's existing checkpoint/empty-final handling.
Follow-up PR(s)/TODOs: optional per-tool envelope-arm tests for stricter diff coverage; make REPEAT_CALL_THRESHOLD configurable ([Harness] Add circuit breaker for repeated identical tool calls #4088 stretch).

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: feat/harness-loop-and-incomplete-handback
Commit SHA: see PR head

Validation Run

pnpm --filter openhuman-app format:check — N/A, no frontend (app) changes.
pnpm typecheck — N/A, no TypeScript changes.
Focused tests: cargo test --lib for engine::core (9), subagent_runner (57), agent::prompts (44), agent_orchestration (226), agent::harness::session (136) — all pass.
Rust fmt/check (if changed): cargo fmt applied; cargo check --manifest-path Cargo.toml clean (0 errors); full lib test build (--no-run) succeeds.
Tauri fmt/check (if changed): N/A, no app/src-tauri changes.

Validation Blocked

command: pnpm rust:check (pre-push hook)
error: fails in a fresh worktree (missing vendored CEF libs), unrelated to this change
impact: pushed with --no-verify; no Tauri/CEF code touched, so the hook's check does not apply to this diff.

Behavior Changes

Intended behavior change: halt identical-call loops after 3 repeats; mark stuck/capped sub-agent runs as Incomplete with partial progress.
User-visible effect: agents stop wasting the step budget spinning on one call, and a blocked delegation surfaces "did not finish, here's what I got" instead of a fabricated success.

Parity Contract

Legacy behavior preserved: clean final responses, the iteration-cap checkpoint prose, AwaitingUser pauses, and the channel/CLI ErrorCheckpoint path are unchanged. Existing 2-identical-call tests stay green (threshold is 3).
Guard/fallback/dispatch parity checks: RepeatCallGuard sits alongside the existing RepeatFailureGuard/RepeatOutputGuard without altering them; TurnStop replaces the hit_cap bool with equivalent Cap semantics at its single read site.

Duplicate / Superseded PR Handling

Duplicate PR(s): None
Canonical PR: This PR
Resolution: N/A

Summary by CodeRabbit

New Features
- Agents and sub-agents now surface an explicit Incomplete outcome (stop reason + partial output) when they halt before finishing.
- Added a repeat identical tool-call safeguard that stops stuck loops when the same tool call is re-issued without progress.
Bug Fixes
- Improved turn termination handling with clearer halt vs cap behavior, including correct memory autosave suppression.
- Prevents repeated successful identical calls from continuing indefinitely.
Documentation
- Updated grounding guidance to avoid fabricating or silently re-running unchanged calls after incomplete/blocked results.
Tests
- Added coverage for repeat-call stopping and incomplete sub-agent status behavior.

…ts as incomplete Two agent-reliability fixes surfaced by fuzz testing: - tinyhumansai#4088 (tinyhumansai#4095): add RepeatCallGuard, a circuit breaker that halts a turn after 3 consecutive identical (tool, args) calls regardless of success. Closes the gap RepeatFailureGuard (resets on success) and RepeatOutputGuard (also hashes narration) both miss. Resets on any distinct intervening call, so read->write->read and varied-arg enumeration never trip it. - tinyhumansai#4091 (tinyhumansai#4096): add a TurnStop discriminator (Final/Halted/Cap/EarlyExit) and SubagentRunStatus::Incomplete so a stuck or capped sub-agent hands the orchestrator a [SUBAGENT_INCOMPLETE] envelope (partial progress + blocker) instead of a success-framed result it could narrate as done or re-spawn. All delegation tools and status consumers updated; one orchestrator-prompt rule added to relay incomplete results rather than re-run unchanged. Rust core only; no Tauri/frontend/dependency changes. Tests: repeat-call guard units, engine TurnStop Halted/Cap tests, and an end-to-end stuck-subagent Incomplete test.

coderabbitai · 2026-06-27T09:58:33Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 33607b5e-b60b-4145-9a42-a90552bf398d

📥 Commits

Reviewing files that changed from the base of the PR and between e3a7f1e and 202366e.

📒 Files selected for processing (4)

src/openhuman/agent/harness/engine/core.rs
src/openhuman/agent/harness/engine/core_tests.rs
src/openhuman/agent/harness/tool_loop.rs
src/openhuman/agent/harness/tool_loop_tests.rs

🚧 Files skipped from review as they are similar to previous changes (2)

src/openhuman/agent/harness/tool_loop.rs
src/openhuman/agent/harness/engine/core.rs

📝 Walkthrough

Walkthrough

The turn engine now emits explicit stop states, including a repeat-call halt. Subagent runner and orchestration code propagate incomplete outcomes with structured reasons, partial output, and updated handling across spawned, continued, and dispatched runs.

Changes

Turn stop-state propagation

Layer / File(s)	Summary
Engine stop states and repeat-call breaker `src/openhuman/agent/harness/engine/core.rs`, `src/openhuman/agent/harness/tool_loop.rs`, `src/openhuman/agent/harness/engine/mod.rs`	`TurnEngineOutcome` now carries `TurnStop`; `run_turn_engine` records `Final`, `Halted`, `Cap`, and `EarlyExit`; `RepeatCallGuard` halts identical `(tool, args)` batches; `TurnStop` is re-exported from the engine module.
Engine validation and autosave gate `src/openhuman/agent/harness/engine/core_tests.rs`, `src/openhuman/agent/harness/tool_loop_tests.rs`, `src/openhuman/agent/harness/session/turn/core.rs`	Tests cover repeated-call halts, signature resets, argument canonicalization, polling exemptions, and cap/final behavior, and turn autosave now checks `TurnStop::Cap`.
Subagent runner stop mapping `src/openhuman/agent/harness/subagent_runner/types.rs`, `src/openhuman/agent/harness/subagent_runner/ops/loop_.rs`, `src/openhuman/agent/harness/subagent_runner/ops/runner.rs`, `src/openhuman/agent/harness/subagent_runner/ops_tests.rs`	`SubagentRunStatus` adds `Incomplete { reason }`, inner-loop results now include `TurnStop`, typed-mode maps `Halted`/`Cap` to incomplete status, and the new test covers the incomplete path.
Incomplete subagent handling `src/openhuman/agent_orchestration/subagent_sessions/types.rs`, `src/openhuman/agent_orchestration/tools/agent_prepare_context.rs`, `src/openhuman/agent_orchestration/tools/continue_subagent.rs`, `src/openhuman/agent_orchestration/tools/dispatch.rs`, `src/openhuman/agent_orchestration/tools/spawn_async_subagent.rs`, `src/openhuman/agent_orchestration/tools/spawn_subagent.rs`, `src/openhuman/agent_memory/tools.rs`, `src/openhuman/agent/prompts/sections.rs`	`SubagentRunStatus::Incomplete` is mapped to durable idle state and wrapped in incomplete envelopes, with context scouting, continuation, dispatch, async spawn, memory, and grounding prompt updates.

Sequence Diagram(s)

sequenceDiagram
  participant run_turn_engine
  participant RepeatCallGuard
  participant TurnEngineOutcome
  run_turn_engine->>RepeatCallGuard: record(canonical (tool, args) batch)
  RepeatCallGuard-->>run_turn_engine: halt summary after REPEAT_CALL_THRESHOLD
  run_turn_engine->>TurnEngineOutcome: stop = Halted

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

tinyhumansai/openhuman#3678: Related GROUNDING_BODY prompt updates in src/openhuman/agent/prompts/sections.rs.
tinyhumansai/openhuman#4134: Related turn-engine termination and circuit-breaker handling in src/openhuman/agent/harness/engine/core.rs.
tinyhumansai/openhuman#4091: Directly related issue for repeated sub-agent failures and structured escalation back to the orchestrator.

Suggested labels

rust-core, agent, bug

Poem

I hopped through loops and sniffed the trail,
When calls repeated, I rang the bell.
Final, cap, halt—new signposts in view,
Stuck subagents now send back what’s true.
૮˘ﻌ˘ა Thump-thump—progress can start anew.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main harness changes: repeat-call loop breaking and returning stuck sub-agents as incomplete.
Linked Issues check	✅ Passed	The PR implements both requested behaviors: identical-call circuit breaking and structured incomplete escalation for sub-agents.
Out of Scope Changes check	✅ Passed	The changes appear scoped to loop-breaking and sub-agent escalation plumbing, tests, and prompt/status updates supporting those goals.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing Touches

⚔️ Resolve merge conflicts

Resolve merge conflict in branch feat/harness-loop-and-incomplete-handback

_{Comment @coderabbitai help to get the list of available commands.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 935e995445

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent_orchestration/tools/continue_subagent.rs`:
- Around line 315-359: The Incomplete branch in continue_subagent’s
SubagentRunStatus handling leaves a stale checkpoint file behind after the
resumed run stops short again. Mirror the cleanup done in the Completed arm by
adding a best-effort remove_file on checkpoint_path in this Incomplete case,
keeping the checkpoint lifecycle consistent in continue_subagent.

In `@src/openhuman/agent/harness/engine/core_tests.rs`:
- Around line 665-674: The repeated-call test in RepeatedCallProvider::chat is
not isolating the repeat-CALL guard because the narration text is constant and
can also trip the repeat-OUTPUT guard. Add a counter-backed varying narration to
RepeatedCallProvider, mirroring VariedCallProvider, so each ChatResponse text
changes while tool_calls stays identical; this makes the test’s assertion on the
call breaker accurate and keeps the comment aligned with behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f69aa420-dbb1-47e3-a2c8-65c7cb287175

📥 Commits

Reviewing files that changed from the base of the PR and between 5a41a4f and 935e995.

📒 Files selected for processing (18)

src/openhuman/agent/harness/engine/core.rs
src/openhuman/agent/harness/engine/core_tests.rs
src/openhuman/agent/harness/engine/mod.rs
src/openhuman/agent/harness/session/turn/core.rs
src/openhuman/agent/harness/subagent_runner/ops/loop_.rs
src/openhuman/agent/harness/subagent_runner/ops/runner.rs
src/openhuman/agent/harness/subagent_runner/ops_tests.rs
src/openhuman/agent/harness/subagent_runner/types.rs
src/openhuman/agent/harness/tool_loop.rs
src/openhuman/agent/harness/tool_loop_tests.rs
src/openhuman/agent/prompts/sections.rs
src/openhuman/agent_memory/tools.rs
src/openhuman/agent_orchestration/subagent_sessions/types.rs
src/openhuman/agent_orchestration/tools/agent_prepare_context.rs
src/openhuman/agent_orchestration/tools/continue_subagent.rs
src/openhuman/agent_orchestration/tools/dispatch.rs
src/openhuman/agent_orchestration/tools/spawn_async_subagent.rs
src/openhuman/agent_orchestration/tools/spawn_subagent.rs

- continue_subagent: remove the stale AwaitingUser checkpoint file in the Incomplete arm too (best-effort), mirroring the Completed arm, so a resumed run that stops short again doesn't orphan the checkpoint on disk. - core_tests: vary the narration per call in RepeatedCallProvider so the repeat-OUTPUT guard keeps resetting and only the repeat-CALL guard can trip — genuinely isolating the call breaker (the prior constant narration let the output guard accumulate too, just firing later).

… it on success Addresses Codex P1 on tinyhumansai#4230: the repeat-call breaker halted any identical (tool, args) batch before dispatch, including polling tools like wait_subagent whose contract is to be re-invoked with identical args until the work finishes. A task outliving two timeout windows would have its third wait_subagent halted before collecting the result, misreporting a stuck turn. - Add is_repeat_call_exempt (wait_subagent) and reset() on both no-progress guards; exempt all-poll batches from both breakers. - Move the repeat-call breaker post-execution and gate it on success: identical *failing* calls now stay the failure guard's domain (with its per-class thresholds and richer halt message) instead of being preempted at 3. Restores run_tool_call_loop_halts_on_repeated_identical_failure. Tests: polling-exemption unit + reset test, end-to-end polling_tool_is_exempt (5 identical wait_subagent polls finish normally), and the tinyhumansai#4095 success-loop test still halts at the threshold.

…nd-incomplete-handback # Conflicts: # src/openhuman/agent/harness/engine/core.rs # src/openhuman/agent/harness/subagent_runner/ops/runner.rs

…at-call breaker doesn't pre-empt the cap The new repeat-call breaker halts identical (tool,args) loops at 3 reps, so cap/checkpoint tests that reached max_iterations via an identical call now trip the breaker first. Vary the call each turn (distinct args) so these tests still exercise the iteration-cap path: - channels IterativeToolProvider: embed the iteration index as an extra mock_price `step` arg (it already varied narration; the call itself was identical, which the narration-independent repeat-call guard caught). - agent turn_emits_checkpoint_at_max_iterations: vary the echo message by i. - run_tool_call_loop_returns_max_iterations_error: push varied forced responses.

sanil-23 requested a review from a team June 27, 2026 09:58

coderabbitai Bot added agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels Jun 27, 2026

chatgpt-codex-connector Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread src/openhuman/agent/harness/engine/core.rs Outdated

coderabbitai Bot requested changes Jun 27, 2026

View reviewed changes

Comment thread src/openhuman/agent_orchestration/tools/continue_subagent.rs

Comment thread src/openhuman/agent/harness/engine/core_tests.rs Outdated

coderabbitai Bot previously approved these changes Jun 27, 2026

View reviewed changes

sanil-23 dismissed coderabbitai[bot]’s stale review via 202366e June 27, 2026 10:51

coderabbitai Bot added bug and removed feature Net-new user-facing capability or product behavior. labels Jun 27, 2026

coderabbitai Bot previously approved these changes Jun 27, 2026

View reviewed changes

Merge remote-tracking branch 'upstream/main' into feat/harness-loop-a…

4ba9f34

…nd-incomplete-handback # Conflicts: # src/openhuman/agent/harness/engine/core.rs # src/openhuman/agent/harness/subagent_runner/ops/runner.rs

sanil-23 dismissed coderabbitai[bot]’s stale review via 4ba9f34 June 27, 2026 15:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(harness): stop identical-call loops and hand back stuck sub-agents as incomplete#4230

feat(harness): stop identical-call loops and hand back stuck sub-agents as incomplete#4230
sanil-23 wants to merge 5 commits into
tinyhumansai:mainfrom
sanil-23:feat/harness-loop-and-incomplete-handback

sanil-23 commented Jun 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

sanil-23 commented Jun 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

Validation Run

Validation Blocked

Behavior Changes

Parity Contract

Duplicate / Superseded PR Handling

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sanil-23 commented Jun 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading