Skip to content

feat(harness): stop identical-call loops and hand back stuck sub-agents as incomplete#4230

Open
sanil-23 wants to merge 5 commits into
tinyhumansai:mainfrom
sanil-23:feat/harness-loop-and-incomplete-handback
Open

feat(harness): stop identical-call loops and hand back stuck sub-agents as incomplete#4230
sanil-23 wants to merge 5 commits into
tinyhumansai:mainfrom
sanil-23:feat/harness-loop-and-incomplete-handback

Conversation

@sanil-23

@sanil-23 sanil-23 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Summary

Problem

Fuzz testing surfaced two agent-reliability failures:

  • Unproductive loops ([Harness] Agents enter unproductive loops (repeated tool calls, no progress) #4095): agents re-issue the same tool call repeatedly and never terminate. The existing guards don't catch this when the call succeeds: RepeatFailureGuard resets on every success (tool_loop.rs), and RepeatOutputGuard hashes the assistant narration alongside the call, so trivially varied prose around an identical call resets its streak.
  • No handback when stuck ([Harness] Agents don't escalate / hand back control when stuck #4096): when a sub-agent got stuck (circuit-breaker halt) or hit its iteration cap, the runner returned its summary as SubagentRunStatus::Completed and the delegation tool wrapped it in ToolResult::success(...). The partial-progress prose was delivered, but nothing marked it as incomplete — so a weak orchestrator could narrate it as done or silently re-spawn the same delegation.

Solution

Loop detection (#4095). New RepeatCallGuard in tool_loop.rs, wired into engine/core.rs before tool execution. It keys only on canonical (tool, args) (sorted-key JSON, so reordered keys can't evade it), is independent of success, and trips at REPEAT_CALL_THRESHOLD = 3. Any distinct intervening call resets the streak, so read→write→read and varied-arg enumeration never trip it.

Stuck handback (#4096). New TurnStop enum (Final / Halted / Cap / EarlyExit) replaces the hit_cap bool on TurnEngineOutcome, giving callers an explicit reason a turn ended. The sub-agent runner maps Halted/Cap to a new SubagentRunStatus::Incomplete { reason }, carrying the partial-progress summary as output. Delegation tools render a [SUBAGENT_INCOMPLETE] envelope (success + partial progress, not the "nothing happened" error envelope) and a GROUNDING_BODY prompt rule tells the orchestrator to relay it and not re-run unchanged.

Design notes:

  • The two fixes share the same stop-detection path; TurnStop also back-fills the new repeat-call halt site.
  • Adding the Incomplete variant is compiler-enforced — every exhaustive match on SubagentRunStatus was updated.
  • Reason/suggestion text is generated from data the harness already has (breaker counters, failure class); no extra LLM calls.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) — repeat-call guard unit tests (trips / resets-on-distinct / varied-args / reordered-keys), engine Halted + Cap TurnStop tests, and an end-to-end stuck_subagent_returns_incomplete_status test.
  • Diff coverage ≥ 80% — targeted tests cover the new guard, the TurnStop mapping, and the end-to-end Incomplete path. cargo-llvm-cov + diff-cover was not run locally (heavy); the per-tool envelope arms are exercised indirectly and CI's rust-core-coverage/coverage-gate is the authoritative check.
  • Coverage matrix updated — N/A: behaviour change to existing harness loop; no feature row added/removed/renamed.
  • All affected feature IDs from the matrix are listed under ## Related — N/A: no matrix rows touched.
  • No new external network dependencies introduced — no dependencies added; all tests use scripted in-process providers.
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: core agent-loop logic, no release-cut UI surface.
  • Linked issue closed via Closes #NNN in the ## Related section.

Impact

  • Runtime: Rust core only (src/). No Tauri shell, frontend, or dependency changes.
  • Behaviour: an agent spinning on an identical call now halts after 3 repeats with a no-progress summary; a stuck/capped sub-agent now reports Incomplete with its partial progress instead of a success-framed result. Clean finishes and the iteration-cap checkpoint prose are unchanged.
  • No migration, security, or compatibility implications.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/harness-loop-and-incomplete-handback
  • Commit SHA: see PR head

Validation Run

  • pnpm --filter openhuman-app format:check — N/A, no frontend (app) changes.
  • pnpm typecheck — N/A, no TypeScript changes.
  • Focused tests: cargo test --lib for engine::core (9), subagent_runner (57), agent::prompts (44), agent_orchestration (226), agent::harness::session (136) — all pass.
  • Rust fmt/check (if changed): cargo fmt applied; cargo check --manifest-path Cargo.toml clean (0 errors); full lib test build (--no-run) succeeds.
  • Tauri fmt/check (if changed): N/A, no app/src-tauri changes.

Validation Blocked

  • command: pnpm rust:check (pre-push hook)
  • error: fails in a fresh worktree (missing vendored CEF libs), unrelated to this change
  • impact: pushed with --no-verify; no Tauri/CEF code touched, so the hook's check does not apply to this diff.

Behavior Changes

  • Intended behavior change: halt identical-call loops after 3 repeats; mark stuck/capped sub-agent runs as Incomplete with partial progress.
  • User-visible effect: agents stop wasting the step budget spinning on one call, and a blocked delegation surfaces "did not finish, here's what I got" instead of a fabricated success.

Parity Contract

  • Legacy behavior preserved: clean final responses, the iteration-cap checkpoint prose, AwaitingUser pauses, and the channel/CLI ErrorCheckpoint path are unchanged. Existing 2-identical-call tests stay green (threshold is 3).
  • Guard/fallback/dispatch parity checks: RepeatCallGuard sits alongside the existing RepeatFailureGuard/RepeatOutputGuard without altering them; TurnStop replaces the hit_cap bool with equivalent Cap semantics at its single read site.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): None
  • Canonical PR: This PR
  • Resolution: N/A

Summary by CodeRabbit

  • New Features

    • Agents and sub-agents now surface an explicit Incomplete outcome (stop reason + partial output) when they halt before finishing.
    • Added a repeat identical tool-call safeguard that stops stuck loops when the same tool call is re-issued without progress.
  • Bug Fixes

    • Improved turn termination handling with clearer halt vs cap behavior, including correct memory autosave suppression.
    • Prevents repeated successful identical calls from continuing indefinitely.
  • Documentation

    • Updated grounding guidance to avoid fabricating or silently re-running unchanged calls after incomplete/blocked results.
  • Tests

    • Added coverage for repeat-call stopping and incomplete sub-agent status behavior.

…ts as incomplete

Two agent-reliability fixes surfaced by fuzz testing:

- tinyhumansai#4088 (tinyhumansai#4095): add RepeatCallGuard, a circuit breaker that halts a turn
  after 3 consecutive identical (tool, args) calls regardless of success.
  Closes the gap RepeatFailureGuard (resets on success) and RepeatOutputGuard
  (also hashes narration) both miss. Resets on any distinct intervening call,
  so read->write->read and varied-arg enumeration never trip it.

- tinyhumansai#4091 (tinyhumansai#4096): add a TurnStop discriminator (Final/Halted/Cap/EarlyExit) and
  SubagentRunStatus::Incomplete so a stuck or capped sub-agent hands the
  orchestrator a [SUBAGENT_INCOMPLETE] envelope (partial progress + blocker)
  instead of a success-framed result it could narrate as done or re-spawn.
  All delegation tools and status consumers updated; one orchestrator-prompt
  rule added to relay incomplete results rather than re-run unchanged.

Rust core only; no Tauri/frontend/dependency changes. Tests: repeat-call guard
units, engine TurnStop Halted/Cap tests, and an end-to-end stuck-subagent
Incomplete test.
@sanil-23 sanil-23 requested a review from a team June 27, 2026 09:58
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 33607b5e-b60b-4145-9a42-a90552bf398d

📥 Commits

Reviewing files that changed from the base of the PR and between e3a7f1e and 202366e.

📒 Files selected for processing (4)
  • src/openhuman/agent/harness/engine/core.rs
  • src/openhuman/agent/harness/engine/core_tests.rs
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/agent/harness/tool_loop_tests.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/agent/harness/engine/core.rs

📝 Walkthrough

Walkthrough

The turn engine now emits explicit stop states, including a repeat-call halt. Subagent runner and orchestration code propagate incomplete outcomes with structured reasons, partial output, and updated handling across spawned, continued, and dispatched runs.

Changes

Turn stop-state propagation

Layer / File(s) Summary
Engine stop states and repeat-call breaker
src/openhuman/agent/harness/engine/core.rs, src/openhuman/agent/harness/tool_loop.rs, src/openhuman/agent/harness/engine/mod.rs
TurnEngineOutcome now carries TurnStop; run_turn_engine records Final, Halted, Cap, and EarlyExit; RepeatCallGuard halts identical (tool, args) batches; TurnStop is re-exported from the engine module.
Engine validation and autosave gate
src/openhuman/agent/harness/engine/core_tests.rs, src/openhuman/agent/harness/tool_loop_tests.rs, src/openhuman/agent/harness/session/turn/core.rs
Tests cover repeated-call halts, signature resets, argument canonicalization, polling exemptions, and cap/final behavior, and turn autosave now checks TurnStop::Cap.
Subagent runner stop mapping
src/openhuman/agent/harness/subagent_runner/types.rs, src/openhuman/agent/harness/subagent_runner/ops/loop_.rs, src/openhuman/agent/harness/subagent_runner/ops/runner.rs, src/openhuman/agent/harness/subagent_runner/ops_tests.rs
SubagentRunStatus adds Incomplete { reason }, inner-loop results now include TurnStop, typed-mode maps Halted/Cap to incomplete status, and the new test covers the incomplete path.
Incomplete subagent handling
src/openhuman/agent_orchestration/subagent_sessions/types.rs, src/openhuman/agent_orchestration/tools/agent_prepare_context.rs, src/openhuman/agent_orchestration/tools/continue_subagent.rs, src/openhuman/agent_orchestration/tools/dispatch.rs, src/openhuman/agent_orchestration/tools/spawn_async_subagent.rs, src/openhuman/agent_orchestration/tools/spawn_subagent.rs, src/openhuman/agent_memory/tools.rs, src/openhuman/agent/prompts/sections.rs
SubagentRunStatus::Incomplete is mapped to durable idle state and wrapped in incomplete envelopes, with context scouting, continuation, dispatch, async spawn, memory, and grounding prompt updates.

Sequence Diagram(s)

sequenceDiagram
  participant run_turn_engine
  participant RepeatCallGuard
  participant TurnEngineOutcome
  run_turn_engine->>RepeatCallGuard: record(canonical (tool, args) batch)
  RepeatCallGuard-->>run_turn_engine: halt summary after REPEAT_CALL_THRESHOLD
  run_turn_engine->>TurnEngineOutcome: stop = Halted
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

rust-core, agent, bug

Poem

I hopped through loops and sniffed the trail,
When calls repeated, I rang the bell.
Final, cap, halt—new signposts in view,
Stuck subagents now send back what’s true.
૮˘ﻌ˘ა Thump-thump—progress can start anew.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main harness changes: repeat-call loop breaking and returning stuck sub-agents as incomplete.
Linked Issues check ✅ Passed The PR implements both requested behaviors: identical-call circuit breaking and structured incomplete escalation for sub-agents.
Out of Scope Changes check ✅ Passed The changes appear scoped to loop-breaking and sub-agent escalation plumbing, tests, and prompt/status updates supporting those goals.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing Touches
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch feat/harness-loop-and-incomplete-handback

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot added agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels Jun 27, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 935e995445

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/openhuman/agent/harness/engine/core.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent_orchestration/tools/continue_subagent.rs`:
- Around line 315-359: The Incomplete branch in continue_subagent’s
SubagentRunStatus handling leaves a stale checkpoint file behind after the
resumed run stops short again. Mirror the cleanup done in the Completed arm by
adding a best-effort remove_file on checkpoint_path in this Incomplete case,
keeping the checkpoint lifecycle consistent in continue_subagent.

In `@src/openhuman/agent/harness/engine/core_tests.rs`:
- Around line 665-674: The repeated-call test in RepeatedCallProvider::chat is
not isolating the repeat-CALL guard because the narration text is constant and
can also trip the repeat-OUTPUT guard. Add a counter-backed varying narration to
RepeatedCallProvider, mirroring VariedCallProvider, so each ChatResponse text
changes while tool_calls stays identical; this makes the test’s assertion on the
call breaker accurate and keeps the comment aligned with behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f69aa420-dbb1-47e3-a2c8-65c7cb287175

📥 Commits

Reviewing files that changed from the base of the PR and between 5a41a4f and 935e995.

📒 Files selected for processing (18)
  • src/openhuman/agent/harness/engine/core.rs
  • src/openhuman/agent/harness/engine/core_tests.rs
  • src/openhuman/agent/harness/engine/mod.rs
  • src/openhuman/agent/harness/session/turn/core.rs
  • src/openhuman/agent/harness/subagent_runner/ops/loop_.rs
  • src/openhuman/agent/harness/subagent_runner/ops/runner.rs
  • src/openhuman/agent/harness/subagent_runner/ops_tests.rs
  • src/openhuman/agent/harness/subagent_runner/types.rs
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/agent/harness/tool_loop_tests.rs
  • src/openhuman/agent/prompts/sections.rs
  • src/openhuman/agent_memory/tools.rs
  • src/openhuman/agent_orchestration/subagent_sessions/types.rs
  • src/openhuman/agent_orchestration/tools/agent_prepare_context.rs
  • src/openhuman/agent_orchestration/tools/continue_subagent.rs
  • src/openhuman/agent_orchestration/tools/dispatch.rs
  • src/openhuman/agent_orchestration/tools/spawn_async_subagent.rs
  • src/openhuman/agent_orchestration/tools/spawn_subagent.rs

Comment thread src/openhuman/agent_orchestration/tools/continue_subagent.rs
Comment thread src/openhuman/agent/harness/engine/core_tests.rs Outdated
- continue_subagent: remove the stale AwaitingUser checkpoint file in the
  Incomplete arm too (best-effort), mirroring the Completed arm, so a resumed
  run that stops short again doesn't orphan the checkpoint on disk.
- core_tests: vary the narration per call in RepeatedCallProvider so the
  repeat-OUTPUT guard keeps resetting and only the repeat-CALL guard can trip —
  genuinely isolating the call breaker (the prior constant narration let the
  output guard accumulate too, just firing later).
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 27, 2026
… it on success

Addresses Codex P1 on tinyhumansai#4230: the repeat-call breaker halted any identical
(tool, args) batch before dispatch, including polling tools like wait_subagent
whose contract is to be re-invoked with identical args until the work finishes.
A task outliving two timeout windows would have its third wait_subagent halted
before collecting the result, misreporting a stuck turn.

- Add is_repeat_call_exempt (wait_subagent) and reset() on both no-progress
  guards; exempt all-poll batches from both breakers.
- Move the repeat-call breaker post-execution and gate it on success: identical
  *failing* calls now stay the failure guard's domain (with its per-class
  thresholds and richer halt message) instead of being preempted at 3. Restores
  run_tool_call_loop_halts_on_repeated_identical_failure.

Tests: polling-exemption unit + reset test, end-to-end polling_tool_is_exempt
(5 identical wait_subagent polls finish normally), and the tinyhumansai#4095 success-loop
test still halts at the threshold.
@coderabbitai coderabbitai Bot added bug and removed feature Net-new user-facing capability or product behavior. labels Jun 27, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 27, 2026
…nd-incomplete-handback

# Conflicts:
#	src/openhuman/agent/harness/engine/core.rs
#	src/openhuman/agent/harness/subagent_runner/ops/runner.rs
…at-call breaker doesn't pre-empt the cap

The new repeat-call breaker halts identical (tool,args) loops at 3 reps, so
cap/checkpoint tests that reached max_iterations via an identical call now trip
the breaker first. Vary the call each turn (distinct args) so these tests still
exercise the iteration-cap path:

- channels IterativeToolProvider: embed the iteration index as an extra
  mock_price `step` arg (it already varied narration; the call itself was
  identical, which the narration-independent repeat-call guard caught).
- agent turn_emits_checkpoint_at_max_iterations: vary the echo message by i.
- run_tool_call_loop_returns_max_iterations_error: push varied forced responses.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Harness] Escalate to orchestrator (Core) after repeated sub-agent failures [Harness] Add circuit breaker for repeated identical tool calls

1 participant