
fix: enable agentic recovery for sub-recipe failures (#2953) #2967

Merged: rysweet merged 2 commits into main from fix/2953-enable-recovery-on-failure on Mar 9, 2026

@rysweet (Owner) commented Mar 9, 2026

Summary

  • Adds recovery_on_failure: true to both sub-recipe invocation steps in smart-orchestrator.yaml:
    • execute-single-round-1 (primary single-workstream execution)
    • execute-single-fallback-blocked (fallback when parallel spawning is blocked)

Problem

When default-workflow fails at an individual step (e.g., because the generated branch name is too long, per #2952), smart-orchestrator reports the entire recipe as FAILED, even when the agent inside the sub-recipe actually recovered and completed all the work (commits, tests, push).

Fix

The Rust recipe runner already implements recovery_on_failure (runner.rs:712-762) — it invokes an agent to assess the failure and attempt recovery. The feature just wasn't enabled on the smart-orchestrator's sub-recipe steps.

With this change, when a sub-recipe fails:

  1. The runner captures which steps failed and which succeeded
  2. An agent evaluates whether recovery is possible
  3. If the agent completes the remaining work, the step is marked as recovered/completed
  4. If recovery fails, the original failure propagates as before
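The four-step flow above can be sketched as a small, self-contained model. This is a hypothetical illustration under stated assumptions, not the code at runner.rs:712-762: the types StepStatus, Outcome, and SubRecipeResult, the function finish_step, and the attempt_recovery callback (a stand-in for the agent call) are names introduced here only for illustration.

```rust
// Hypothetical model of the recovery flow; names are illustrative, not runner.rs APIs.

/// Result of one step inside a sub-recipe run.
#[derive(Debug, Clone, PartialEq)]
enum StepStatus {
    Completed,
    Failed(String),
}

/// Final status reported for the parent step that invoked the sub-recipe.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed,
    Recovered,
    Failed(String),
}

/// Ordered (step name, status) pairs from a sub-recipe run.
struct SubRecipeResult {
    steps: Vec<(String, StepStatus)>,
}

/// `attempt_recovery` stands in for the agent call: it receives the failed and
/// succeeded step names and returns true if it finished the remaining work.
fn finish_step<F>(result: &SubRecipeResult, recovery_on_failure: bool, attempt_recovery: F) -> Outcome
where
    F: Fn(&[String], &[String]) -> bool,
{
    // 1. Partition steps into failed and succeeded, remembering the first error.
    let mut failed: Vec<String> = Vec::new();
    let mut succeeded: Vec<String> = Vec::new();
    let mut first_error: Option<String> = None;
    for (name, status) in &result.steps {
        match status {
            StepStatus::Completed => succeeded.push(name.clone()),
            StepStatus::Failed(err) => {
                if first_error.is_none() {
                    first_error = Some(err.clone());
                }
                failed.push(name.clone());
            }
        }
    }
    let err = match first_error {
        None => return Outcome::Completed, // nothing failed
        Some(err) => err,
    };
    // Without the flag, keep the old binary-failure behaviour.
    if !recovery_on_failure {
        return Outcome::Failed(err);
    }
    // 2-4. Let the agent try; propagate the original error if it cannot recover.
    if attempt_recovery(failed.as_slice(), succeeded.as_slice()) {
        Outcome::Recovered
    } else {
        Outcome::Failed(err)
    }
}

fn main() {
    let result = SubRecipeResult {
        steps: vec![
            ("create-branch".to_string(), StepStatus::Failed("branch name too long".to_string())),
            ("implement".to_string(), StepStatus::Completed),
            ("push".to_string(), StepStatus::Completed),
        ],
    };
    // Pretend the agent can recover when only the branch step failed.
    let outcome = finish_step(&result, true, |failed, _succeeded| {
        failed.len() == 1 && failed[0] == "create-branch"
    });
    println!("{:?}", outcome);
}
```

Run as written, this prints Recovered. Passing false for recovery_on_failure reproduces the pre-fix behaviour, where the first error is reported even though the later steps succeeded.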

Companion PR

  • amplihack-recipe-runner PR: Adds recovery_on_failure and model to the parser validator's whitelist (fixes spurious "unrecognized field" warnings)
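The companion fix is a whitelist change, and a minimal sketch shows why adding the two fields silences the warning. It is hypothetical: only recovery_on_failure and model come from this PR. The base field names, whitelist, and unrecognized_field_warnings are placeholders, and the real parser validator may be structured differently.

```rust
// Placeholder field names: the PR does not list the validator's existing whitelist.
const BASE_STEP_FIELDS: &[&str] = &["id", "type", "recipe", "prompt"];

// The two fields the companion PR adds, per the PR description.
const ADDED_STEP_FIELDS: &[&str] = &["recovery_on_failure", "model"];

/// Builds a whitelist from the base fields plus any extras.
fn whitelist(extra: &[&'static str]) -> Vec<&'static str> {
    let mut known = BASE_STEP_FIELDS.to_vec();
    known.extend_from_slice(extra);
    known
}

/// Returns one warning per field on a step that is not whitelisted.
fn unrecognized_field_warnings(known: &[&str], step_id: &str, fields: &[&str]) -> Vec<String> {
    fields
        .iter()
        .filter(|f| !known.contains(*f))
        .map(|f| format!("step '{}': unrecognized field '{}'", step_id, f))
        .collect()
}

fn main() {
    let step_fields = ["id", "type", "recipe", "recovery_on_failure"];
    let before = unrecognized_field_warnings(&whitelist(&[]), "execute-single-round-1", &step_fields);
    let after = unrecognized_field_warnings(&whitelist(ADDED_STEP_FIELDS), "execute-single-round-1", &step_fields);
    println!("before: {:?}", before);
    println!("after: {:?}", after);
}
```

With the base list alone, recovery_on_failure produces a warning; once the two fields are whitelisted, the same step validates without one.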

Test plan

  • Recipe validates cleanly with --validate-only (after companion PR)
  • All 95 recipe runner tests pass including recovery success/failure paths
  • Two-line change — minimal risk

Fixes #2953

🤖 Generated with Claude Code

Ubuntu and others added 2 commits March 9, 2026 03:33
…rator

Add recovery_on_failure: true to both sub-recipe invocation steps
(execute-single-round-1 and execute-single-fallback-blocked). When
default-workflow fails at an individual step but the agent actually
recovers and completes the work, the recipe runner now attempts agentic
recovery instead of reporting binary failure.

This addresses the scenario where e.g. branch name generation fails
(#2952) but the agent works around it on the current branch — previously
reported as FAILED despite all work being completed.

Relates to #2953

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions bot (Contributor) commented Mar 9, 2026

🤖 Auto-fixed version bump

The version in pyproject.toml has been automatically bumped to the next patch version.

If you need a minor or major version bump instead, please update pyproject.toml manually and push the change.

github-actions bot (Contributor) commented Mar 9, 2026

Repo Guardian - Passed

All files in this PR are durable configuration changes:

  • amplifier-bundle/recipes/smart-orchestrator.yaml: Feature flag configuration (permanent)
  • pyproject.toml: Version metadata (permanent)

No ephemeral content detected.

AI generated by Repo Guardian

rysweet merged commit 98d5fc2 into main on Mar 9, 2026 (1 check passed)

Development

Successfully merging this pull request may close these issues:

  • recipe runner: sub-recipe failures should attempt agentic recovery, not binary fail
