Skip to content

feat: add --prompt-retries for automatic retry on transient errors#142

Open
lupuletic wants to merge 1 commit intoopenclaw:mainfrom
lupuletic:fix/prompt-retry-on-transient-errors
Open

feat: add --prompt-retries for automatic retry on transient errors#142
lupuletic wants to merge 1 commit intoopenclaw:mainfrom
lupuletic:fix/prompt-retry-on-transient-errors

Conversation

@lupuletic
Copy link

Summary

  • Add --prompt-retries <count> CLI flag to automatically retry failed prompt turns on transient ACP errors
  • Add isRetryablePromptError() to classify transient vs permanent errors — retries only on ACP -32603 (internal error) and -32700 (parse error)
  • Implement exponential backoff: 1s, 2s, 4s, 8s, capped at 10s
  • Never retry auth, permission, timeout, or session-not-found errors
  • Skip retry when the agent process crashed during the prompt
  • Default: 0 retries (preserving existing behavior)

Closes #137

How it works

When a model API returns a transient error (e.g. HTTP 400/500), the ACP agent wraps it as a JSON-RPC -32603 internal error. With --prompt-retries 3, acpx will:

  1. Detect the error is transient via isRetryablePromptError()
  2. Wait with exponential backoff (1s → 2s → 4s, capped at 10s)
  3. Re-send the prompt to the still-running agent
  4. Log each retry attempt to stderr
# Enable 3 retries for flaky model APIs
acpx --prompt-retries 3 claude prompt "Hello"

# Also works with exec mode
acpx --prompt-retries 2 claude exec "Analyze this code"

Test plan

  • 13 new unit tests for isRetryablePromptError() covering all error categories
  • Full test suite passing (263 tests, 0 failures)
  • TypeScript type check clean
  • Linting + formatting clean
  • Pre-commit hooks pass

Note: pnpm run check fails at the repo-wide coverage gate due to pre-existing baseline coverage totals unrelated to this change (same issue noted in PR #128).

AI-assisted PR

  • Marked as AI-assisted
  • Fully tested locally (pnpm build && pnpm test && pnpm lint)
  • I understand what the code does

🤖 Generated with Claude Code

When model APIs return transient errors (HTTP 400/500 surfacing as ACP
internal errors), acpx now supports automatic retry with exponential
backoff via the --prompt-retries flag.

- Add isRetryablePromptError() to classify transient vs permanent errors
- Retry only on ACP -32603 (internal) and -32700 (parse) errors
- Never retry auth, permission, timeout, or session-not-found errors
- Exponential backoff: 1s, 2s, 4s, 8s, capped at 10s
- Skip retry if agent process crashed during prompt
- Flow --prompt-retries through CLI → queue owner → prompt execution
- Default: 0 (no retries, preserving existing behavior)

Closes openclaw#137

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

It is necessary to automatically retry the function when the model api returns 400.

2 participants