Skip to content

Fix-aborts #6604

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

Fix-aborts #6604

wants to merge 7 commits into from

Conversation

chezsmithy
Copy link
Contributor

@chezsmithy chezsmithy commented Jul 13, 2025

Description

Solving for the unexpected aborts being observed in Bedrock.ts in the streamChat function.

Fix Race Condition Between Stream Completion and Tool Call Execution

Problem Description

Fixed intermittent stream aborts and text corruption caused by a race condition between streamChat completion and tool call processing.

Root Cause

  • streamNormalInput immediately executed tool calls upon stream completion
  • streamUpdate actions containing tool call deltas were still processing in Redux's action queue
  • Tool calls executed before stream state fully settled, causing subsequent streams to fail
  • Result: AWS Bedrock stream error (ECONNRESET): aborted and concurrent stream text corruption

Symptoms

  • ❌ Stream aborts: AWS Bedrock stream error (ECONNRESET): aborted
  • ❌ Text corruption: "The power, double, anThe power,d triple methods..."
  • ❌ Intermittent failures (timing-dependent)
  • ✅ Tool calls executed successfully, but LLM continuation failed

Solutions Evaluated

Option A: useEffect Hook with Delays

useEffect(() => {
  if (!isStreaming && toolCallState?.status === "generated") {
    setTimeout(() => dispatch(callCurrentTool()), 300);
  }
}, [isStreaming, toolCallState]);

Rejected: Arbitrary delays, still timing-dependent, component-coupled

Option B: Promise-Based State Settlement

const settleState = async () => {
  ReactDOM.flushSync(() => dispatch(setInactive()));
  await new Promise(resolve => setTimeout(resolve, 0));
};

Rejected: Complex integration, still relies on timing assumptions

Option C: Redux Middleware ✅ Selected

const toolCallMiddleware: Middleware = (store) => (next) => (action) => {
  const result = next(action);
  if (action.type === "session/setInactive") {
    const toolCallState = selectCurrentToolCall(store.getState());
    if (toolCallState?.status === "generated") {
      store.dispatch(callCurrentTool());
    }
  }
  return result;
};

Why Redux Middleware?

  1. Deterministic - Executes exactly when Redux state transitions
  2. Zero Race Conditions - Runs after setInactive action is fully processed
  3. Component-Agnostic - Works regardless of component lifecycle
  4. Follows Redux Patterns - Leverages predictable action flow
  5. Debuggable - Visible in Redux DevTools
  6. Performant - No polling or timeouts

Implementation

Files Modified

  • Created: gui/src/redux/middleware/toolCallMiddleware.ts
  • Modified: gui/src/redux/store.ts - Added middleware to store
  • Modified: gui/src/redux/thunks/streamNormalInput.ts - Removed immediate tool execution
  • Removed: gui/src/hooks/useToolCallTrigger.ts - Replaced with middleware
  • Modified: gui/src/pages/gui/Chat.tsx - Removed hook usage

Before (Race Condition)

// streamNormalInput.ts
dispatch(setInactive());
const toolCallState = selectCurrentToolCall(getState());
if (toolCallState) {
  dispatch(callCurrentTool()); // ❌ Too early!
}

After (Middleware Solution)

// streamNormalInput.ts  
dispatch(setInactive()); // ✅ Clean completion

// toolCallMiddleware.ts
if (action.type === "session/setInactive") {
  const toolCallState = selectCurrentToolCall(state);
  if (toolCallState?.status === "generated") {
    store.dispatch(callCurrentTool()); // ✅ Perfect timing
  }
}

Results

  • ✅ Eliminated stream abort errors
  • ✅ Fixed text corruption from concurrent streams
  • ✅ Maintained all existing functionality
  • ✅ Improved code maintainability and debuggability
  • ✅ Zero timing dependencies or arbitrary delays

The middleware approach provides a robust, deterministic solution that ensures tool calls execute at exactly the right moment in the Redux state transition lifecycle.

Checklist

  • I've read the contributing guide
  • The relevant docs, if any, have been updated or created
  • The relevant tests, if any, have been updated or created

Screenshots

[ For visual changes, include screenshots. Screen recordings are particularly helpful, and appreciated! ]

Tests

I have not updated any tests yet. Would like this reviewed and then I will look at test options. I will attempt to resolve any failing tests.


Summary by cubic

Fixed race conditions in Bedrock streamChat that caused stream aborts and text corruption by moving tool call execution to Redux middleware.

  • Bug Fixes
    • Eliminated stream abort errors and text corruption by ensuring tool calls run only after stream completion.
    • Removed timing dependencies and arbitrary delays for more reliable tool execution.

@chezsmithy chezsmithy requested a review from a team as a code owner July 13, 2025 23:40
@chezsmithy chezsmithy requested review from Patrick-Erichsen and removed request for a team July 13, 2025 23:40
Copy link

netlify bot commented Jul 13, 2025

👷 Deploy request for continuedev pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 242888b

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jul 13, 2025
@chezsmithy
Copy link
Contributor Author

When tested locally, this change does resolve: #6588

@chezsmithy
Copy link
Contributor Author

@sestinj I think this should be considered a critical hotfix for pre-release and the main branch, if you agree to the solution. Without this in place I can only issue one set of calls at a time before an abort occurs and I have to manually request the chat should continue.

I suspect it's going to affect other providers. It's only getting caught in Bedrock.ts because of the extensive try/catch logic I think. It's likely going to exhibit other unexpected behavior or stalled streams/text corruption with other providers.

@chezsmithy
Copy link
Contributor Author

@RomneyDa I suspect this was caused by the series of PRs noted including this one: #6561

@Patrick-Erichsen
Copy link
Collaborator

Thanks for the detailed write up and fix!

cc @sestinj and @RomneyDa for thoughts on the hotfix. Implementation seems solid to me, middleware feels like a great solution

@github-project-automation github-project-automation bot moved this from Todo to In Progress in Issues and PRs Jul 14, 2025
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jul 14, 2025
@RomneyDa
Copy link
Collaborator

@chezsmithy most providers handle the abort 499 code gracefully, whether in streamSse or explicitly. Maybe Bedrock is missing this handling?

@RomneyDa
Copy link
Collaborator

can only issue one set of calls at a time before an abort occurs and I have to manually request the chat should continue

Just to clarify, what do you mean by one set of calls at a time in this context?

@chezsmithy
Copy link
Contributor Author

@RomneyDa so what I see is the error is getting caught here:

if (error instanceof Error) {

But the error isn't related to Bedrock, the error is related to the abortSignal getting aborted which causes the streaming to end.

The pattern I see causing the issue is:

  1. Ask the LLM to update a file.
  2. The tool call occurs.
  3. The file updated succeeds using the new search and replace feature.
  4. The toolCall returns and sends the results back to the LLM.
  5. An abort occurs caught at the line above.

OR

  1. Ask the LLM to run tests.
  2. The tool call occurs.
  3. The tool call succeeds super quickly running the terminal command.
  4. The toolCall returns and sends the resutls back to the LLM.
  5. An abort occurs caught at the line above.

I've also observed that periodically the assistent reponse from the toolCall becaomes garbled and is rendered out of order in the chat window.

I'm thinking its a race condition that's being uncovered by the performance of the local tool calling. (guessing...)

@chezsmithy
Copy link
Contributor Author

I created a local build using this fix, and I can confirm it's 100% fixed the issue for me. No more aborts, and no more text corruption.

@RomneyDa
Copy link
Collaborator

I'd say go for it, seems like the bug is a bit opaque but if this works I'd agree seems urgent

Patrick-Erichsen and others added 2 commits July 15, 2025 11:28
…ware

- Removed handleToolCallExecution from streamNormalInput.ts
- Updated toolCallMiddleware to handle parallel tool calls with allowedWithoutPermission
- Maintained original behavior where only allowedWithoutPermission tools execute automatically
- Fixed all import dependencies and build issues

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jul 15, 2025
@Patrick-Erichsen
Copy link
Collaborator

Patrick-Erichsen commented Jul 15, 2025

@chezsmithy I ended up merging my parallel tool calling PR into main, merged those changes in here and resolved the conflicts, mind taking a peek at 97b01b1 and verifying?

Good to merge with your approval and will make it into pre-release tonight 🚀

@chezsmithy
Copy link
Contributor Author

chezsmithy commented Jul 15, 2025

@chezsmithy I ended up merging my parallel tool calling PR into main, merged those changes in here and resolved the conflicts, mind taking a peek at 97b01b1 and verifying?

Good to merge with your approval and will make it into pre-release tonight 🚀

@Patrick-Erichsen Testing now. Stand by.

@chezsmithy
Copy link
Contributor Author

chezsmithy commented Jul 15, 2025

@Patrick-Erichsen My review got me to thinking... is this the actual solution? #6644

Seems to resolve the issue for me locally. Running tests now.

@chezsmithy
Copy link
Contributor Author

Gonna back burner this for the simple option for now. Gonna close it, and reopen if needed.

@chezsmithy chezsmithy closed this Jul 16, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Issues and PRs Jul 16, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Jul 16, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants