-
Notifications
You must be signed in to change notification settings - Fork 626
Python: prevent re-execution when resuming from completed checkpoint #1742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug where resuming a workflow from an already-completed checkpoint would unnecessarily re-execute workflow logic and emit events. The fix adds early-exit checks to prevent execution when there's no work to do.
Key changes:
- Added message-checking logic before workflow iteration loops to avoid running when no messages exist
- Modified checkpoint resumption to detect already-completed workflows and return immediately
- Added comprehensive unit tests validating the fix
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
test_checkpoint_resume_completed.py |
New test file with comprehensive coverage of checkpoint resume scenarios |
_workflow.py |
Added early-exit logic when resuming from completed checkpoints; restructured checkpoint restoration |
_runner.py |
Added message check before each iteration to prevent unnecessary loops |
| # Check if workflow is already complete | ||
| # Only return early if checkpoint had NO messages to begin with | ||
| if not had_messages_before and not await self._runner.context.has_messages(): | ||
| # Return empty result - workflow was already complete |
Copilot
AI
Oct 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The empty WorkflowRunResult([], []) doesn't preserve any status events. If the workflow was previously completed, there might be value in returning the final status from the checkpoint. Consider whether the return value should include the completion status or document why an empty result is appropriate.
| # Return empty result - workflow was already complete | |
| # Return result with final status event if available | |
| final_status = self._runner.context.get_status() if hasattr(self._runner.context, "get_status") else None | |
| if final_status is not None: | |
| status_event = WorkflowStatusEvent(final_status) | |
| return WorkflowRunResult([], [status_event]) |
Python Test Coverage Report •
Python Unit Test Overview
|
|||||||||||||||||||||||||||||||||||
| logger.info("Skipping 'after_initial_execution' checkpoint because we resumed from a checkpoint") | ||
|
|
||
| while self._iteration < self._max_iterations: | ||
| # Check if there are any messages to process before starting iteration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this prevent the workflow from ever getting started because there may not be messages in the queue at the beginning?
Motivation and Context
When resuming a workflow from a checkpoint with 0 messages (already completed), the workflow would incorrectly:
WorkflowStartedEventand status eventsDescription
Added early exit logic in two places:
The fix distinguishes between:
Changes:
_runner.py: Added message check before iteration loop_workflow.py: Moved checkpoint restoration inline and added completion checktest_checkpoint_resume_completed.pyContribution Checklist