Adds early stopping support for Ray integration #3276
Conversation
Hi @garylvov, here is the PR as agreed. I noticed while re-testing that it sometimes works and sometimes doesn't. I think the issue is not the mechanism inside the PR, but rather that Isaac Sim does not always respond in time to the termination signal. When this happens, the next training does not start within [...]. One idea could be to check whether a subprocess exits within a threshold and kill it if it doesn't, but I do not know how to do this with stoppers, as Ray normally handles this. Alternatively, would it be better to add a mechanism to [...]?
@garylvov, a small update: I was wrong to say it sometimes works. By coincidence, the processes were ending anyway shortly after the early stop signal, which is why I thought it sometimes worked. I have debugged it further and can confirm that even after a "trial" is marked as completed, the subprocess/training continues to the end. The following trial might then fail, e.g., due to a lack of GPU memory.
Hi, thanks for your further investigation.
I think this could work, but I would be a little worried about it being too "kill happy" and erroneously shutting down processes that were only experiencing an ephemeral stall. Perhaps we can wait a few moments and, if it's still halted, then kill it. However, it may be suboptimal design to have a new Ray process clean up other processes before starting, as opposed to a Ray process cleaning up its own process after it finishes.
I would assume that Ray could do this well enough out of the box to stop the rogue processes, but I guess that's wishful thinking ;) I will do some testing of this too. I think you may be onto something with some mechanism related to the Ray stopper. Maybe we can override some sort of cleanup method to aggressively SIGKILL the PID recovered by [...]
I agree. Nevertheless, I still wanted to try to kill it the following way: [...] but it did not work. Did you manage to get a lead on what might work in a robust way?
Hi @ozhanozen, I have yet another workaround that may just work ;) I think we could add [...] to the training scripts (RSL RL, RL Games, SKRL, etc.). Then, for each job submitted in util, we can assign an extra PID. I haven't implemented it, but I've tested it manually with the following: [...]. Then, the following has proven to be effective at shutting down all processes: [...]. I think this will be effective, as, for example, having a mismatched id like the one below doesn't kill the process: [...]. Now we just need to add some sort of exit hook to the trials to kill the assigned Ray proc id.
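For what it's worth, a hypothetical reconstruction of that kind of manual test (the sleeper command and the RID value are made up; these are not the exact snippets from this thread) could look like:

```python
# Hypothetical reconstruction of the manual experiment described above.
# The sleeper process stands in for an Isaac Lab training run that carries
# "-rid <id>" on its command line; the RID value is invented.
import subprocess
import time

rid = 184530
proc = subprocess.Popen(
    ["python3", "-c", "import time; time.sleep(300)", "-rid", str(rid)]
)
time.sleep(1)

# A mismatched id matches nothing, so the process keeps running.
subprocess.run(["pkill", "-9", "-f", "rid 999999"], check=False)
print("still alive:", proc.poll() is None)  # True

# Matching the assigned RID kills it (and any other process sharing that token).
subprocess.run(["pkill", "-9", "-f", f"rid {rid}"], check=False)
time.sleep(1)
print("still alive:", proc.poll() is None)  # False
```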
We could maybe use https://docs.ray.io/en/latest/tune/api/callbacks.html to have each trial clean itself up.
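A minimal sketch of that idea, assuming Ray Tune's `Callback` hooks and a per-trial `ray_proc_id` stored in the trial config (names here are illustrative, not necessarily the PR's exact implementation):

```python
# Sketch only: a Ray Tune callback that force-kills the processes belonging to a
# finished or errored trial, identified by the RID token on their command line.
import subprocess
from ray import tune


class ProcessCleanupCallback(tune.Callback):
    def _cleanup(self, trial):
        rid = trial.config.get("ray_proc_id")  # assumed to be injected per trial
        if rid is not None:
            # SIGKILL every process whose full command line contains the RID token.
            subprocess.run(["pkill", "-9", "-f", f"rid {rid}"], check=False)

    def on_trial_complete(self, iteration, trials, trial, **info):
        self._cleanup(trial)

    def on_trial_error(self, iteration, trials, trial, **info):
        self._cleanup(trial)
```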
Actually, now that I've played with this more, I think it's important that we SIGKILL instead of SIGINT. Maybe [...] would work without the extra rid stuff.
I think it is better at killing old jobs, but it keeps running into log extraction errors after terminating processes. Please let me know if you can figure out why!
Hi @garylvov, I just had the opportunity to check this; sorry for the delay. I had encountered two different issues (one regarding rl_games at #3531, one about Ray initialization at #3533), which might be causing the log extraction errors you faced. If I combine the fixes from both of these PRs with the current PR, I am able to do early stopping successfully without encountering any further errors. Could you also try again with these? Note: you had also used [...]
Awesome, thank you so much for figuring it out! Sounds good about the [...]. I'm excited to try this soon!
garylvov left a comment
Hi @kellyguo11,
Greptile Overview
Greptile Summary
This PR adds optional early stopping support to IsaacLab's Ray hyperparameter tuning integration. The implementation allows users to define custom Stopper classes (following Ray's tune.Stopper interface) to terminate underperforming trials early, reducing wasted compute time. The core changes include: adding a --ray-proc-id argument to all training scripts (rl_games, rsl_rl, sb3, skrl) for process identification; creating a ProcessCleanupCallback in tuner.py that forcibly terminates trial processes using SIGKILL; combining custom stoppers with the existing LogExtractionErrorStopper via CombinedStopper; and providing a reference implementation (CartpoleEarlyStopper) that stops trials when cart out-of-bounds exceeds 85% after 20 iterations. The feature integrates with IsaacLab's existing Ray workflow infrastructure (tuner.py, util.py) while maintaining backward compatibility—when no stopper is specified, behavior remains unchanged.
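As a rough sketch of what such a stopper looks like (the thresholds and behavior are taken from the summary above; the metric key and class internals are illustrative, not a verbatim copy of the PR):

```python
# Illustrative stopper following Ray Tune's Stopper interface; mirrors the
# behavior described above (stop a trial once out_of_bounds exceeds 0.85
# after 20 training iterations). Not the PR's exact class.
from ray import tune


class CartpoleEarlyStopper(tune.Stopper):
    def __init__(self, max_out_of_bounds: float = 0.85, min_iters: int = 20):
        self._max_out_of_bounds = max_out_of_bounds
        self._min_iters = min_iters
        self._bad_trials: set[str] = set()

    def __call__(self, trial_id: str, result: dict) -> bool:
        # Metric key is assumed; in practice it comes from the tensorboard logs.
        oob = result.get("out_of_bounds", 0.0)
        if result.get("training_iteration", 0) >= self._min_iters and oob > self._max_out_of_bounds:
            self._bad_trials.add(trial_id)
            return True
        return False

    def stop_all(self) -> bool:
        # Only stop individual trials, never the whole experiment.
        return False
```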
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| scripts/reinforcement_learning/ray/tuner.py | 3/5 | Implements core early stopping logic with ProcessCleanupCallback using SIGKILL, combines custom stoppers with error stopper, adds RID-based process tracking, and increases MAX_LOG_EXTRACTION_ERRORS from 2 to 10 |
| scripts/reinforcement_learning/ray/hyperparameter_tuning/vision_cartpole_cfg.py | 4/5 | Adds CartpoleEarlyStopper reference implementation that stops trials when out_of_bounds metric exceeds 0.85 after 20 iterations |
| scripts/reinforcement_learning/rl_games/train.py | 5/5 | Adds --ray-proc-id argument to enable Ray process identification for early stopping |
| scripts/reinforcement_learning/rsl_rl/train.py | 4/5 | Adds --ray-proc-id argument but doesn't use it within the script itself |
| scripts/reinforcement_learning/sb3/train.py | 4/5 | Adds --ray-proc-id argument but doesn't use it within the script itself |
| scripts/reinforcement_learning/skrl/train.py | 5/5 | Adds --ray-proc-id argument to enable Ray process identification for early stopping |
| scripts/reinforcement_learning/ray/util.py | 5/5 | Extends argument processing to support both single-dash and double-dash argument prefixes |
Confidence score: 3/5
- This PR introduces complex process management that requires careful testing, particularly around the aggressive SIGKILL cleanup mechanism and the interaction between multiple trials running concurrently.
- Score reduced due to: (1) aggressive process cleanup using `pkill -9`, which may leave GPU resources or simulation state in inconsistent states; (2) reliance on string matching for process identification (matching "rid {RID}" in the command line), which could be fragile if command formatting changes; (3) the `--ray-proc-id` argument is added to training scripts but never actually used within them, requiring verification that the RID is properly passed through to subprocesses for pkill to match; (4) the increase of `MAX_LOG_EXTRACTION_ERRORS` from 2 to 10 suggests potential cleanup side effects that warrant monitoring; (5) no tests added despite introducing critical process lifecycle management logic.
- Pay close attention to scripts/reinforcement_learning/ray/tuner.py, particularly the `ProcessCleanupCallback` implementation and the RID-based process matching logic, as these are most likely to cause issues with orphaned processes or premature trial termination.
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant tuner.py
participant CartpoleEarlyStopper
participant CombinedStopper
participant IsaacLabTuneTrainable
participant util.execute_job
participant train.py
participant Tensorboard
User->>tuner.py: Run tuner with --stopper CartpoleEarlyStopper
tuner.py->>CartpoleEarlyStopper: Instantiate stopper
tuner.py->>CombinedStopper: Create with LogExtractionErrorStopper + CartpoleEarlyStopper
tuner.py->>IsaacLabTuneTrainable: Create trainable instances
loop For each trial
IsaacLabTuneTrainable->>util.execute_job: Start training process
util.execute_job->>train.py: Launch training workflow
train.py->>Tensorboard: Write metrics to logs
loop Training steps
IsaacLabTuneTrainable->>Tensorboard: Load tensorboard logs
Tensorboard-->>IsaacLabTuneTrainable: Return metrics dict
IsaacLabTuneTrainable->>tuner.py: Report result with metrics
tuner.py->>CombinedStopper: Check stop conditions
CombinedStopper->>CartpoleEarlyStopper: __call__(trial_id, result)
CartpoleEarlyStopper->>CartpoleEarlyStopper: Check iter >= 20 and out_of_bounds > 0.85
alt Should stop trial
CartpoleEarlyStopper->>CartpoleEarlyStopper: Add trial_id to _bad_trials
CartpoleEarlyStopper-->>CombinedStopper: Return True
CombinedStopper-->>tuner.py: Stop trial
tuner.py->>ProcessCleanupCallback: Cleanup trial process
ProcessCleanupCallback->>util.execute_job: pkill training process
else Continue trial
CartpoleEarlyStopper-->>CombinedStopper: Return False
CombinedStopper-->>tuner.py: Continue trial
end
end
end
tuner.py->>User: Training complete
```
7 files reviewed, 2 comments
Greptile Overview
Greptile Summary
This PR introduces optional early stopping for Ray hyperparameter tuning through custom Stopper classes, allowing trials to terminate early when they're unlikely to yield useful results.
Key changes:
- Added `--stopper` CLI argument to `tuner.py` for loading custom stopper classes
- Integrated stoppers with `CombinedStopper` alongside the existing `LogExtractionErrorStopper`
- Implemented `ProcessCleanupCallback` to kill trial processes using `pkill` with RID-based pattern matching
- Added `-rid` (`--ray-proc-id`) argument to all training scripts for process identification
- Updated `util.py` to support the single-dash argument format (`-rid`)
- Included `CartpoleEarlyStopper` example that stops trials with high out_of_bounds rates
- Increased `MAX_LOG_EXTRACTION_ERRORS` from 2 to 10
Critical issue found:
- The `pkill` pattern in `ProcessCleanupCallback` uses `"rid {value}"` but the actual command line uses `"-rid {value}"`, causing process cleanup to fail
Confidence Score: 2/5
- This PR has a critical bug in process cleanup that will prevent early stopping from working correctly.
- The pkill pattern bug in `ProcessCleanupCallback._cleanup_trial` (line 223) will prevent proper process termination when trials are stopped early. The pattern searches for "rid {value}" but processes will have "-rid {value}" in their command line. This means stopped trials won't be cleaned up properly, potentially leaving zombie processes and wasting resources - the opposite of what early stopping aims to achieve. Once this critical bug is fixed, the implementation is otherwise sound.
- Pay close attention to `scripts/reinforcement_learning/ray/tuner.py` - the process cleanup logic must be fixed before merge.
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| scripts/reinforcement_learning/ray/tuner.py | 3/5 | Added early stopping support with custom stopper integration, process cleanup callback, and RID-based process tracking. Contains critical pkill pattern bug. |
| scripts/reinforcement_learning/ray/hyperparameter_tuning/vision_cartpole_cfg.py | 4/5 | Added example CartpoleEarlyStopper demonstrating early stopping based on out_of_bounds metric. Hardcoded thresholds should be configurable. |
| scripts/reinforcement_learning/ray/util.py | 5/5 | Updated argument parsing to support -rid flag in addition to -- prefixed arguments. |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant Tuner as tuner.py
participant Ray as Ray Tune
participant Stopper as Custom Stopper
participant Trainable as IsaacLabTuneTrainable
participant Training as train.py
participant Callback as ProcessCleanupCallback
User->>Tuner: Start tuning with --stopper flag
Tuner->>Tuner: Load custom stopper class
Tuner->>Tuner: Instantiate stopper
Tuner->>Tuner: Create CombinedStopper with LogExtractionErrorStopper
Tuner->>Tuner: Generate random RID for trial
Tuner->>Ray: Configure Tuner with stopper & callback
Ray->>Trainable: Start trial with config (includes RID)
Trainable->>Training: Execute train.py with -rid parameter
Training-->>Trainable: Start training process
loop Every step
Trainable->>Training: Read tensorboard logs
Training-->>Trainable: Return metrics
Trainable->>Ray: Report metrics
Ray->>Stopper: Check stopping criteria
Stopper-->>Ray: Return stop decision
alt Should stop trial
Ray->>Trainable: Stop trial
Ray->>Callback: on_trial_complete/on_trial_error
Callback->>Callback: pkill -9 -f "rid {RID}"
Callback-->>Ray: Cleanup complete
end
end
Ray-->>User: Return tuning results
```
7 files reviewed, 5 comments
(Resolved review thread on scripts/reinforcement_learning/ray/hyperparameter_tuning/vision_cartpole_cfg.py)
Greptile Overview
Greptile Summary
This PR introduces early stopping support for Ray-based hyperparameter tuning, allowing trials to terminate early when they show poor performance. The implementation includes a ProcessCleanupCallback that forcefully terminates trial processes using pkill, a CombinedStopper mechanism that merges the existing LogExtractionErrorStopper with optional custom stoppers, and an example CartpoleEarlyStopper that demonstrates stopping trials based on performance metrics.
Key changes:
- Added `--stopper` CLI argument to optionally load custom stopper classes from config files
- Implemented `ProcessCleanupCallback` to clean up processes when trials complete or error out
- Integrated stoppers using Ray's `CombinedStopper` to combine multiple stop conditions (see the sketch below)
- Generated a unique random RID (ray process ID) for each trial to enable process-specific cleanup
- Updated `util.py` to support both `--arg` and `-arg` command-line argument formats
- Added `-rid` argument to all training scripts (rl_games, rsl_rl, sb3, skrl) for process identification
- Increased `MAX_LOG_EXTRACTION_ERRORS` from 2 to 10 to be more tolerant of transient log extraction issues
The implementation addresses issue #3270 by enabling efficient hyperparameter tuning where poorly-performing trials can be terminated early rather than running to completion.
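A condensed sketch of how those pieces could fit together on the tuner side; the IsaacLab-specific imports are placeholders, constructor arguments are omitted, and `RunConfig` is referenced as in recent Ray releases (older versions expose it as `ray.air.RunConfig`):

```python
# Sketch of the wiring described above: combine the custom stopper with the
# existing log-extraction-error stopper and register the cleanup callback.
from ray import tune
from ray.tune.stopper import CombinedStopper

# Hypothetical placeholders for pieces that live in IsaacLab's ray scripts.
from my_cfg_module import CartpoleEarlyStopper            # user-provided stopper class
from my_tuner_module import (
    IsaacLabTuneTrainable,                                 # trainable wrapping train.py
    LogExtractionErrorStopper,
    ProcessCleanupCallback,
)

stopper = CombinedStopper(LogExtractionErrorStopper(), CartpoleEarlyStopper())

tuner = tune.Tuner(
    IsaacLabTuneTrainable,
    run_config=tune.RunConfig(
        stop=stopper,                          # checked against every reported result
        callbacks=[ProcessCleanupCallback()],  # kills the trial's processes on stop
    ),
)
results = tuner.fit()
```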
Confidence Score: 4/5
- This PR is safe to merge with minor risks related to process cleanup reliability
- The implementation is mostly sound with a well-designed architecture for early stopping. However, there's one confirmed bug in the process cleanup pattern that prevents processes from being killed properly (pkill pattern mismatch), and some style improvements could be made. The core functionality of the stopper integration is correct, and the changes to training scripts are minimal and safe.
- scripts/reinforcement_learning/ray/tuner.py requires attention due to the pkill pattern bug that prevents proper process cleanup
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| scripts/reinforcement_learning/ray/tuner.py | 4/5 | Added early stopping support with stopper integration, process cleanup callback, and random RID generation for trial cleanup |
| scripts/reinforcement_learning/ray/util.py | 5/5 | Fixed command argument parsing to support both --arg and -arg formats for RID parameter |
| scripts/reinforcement_learning/ray/hyperparameter_tuning/vision_cartpole_cfg.py | 4/5 | Added CartpoleEarlyStopper example that stops trials when out_of_bounds metric exceeds threshold |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant Tuner
participant Ray
participant IsaacLabTuneTrainable
participant TrainingProcess
participant Stopper
participant ProcessCleanupCallback
User->>Tuner: Start tuning with --stopper arg
Tuner->>Tuner: Load stopper class from config
Tuner->>Tuner: Generate random RID for each trial
Tuner->>Ray: Initialize with CombinedStopper
Note over Ray: CombinedStopper includes LogExtractionErrorStopper + custom stopper
loop For each trial
Ray->>IsaacLabTuneTrainable: Start trial with config (includes RID)
IsaacLabTuneTrainable->>TrainingProcess: Execute training command with -rid arg
TrainingProcess-->>IsaacLabTuneTrainable: Stream metrics via tensorboard logs
loop During training
IsaacLabTuneTrainable->>IsaacLabTuneTrainable: Load tensorboard metrics
IsaacLabTuneTrainable->>Ray: Report metrics
Ray->>Stopper: Check if trial should stop
Stopper-->>Ray: Return stop decision
alt Trial should stop
Ray->>IsaacLabTuneTrainable: Stop trial
Ray->>ProcessCleanupCallback: on_trial_complete/on_trial_error
ProcessCleanupCallback->>ProcessCleanupCallback: pkill -9 -f "rid {RID}"
ProcessCleanupCallback->>TrainingProcess: Kill process
end
end
end
Ray-->>User: Return tuning results
```
7 files reviewed, no comments
Greptile Overview
Greptile Summary
This PR adds early stopping support to Ray hyperparameter tuning through the Stopper class integration. The implementation allows trials to terminate early when they're unlikely to yield useful results, improving computational efficiency.
Key Changes:
- Adds `ProcessCleanupCallback` to properly terminate trial processes when stopped
- Integrates custom stoppers with `CombinedStopper` alongside the existing `LogExtractionErrorStopper`
- Adds `--ray-proc-id` (`-rid`) argument to all training scripts for process identification
- Updates `util.py` to handle short-form arguments starting with a single dash
- Includes `CartpoleEarlyStopper` as a reference implementation example
- Increases `MAX_LOG_EXTRACTION_ERRORS` from 2 to 10
Issues Found:
- The `pkill` pattern in `ProcessCleanupCallback` uses `rid {value}` instead of `-rid {value}`, which works but is less precise and could cause false matches (illustrated after this list)
- The stopper validation only accepts uninstantiated classes, which may be limiting if users want to pass pre-configured stopper instances
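To make the precision point concrete: `pkill -f` matches its pattern as a regex against the full command line, so the un-dashed pattern still hits `-rid <value>`, it is just broader. An illustrative comparison, with an invented RID value:

```python
# Illustration of the pattern-precision point above; the RID value is made up.
import subprocess

rid = 184530

# Matches any command line containing "rid 184530", which includes
# "... train.py ... -rid 184530" but also, e.g., "--grid 184530".
subprocess.run(["pkill", "-9", "-f", f"rid {rid}"], check=False)

# Anchoring on the literal dash of the flag gives a narrower match
# (the character class avoids the pattern starting with "-").
subprocess.run(["pkill", "-9", "-f", f"[-]rid {rid}"], check=False)
```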
Confidence Score: 3/5
- This PR is mostly safe to merge with one process cleanup issue that should be fixed
- The early stopping implementation is well-structured and follows Ray's patterns. However, the pkill pattern bug could cause issues in production with false-positive matches, and the 5-second blocking sleep may impact performance during high-throughput trial scheduling. The core logic is sound and the feature addresses a real need.
- Pay close attention to `scripts/reinforcement_learning/ray/tuner.py` - the `ProcessCleanupCallback` needs the pkill pattern fixed
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| scripts/reinforcement_learning/ray/tuner.py | 2/5 | Adds early stopping support with ProcessCleanupCallback and stopper integration; contains critical pkill pattern bug and minor validation issue |
| scripts/reinforcement_learning/ray/hyperparameter_tuning/vision_cartpole_cfg.py | 4/5 | Adds CartpoleEarlyStopper example implementation with hardcoded threshold values |
| scripts/reinforcement_learning/ray/util.py | 5/5 | Updates command parsing to handle short-form arguments starting with single dash |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant Tuner
participant Ray
participant IsaacLabTuneTrainable
participant TrainingProcess
participant Stopper
participant ProcessCleanupCallback
User->>Tuner: Start tuning with optional stopper
Tuner->>Tuner: Load config and stopper class
Tuner->>Stopper: Instantiate stopper()
Tuner->>Tuner: Create CombinedStopper with LogExtractionErrorStopper + custom stopper
Tuner->>Ray: Configure tuner with stoppers and callbacks
Ray->>IsaacLabTuneTrainable: Schedule trial with RID
IsaacLabTuneTrainable->>TrainingProcess: Execute training command with -rid
loop Every step
IsaacLabTuneTrainable->>TrainingProcess: Read tensorboard logs
TrainingProcess-->>IsaacLabTuneTrainable: Return metrics
IsaacLabTuneTrainable->>Ray: Report metrics
Ray->>Stopper: Check if trial should stop
alt Should stop
Stopper-->>Ray: Return True
Ray->>ProcessCleanupCallback: on_trial_complete/on_trial_error
ProcessCleanupCallback->>TrainingProcess: pkill -9 -f "rid {RID}"
ProcessCleanupCallback->>ProcessCleanupCallback: sleep(5)
else Continue
Stopper-->>Ray: Return False
end
end
Ray-->>Tuner: All trials complete
Tuner-->>User: Results
```
7 files reviewed, no comments
```python
)
parser.add_argument("--export_io_descriptors", action="store_true", default=False, help="Export IO descriptors.")
parser.add_argument(
    "--ray-proc-id", "-rid", type=int, default=None, help="Automatically configured by Ray integration, otherwise None."
```
What happens if a user sets this by accident? If it's meant to be configured directly by Ray, does it have to be exposed as a user-level command-line argument?
If a user sets it by accident, nothing different happens compared to not setting it at all; it is a no-op in terms of functionality outside of the Ray integration.
The reason it is added here is that a single ./isaaclab.sh -p train.py invocation starts many processes, all with different PIDs. For example, the following shows the many PIDs associated with one single Isaac run:

When using Python's subprocess to recover the PID, only one PID is returned out of the many subprocesses created by a single Isaac train command, so it's hard to kill a run completely programmatically. The RID is used to correlate which group of subprocesses belongs to a single Isaac Lab run, so that the run can be pkilled by its unique RID without affecting unrelated runs on the same machine that have a different RID. This is critical to our early stopping functionality; without it, we couldn't find a good way to make sure the run was killed, which resulted in hanging processes blocking the GPU. For more context, see: #3276 (comment) and #3276 (comment)
I tried to configure this without the RID (using the Isaac command itself as a unique identifier, or starting the command with RAY_PROC_ID= in a bash-like way so that the RID isn't exposed to the user), but I couldn't get it to work. Although the RID is not as clean as it could be, I think using another method to ensure runs are killed would require a significant refactor of the training scripts (mentioned in #3546 (comment)), or significantly more investigation into how best to enable early stopping.
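A stripped-down sketch of that point; the training command and RID below are placeholders rather than the PR's actual wiring in util.py/tuner.py:

```python
# Sketch: why killing the Popen PID alone is not enough, and how the RID helps.
# The command string and RID here are placeholders, not the PR's exact code.
import secrets
import subprocess

rid = secrets.randbelow(10**8)
cmd = f"./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py --headless -rid {rid}"
proc = subprocess.Popen(cmd, shell=True)

# proc.pid is only the launcher process; Isaac Sim spawns several more processes
# (see the process list above), so proc.kill() alone can leave children holding
# the GPU. Because every process in the run carries the "-rid <id>" token on its
# command line, one pkill catches the whole group without touching unrelated runs.
subprocess.run(["pkill", "-9", "-f", f"rid {rid}"], check=False)
```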

Description
This PR introduces support for early stopping in the Ray integration through the `Stopper` class. It enables trials to end sooner when they are unlikely to yield useful results, reducing wasted compute time and speeding up experimentation.

Previously, when running hyperparameter tuning with the Ray integration, all trials would continue until the training configuration's maximum iterations were reached, even if a trial was clearly underperforming. This wasn't always efficient, since poor-performing trials could often be identified early on. With this PR, an optional early stopping mechanism is introduced, allowing Ray to terminate unpromising trials sooner and improve the overall efficiency of hyperparameter tuning.

The PR also includes a `CartpoleEarlyStopper` example in `vision_cartpole_cfg.py`. This serves as a reference implementation that halts a trial if the `out_of_bounds` metric doesn't decrease after a set number of iterations. It's meant as a usage example: users are encouraged to create their own custom stoppers tailored to their specific use cases.

Fixes #3270.
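Since the example is meant as a template, a user-defined stopper for a different criterion might look like the following; the metric key, thresholds, and class are entirely illustrative:

```python
# Illustrative custom stopper a user could define in their own hyperparameter
# config file and select via the --stopper argument. Metric key and thresholds
# are made up for this example.
from ray import tune


class RewardPlateauStopper(tune.Stopper):
    """Stop a trial whose reward has not improved by min_delta for patience iterations."""

    def __init__(self, patience: int = 30, min_delta: float = 0.5):
        self._patience = patience
        self._min_delta = min_delta
        self._best: dict[str, tuple[float, int]] = {}  # trial_id -> (best reward, iteration)

    def __call__(self, trial_id: str, result: dict) -> bool:
        reward = result.get("rewards/iter", float("-inf"))  # assumed metric key
        it = result.get("training_iteration", 0)
        best_reward, best_iter = self._best.get(trial_id, (float("-inf"), 0))
        if reward > best_reward + self._min_delta:
            self._best[trial_id] = (reward, it)
            return False
        return (it - best_iter) >= self._patience

    def stop_all(self) -> bool:
        return False
```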
Type of change
Checklist
- Ran the pre-commit checks with ./isaaclab.sh --format
- Updated the changelog and the corresponding version in the extension's config/extension.toml file
- Added my name to CONTRIBUTORS.md or my name already exists there