Closed
Description
Describe the bug
I'm using Ray for hyperparameter optimization (HPO) with ASHAScheduler instead of the default scheduler (FIFO). After ASHA stops some underperforming trials and starts new ones, the processes of the stopped trials do not exit, so the GPU resources they hold are never freed.
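If each trial launches the training workflow (here `train.py`) as a child process, which is an assumption about how the tuner works rather than something confirmed above, then stopping a trial has to take down that child's whole process tree, or the simulator keeps holding GPU memory. The sketch below shows one generic POSIX pattern for that. `launch_job` and `terminate_job` are hypothetical helpers, not Isaac Lab or Ray API; whether Ray's trial-stop path (e.g. `Trainable.cleanup`) actually reaches such a hook is not established here.

```python
import os
import signal
import subprocess


def launch_job(cmd: list[str]) -> subprocess.Popen:
    """Start a command in its own session (POSIX only).

    start_new_session=True runs setsid() in the child, so the child leads a
    new process group whose id equals its pid. Any processes it spawns join
    that group and can be signalled together.
    """
    return subprocess.Popen(cmd, start_new_session=True)


def terminate_job(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """SIGTERM the whole process group, then SIGKILL any survivors.

    Signalling the group (not just proc.pid) also reaches grandchildren,
    e.g. a simulator started by the training script.
    """
    pgid = proc.pid  # equals the group id because of start_new_session=True
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()  # group already gone; just collect the exit code
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass  # leader ignored SIGTERM; escalate below
    try:
        os.killpg(pgid, signal.SIGKILL)  # kill anything still in the group
    except ProcessLookupError:
        pass  # everything exited after SIGTERM
    return proc.wait()
```

For example, `terminate_job(launch_job(["sleep", "30"]))` returns `-15`, since `sleep` is killed by SIGTERM before the timeout.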
Steps to reproduce
./isaaclab.sh -p scripts/reinforcement_learning/ray/tuner.py \
--cfg_file scripts/reinforcement_learning/ray/hyperparameter_tuning/vision_cartpole_cfg.py \
--cfg_class CartpoleTheiaJobCfg \
--run_mode local \
--workflow scripts/reinforcement_learning/rl_games/train.py \
--num_workers_per_node <NUMBER_OF_GPUS_IN_COMPUTER>
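The Tuner in tuner.py was changed to use ASHAScheduler:
from ray import tune
from ray.tune.schedulers import ASHAScheduler

# cfg, args, repeat_search, run_config and IsaacLabTuneTrainable come from tuner.py
tuner = tune.Tuner(
    IsaacLabTuneTrainable,
    param_space=cfg,
    tune_config=tune.TuneConfig(
        metric=args.metric,
        mode=args.mode,
        scheduler=ASHAScheduler(
            time_attr="time_this_iter_s", grace_period=3, max_t=5, reduction_factor=2
        ),
        search_alg=repeat_search,
        num_samples=args.num_samples,
        reuse_actors=True,
    ),
    run_config=run_config,
)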
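With `grace_period=3`, `max_t=5` and `reduction_factor=2`, ASHA has very few decision points. The sketch below enumerates the rung milestones, assuming Ray's single-bracket rule that rungs sit at `grace_period * reduction_factor**k` for every value up to `max_t`; `asha_rungs` is a hypothetical helper, not part of Ray.

```python
def asha_rungs(grace_period: float, max_t: float, reduction_factor: float) -> list[float]:
    """Return ASHA rung milestones in ascending order.

    Assumes a single bracket: milestones are grace_period * reduction_factor**k
    for k = 0, 1, 2, ... as long as the milestone does not exceed max_t.
    At each milestone, roughly the top 1/reduction_factor of trials continue.
    """
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


# Settings from the scheduler above
print(asha_rungs(3, 5, 2))
```

Here `asha_rungs(3, 5, 2)` gives `[3]`: a single rung at 3, after which surviving trials run until the reported `time_attr` reaches `max_t`. Ray's docs describe `time_attr` as a value that should increase monotonically (for example `training_iteration`), whereas `time_this_iter_s` is only the duration of the latest iteration, so the scheduler may not stop trials at the points one would expect.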
System Info
Describe the characteristics of your environment:
- Commit: [e.g. 2f9298d]
- Isaac Sim Version: 5.0
- OS: Ubuntu 22.04
- GPU: RTX 4090
- CUDA: 12.8
- GPU Driver: 553.05