I am currently hitting the same issue as in #10.
I have torch 2.5.1, torchrl 0.6.0, and tensordict 0.6.0 at the moment, running a slightly modified version of the original code. I can run with either cudagraphs or compile, but not both together. That said, with cudagraphs alone things are working great!
Trace:

```
python leanrl/ppo_continuous_action_torchcompile.py --num-envs 1 --num-steps 64 --total-timesteps 256 --compile --cudagraphs
/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/tyro/_fields.py:181: UserWarning: The field target_kl is annotated with type <class 'float'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: stonet2000. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.5
wandb: Run data is saved locally in /home/stao/work/external/leanrl/wandb/run-20241030_192316-jgwpim8y
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run ppo_continuous_action_torchcompile-HalfCheetah-v4__ppo_continuous_action_torchcompile__1__True__True
wandb: ⭐️ View project at https://wandb.ai/stonet2000/ppo_continuous_action
wandb: 🚀 View run at https://wandb.ai/stonet2000/ppo_continuous_action/runs/jgwpim8y
/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/tensordict/nn/cudagraphs.py:194: UserWarning: Tensordict is registered in PyTree. This is incompatible with CudaGraphModule. Removing TDs from PyTree. To silence this warning, call tensordict.nn.functional_module._exclude_td_from_pytree().set() or set the environment variable `EXCLUDE_TD_FROM_PYTREE=1`. This operation is irreversible.
  warnings.warn(
  0%|          | 0/4 [00:00<?, ?it/s]/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:167: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
W1030 19:23:26.159230 2648226 site-packages/torch/_logging/_internal.py:1081] [11/0] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
 25%|██████████████████████████████▎ | 1/4 [00:10<00:31, 10.50s/it]/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/cuda/graphs.py:84: UserWarning: The CUDA Graph is empty. This usually means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ../aten/src/ATen/cuda/CUDAGraph.cpp:208.)
  super().capture_end()
 25%|██████████████████████████████▎ | 1/4 [00:10<00:31, 10.65s/it]
Traceback (most recent call last):
  File "/home/stao/work/external/leanrl/leanrl/ppo_continuous_action_torchcompile.py", line 358, in <module>
    container = gae(next_obs, next_done, container)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/tensordict/nn/cudagraphs.py", line 439, in __call__
    return self._call_func(*args, **kwargs)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/tensordict/nn/cudagraphs.py", line 345, in _call
    out = self.module(*self._args, **self._kwargs)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1269, in __call__
    return self._torchdynamo_orig_callable(
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 526, in __call__
    return _compile(
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 952, in _compile
    raise InternalTorchDynamoError(
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 924, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 666, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_utils_internal.py", line 87, in wrapper_function
    return function(*args, **kwargs)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 699, in _compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1322, in transform_code_object
    transformations(instructions, code_options)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 208, in _fn
    cuda_rng_state = torch.cuda.get_rng_state()
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/cuda/random.py", line 42, in get_rng_state
    return default_generator.get_state()
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: Cannot call CUDAGeneratorImpl::current_seed during CUDA graph capture. If you need this call to be captured, please file an issue. Current cudaStreamCaptureStatus: cudaStreamCaptureStatusActive

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```
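For anyone triaging: reading the trace, the conflict seems to be that Dynamo compiles lazily on the first call and saves the CUDA RNG state during compilation (`convert_frame.py` calls `torch.cuda.get_rng_state()`), but that first compilation here happens while `CudaGraphModule` is already mid-capture, and reading generator state is forbidden during capture. A stdlib-only analogue of that interplay — every name below is a hypothetical stand-in, not a real torch API:

```python
# Hypothetical analogue of the failure mode (no torch involved):
# a lazily-compiling wrapper reads "RNG state" on first compile, and a
# capture wrapper runs the function while the stream is marked as capturing.

class FakeCaptureError(RuntimeError):
    pass

class FakeStream:
    """Stands in for the CUDA stream's capture status."""
    capturing = False

def get_rng_state(stream):
    # Mirrors CUDAGeneratorImpl: reading generator state is illegal mid-capture.
    if stream.capturing:
        raise FakeCaptureError("cannot read RNG state during graph capture")
    return 42  # placeholder state

def lazy_compile(fn, stream, _cache={}):
    # Mirrors Dynamo: compile on first call, reading RNG state along the way.
    if fn not in _cache:
        get_rng_state(stream)  # <- the call that blows up in the real trace
        _cache[fn] = fn        # the "compiled" artifact
    return _cache[fn]

def capture(fn, stream):
    # Mirrors CudaGraphModule: run fn while the stream is capturing.
    stream.capturing = True
    try:
        return lazy_compile(fn, stream)()
    finally:
        stream.capturing = False

stream = FakeStream()

def gae_step():
    return "ok"

try:
    capture(gae_step, stream)  # first call compiles *inside* capture -> error
except FakeCaptureError as e:
    print("reproduced:", e)

# Compiling once outside capture first would sidestep the conflict:
lazy_compile(gae_step, stream)
print(capture(gae_step, stream))
```

This suggests (speculatively) that forcing the compiled function to warm up and fully compile outside the captured region before `CudaGraphModule` records it would avoid the RNG-state read during capture.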
Separately, I am currently adding the LeanRL tricks to ManiSkill, and update times have massively improved!
