
Stable versions of torchrl/tensordict still getting internal dynamo error #14

@StoneT2000

Description

I am currently getting the same issue as in #10.

I have torch 2.5.1, torchrl 0.6.0, and tensordict 0.6.0 at the moment. I am running a slightly modified version of the original code. I can run with either cudagraphs or compile, but not both at once. That said, with cudagraphs alone things are working great!

Trace:

python leanrl/ppo_continuous_action_torchcompile.py --num-envs 1 --num-steps 64 --total-timesteps 256 --compile --cudagraphs
/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/tyro/_fields.py:181: UserWarning: The field target_kl is annotated with type <class 'float'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: stonet2000. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.5
wandb: Run data is saved locally in /home/stao/work/external/leanrl/wandb/run-20241030_192316-jgwpim8y
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run ppo_continuous_action_torchcompile-HalfCheetah-v4__ppo_continuous_action_torchcompile__1__True__True
wandb: ⭐️ View project at https://wandb.ai/stonet2000/ppo_continuous_action
wandb: 🚀 View run at https://wandb.ai/stonet2000/ppo_continuous_action/runs/jgwpim8y
/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/tensordict/nn/cudagraphs.py:194: UserWarning: Tensordict is registered in PyTree. This is incompatible with CudaGraphModule. Removing TDs from PyTree. To silence this warning, call tensordict.nn.functional_module._exclude_td_from_pytree().set() or set the environment variable `EXCLUDE_TD_FROM_PYTREE=1`. This operation is irreversible.
  warnings.warn(
  0%|                                                                                                                                 | 0/4 [00:00<?, ?it/s]/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:167: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
W1030 19:23:26.159230 2648226 site-packages/torch/_logging/_internal.py:1081] [11/0] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
 25%|██████████████████████████████▎                                                                                          | 1/4 [00:10<00:31, 10.50s/it]/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/cuda/graphs.py:84: UserWarning: The CUDA Graph is empty. This usually means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ../aten/src/ATen/cuda/CUDAGraph.cpp:208.)
  super().capture_end()
 25%|██████████████████████████████▎                                                                                          | 1/4 [00:10<00:31, 10.65s/it]
Traceback (most recent call last):
  File "/home/stao/work/external/leanrl/leanrl/ppo_continuous_action_torchcompile.py", line 358, in <module>
    container = gae(next_obs, next_done, container)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/tensordict/nn/cudagraphs.py", line 439, in __call__
    return self._call_func(*args, **kwargs)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/tensordict/nn/cudagraphs.py", line 345, in _call
    out = self.module(*self._args, **self._kwargs)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1269, in __call__
    return self._torchdynamo_orig_callable(
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 526, in __call__
    return _compile(
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 952, in _compile
    raise InternalTorchDynamoError(
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 924, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 666, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_utils_internal.py", line 87, in wrapper_function
    return function(*args, **kwargs)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 699, in _compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1322, in transform_code_object
    transformations(instructions, code_options)
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 208, in _fn
    cuda_rng_state = torch.cuda.get_rng_state()
  File "/home/stao/miniforge3/envs/leanrl/lib/python3.10/site-packages/torch/cuda/random.py", line 42, in get_rng_state
    return default_generator.get_state()
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: Cannot call CUDAGeneratorImpl::current_seed during CUDA graph capture. If you need this call to be captured, please file an issue. Current cudaStreamCaptureStatus: cudaStreamCaptureStatusActive


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

I'm currently trying to add the LeanRL tricks to ManiSkill, and update times have massively improved!