RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation #54

dangne opened this issue Oct 17, 2020 · 3 comments


dangne commented Oct 17, 2020

I encountered this strange error. Here is the output:

$ python main.py 
2020-10-17 06:19:37,971:INFO::[*] Make directories : logs/ptb_2020-10-17_06-19-37
2020-10-17 06:19:45,686:INFO::regularizing:
2020-10-17 06:19:56,858:INFO::# of parameters: 146,014,000
2020-10-17 06:19:57,208:INFO::[*] MODEL dir: logs/ptb_2020-10-17_06-19-37
2020-10-17 06:19:57,208:INFO::[*] PARAM path: logs/ptb_2020-10-17_06-19-37/params.json
/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
2020-10-17 06:19:57,872:INFO::max hidden 3.5992980003356934
2020-10-17 06:19:58,043:INFO::abs max grad 0
/home/ubuntu/ENAS-pytorch/trainer.py:323: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  self.args.shared_grad_clip)
2020-10-17 06:19:58,879:INFO::abs max grad 0.05615033581852913
2020-10-17 06:19:59,448:INFO::max hidden 9.425106048583984
2020-10-17 06:19:59,774:INFO::abs max grad 0.0575626865029335
2020-10-17 06:20:01,810:INFO::abs max grad 0.12187317758798599
2020-10-17 06:20:03,771:INFO::abs max grad 0.5459710359573364
2020-10-17 06:20:07,741:INFO::max hidden 15.914213180541992
2020-10-17 06:20:17,945:INFO::abs max grad 0.8663018941879272
2020-10-17 06:20:41,948:INFO::| epoch   0 | lr 20.00 | raw loss 8.39 | loss 8.39 | ppl  4402.23
2020-10-17 06:21:21,796:INFO::| epoch   0 | lr 20.00 | raw loss 7.20 | loss 7.20 | ppl  1343.73
2020-10-17 06:21:26,601:INFO::max hidden 20.534639358520508
2020-10-17 06:22:06,855:INFO::| epoch   0 | lr 20.00 | raw loss 7.00 | loss 7.00 | ppl  1093.28
2020-10-17 06:22:07,417:INFO::max hidden 22.71334457397461
2020-10-17 06:22:19,596:INFO::clipped 1 hidden states in one forward pass. max clipped hidden state norm: 25.37160301208496
Traceback (most recent call last):
  File "main.py", line 54, in <module>
    main(args)
  File "main.py", line 34, in main
    trnr.train()
  File "/home/ubuntu/ENAS-pytorch/trainer.py", line 222, in train
    self.train_shared(dag=dag)
  File "/home/ubuntu/ENAS-pytorch/trainer.py", line 313, in train_shared
    loss.backward()
  File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 1000]], which is output 0 of AddBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
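
As the hint at the end of the error message suggests, autograd's anomaly detection will print a second traceback pointing at the forward-pass operation whose output was later modified in place. A minimal sketch of enabling it (placing it at the top of main.py is just one option, not something from the repo):

import torch

# Enable globally before training starts; every backward pass gets slower,
# so turn this on only while hunting for the offending in-place operation.
torch.autograd.set_detect_anomaly(True)

# Or scope it to the failing region with the context manager:
# with torch.autograd.detect_anomaly():
#     loss.backward()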
lyjzsyzlt commented

Did you solve this problem? I'm running into the same issue.

STONEKONG commented

I hit this issue on torch 1.7.1 and solved it by downgrading the torch version.


david90103 commented Feb 20, 2021

According to this discussion, older versions of autograd contain bugs that fail to correctly detect invalid in-place operations.

After changing the in-place operations below, the code works fine for me when running the RNN model on torch 1.8.0.

https://github.com/carpedm20/ENAS-pytorch/blob/master/models/shared_rnn.py#L248

Change

clipped_num += 1

to

clipped_num = clipped_num + 1

and change

hidden *= torch.autograd.Variable(torch.FloatTensor(mask).cuda(), requires_grad=False)

to

hidden = hidden * torch.autograd.Variable(torch.FloatTensor(mask).cuda(), requires_grad=False)
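
For anyone curious why this fixes it, here is a minimal, self-contained sketch (not code from this repo) of the failure mode: tanh saves its output tensor for the backward pass, so mutating that output in place bumps its version counter and backward raises the same RuntimeError, while the out-of-place multiply allocates a new tensor and leaves the saved one untouched.

import torch

def run(inplace):
    w = torch.randn(3, 3, requires_grad=True)
    x = torch.randn(3, 3)
    hidden = torch.tanh(x @ w)   # tanh saves its output for backward
    mask = torch.full_like(hidden, 0.5)
    if inplace:
        hidden *= mask           # mutates the saved tensor (version 0 -> 1)
    else:
        hidden = hidden * mask   # new tensor; saved output stays at version 0
    hidden.sum().backward()

run(inplace=False)  # works
run(inplace=True)   # RuntimeError: ... modified by an inplace operation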
