some bugs #35
Comments
Thank you for your interest in our work. Since this is an error caused by CUDA, I don't know how to fix it.
Thanks for your reply. I will try to use the same versions of Python and PyTorch as yours.
I had the same problem with the new version of PyTorch. The fix requires two changes:
Modify lines 62 to 65 in baseline.py.
Modify lines 188 to 189 in baseline.py.
It is important to store the mapped labels in self.trainset.targets; otherwise it keeps the original class IDs, and the targets go out of range when nn.CrossEntropyLoss is computed in incremental.py, which is exactly the error above. Finally, you also need to modify the corresponding test-set handling in baseline.py, and make a similar modification in mnemonics.py.
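A minimal sketch of the label-remapping idea described above (the helper name `remap_targets` and the arguments `order` / `current_classes` are illustrative assumptions, not the repository's actual code):

```python
import numpy as np

def remap_targets(trainset, order, current_classes):
    """Keep only samples from `current_classes` and rewrite trainset.targets
    with contiguous labels 0..N-1 following the class order in `order`."""
    label_map = {cls_id: new_id for new_id, cls_id in enumerate(order)}
    targets = np.array(trainset.targets)
    mask = np.isin(targets, current_classes)
    trainset.data = trainset.data[mask]
    # Store the *mapped* labels, not the original CIFAR-100 class IDs;
    # otherwise CrossEntropyLoss sees targets >= out_features and the
    # device-side assert in ClassNLLCriterion fires.
    trainset.targets = [int(label_map[t]) for t in targets[mask]]
    return trainset
```

The key point is the last assignment: whatever helper you use, the values written back into trainset.targets must already be the remapped indices in [0, out_features).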
Hi, Professors. There are some bugs when I run the code. Can you explain what they mean and how to solve them? Thank you very much!
Using gpu: 0
Files already downloaded and verified
Files already downloaded and verified
Order name:./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/seed_1993_cifar100_order_run_0.pkl
Generating orders
pickle into ./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/seed_1993_cifar100_order_run_0.pkl
[68, 56, 78, 8, 23, 84, 90, 65, 74, 76, 40, 89, 3, 92, 55, 9, 26, 80, 43, 38, 58, 70, 77, 1, 85, 19, 17, 50, 28, 53, 13, 81, 45, 82, 6, 59, 83, 16, 15, 44, 91, 41, 72, 60, 79, 52, 20, 10, 31, 54, 37, 95, 14, 71, 96, 98, 97, 2, 64, 66, 42, 22, 35, 86, 24, 34, 87, 21, 99, 0, 88, 27, 18, 94, 11, 12, 47, 25, 30, 46, 62, 69, 36, 61, 7, 63, 75, 5, 32, 4, 51, 48, 73, 93, 39, 67, 29, 49, 57, 33]
Out_features: 50
Batch of classes number 5 arrives
Max and min of train labels: 0, 49
Max and min of valid labels: 0, 49
Checkpoint name: ./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/run_0_iteration_4_model.pth
Incremental train
Epoch: 0, LR: [0.1]
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
(the same assertion failure is reported for many other threads in block [0,0,0])
Traceback (most recent call last):
File "main.py", line 78, in
trainer.train()
File "/data1/22160073/project/incremental learning/class-incremental-learning-main/mnemonics-training/1_train/trainer/mnemonics.py", line 237, in train
tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)
File "/data1/22160073/project/incremental learning/class-incremental-learning-main/mnemonics-training/1_train/trainer/incremental.py", line 44, in incremental_train_and_eval
loss.backward()
File "/data1/22160073/anaconda3/envs/xxz/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/data1/22160073/anaconda3/envs/xxz/lib/python3.7/site-packages/torch/autograd/init.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
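This assert means nn.CrossEntropyLoss received a target index outside [0, n_classes). Running with CUDA_LAUNCH_BLOCKING=1 makes the failing call appear at the right place in the traceback, and a CPU-side range check catches it before the kernel launches. A generic debugging sketch (not part of this repository; `outputs` and `targets` are the logits and labels from the training loop):

```python
import torch.nn.functional as F

def checked_cross_entropy(outputs, targets):
    # outputs: [batch, n_classes] logits; targets: integer class indices.
    # Out-of-range targets are the usual cause of the
    # "t >= 0 && t < n_classes" device-side assert.
    n_classes = outputs.size(1)
    if (targets < 0).any() or (targets >= n_classes).any():
        raise ValueError(
            f"targets out of range: min={int(targets.min())}, "
            f"max={int(targets.max())}, n_classes={n_classes}")
    return F.cross_entropy(outputs, targets)
```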