
some bugs #35

Open
xiaozhenxu opened this issue Jan 10, 2022 · 3 comments

Comments

@xiaozhenxu

Hi, Professors. There are some bugs when I run the code. Can you explain what they mean and how to solve them? Thank you very much!
Using gpu: 0
Files already downloaded and verified
Files already downloaded and verified
Order name:./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/seed_1993_cifar100_order_run_0.pkl
Generating orders
pickle into ./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/seed_1993_cifar100_order_run_0.pkl
[68, 56, 78, 8, 23, 84, 90, 65, 74, 76, 40, 89, 3, 92, 55, 9, 26, 80, 43, 38, 58, 70, 77, 1, 85, 19, 17, 50, 28, 53, 13, 81, 45, 82, 6, 59, 83, 16, 15, 44, 91, 41, 72, 60, 79, 52, 20, 10, 31, 54, 37, 95, 14, 71, 96, 98, 97, 2, 64, 66, 42, 22, 35, 86, 24, 34, 87, 21, 99, 0, 88, 27, 18, 94, 11, 12, 47, 25, 30, 46, 62, 69, 36, 61, 7, 63, 75, 5, 32, 4, 51, 48, 73, 93, 39, 67, 29, 49, 57, 33]
Out_features: 50
Batch of classes number 5 arrives
Max and min of train labels: 0, 49
Max and min of valid labels: 0, 49
Checkpoint name: ./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/run_0_iteration_4_model.pth
Incremental train

Epoch: 0, LR: [0.1]
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
[... the same assertion is repeated for threads [4,0,0] through [31,0,0] ...]
Traceback (most recent call last):
File "main.py", line 78, in
trainer.train()
File "/data1/22160073/project/incremental learning/class-incremental-learning-main/mnemonics-training/1_train/trainer/mnemonics.py", line 237, in train
tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)
File "/data1/22160073/project/incremental learning/class-incremental-learning-main/mnemonics-training/1_train/trainer/incremental.py", line 44, in incremental_train_and_eval
loss.backward()
File "/data1/22160073/anaconda3/envs/xxz/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/data1/22160073/anaconda3/envs/xxz/lib/python3.7/site-packages/torch/autograd/init.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered

@yaoyao-liu
Owner

Thank you for your interest in our work.

Since this is an error caused by CUDA, I don't know how to fix it.
May I know if you are using the same versions of Python and PyTorch as we are?
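
For reference, a quick way to check the versions in your current environment (a generic snippet, not part of this repository; the environment used by the authors is torch 0.4.0 with python 3.6.8, as noted in a later comment):

```python
# Print the Python and PyTorch versions of the active environment.
import sys
import torch

print(sys.version)        # the authors used Python 3.6.8
print(torch.__version__)  # the authors used torch 0.4.0
```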

@xiaozhenxu
Author

Thanks for your reply. I will try to use the same versions of Python and PyTorch as yours.

@EnnengYang

EnnengYang commented Mar 1, 2023

I had the same problem with a newer version of PyTorch. There are two solutions:

  • First, use the authors' older PyTorch environment (i.e. torch==0.4.0, python==3.6.8), i.e. the "same version of python and pytorch" mentioned above.
  • Second, modify the code to accommodate a newer PyTorch environment (torch==1.10.0+cu113 in my test environment), as described below.

Modify lines 62 to 65 in baseline.py:

```python
import torch
torch_version = torch.__version__

if '0.4.0' in str(torch_version):
    # Older torchvision datasets (used with torch 0.4.0) expose
    # train_data / train_labels and test_data / test_labels.
    X_train_total = np.array(self.trainset.train_data)
    Y_train_total = np.array(self.trainset.train_labels)
    X_valid_total = np.array(self.testset.test_data)
    Y_valid_total = np.array(self.testset.test_labels)
else:
    # Newer torchvision datasets expose data / targets instead.
    X_train_total = np.array(self.trainset.data)
    Y_train_total = np.array(self.trainset.targets)
    X_valid_total = np.array(self.testset.data)
    Y_valid_total = np.array(self.testset.targets)
```

Modify lines 188 to 189 in baseline.py:

```python
if '0.4.0' in str(torch_version):
    self.trainset.train_data = X_train.astype('uint8')
    self.trainset.train_labels = map_Y_train
else:
    self.trainset.data = X_train.astype('uint8')
    # Store the *mapped* labels (map_Y_train), not the original class IDs.
    self.trainset.targets = map_Y_train
```


It is important to store the mapped labels in self.trainset.targets; otherwise it would store the original class IDs, and the targets would go out of bounds when nn.CrossEntropyLoss is computed in incremental.py, i.e. the error above (see the illustration below).
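
As a concrete illustration of why the mapping matters, here is a minimal, hypothetical sketch (not the repository's code): nn.CrossEntropyLoss requires every target to lie in [0, n_classes), so the original CIFAR-100 class IDs have to be remapped to their positions in the class order before being stored.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical example: `order` stands in for the class order printed in the
# log above, and Y_train holds original CIFAR-100 class IDs from those classes.
order = [68, 56, 78, 8, 23]
Y_train = np.array([68, 8, 23, 78])

# Map each original class ID to its position in `order`, i.e. into [0, n_classes).
map_Y_train = np.array([order.index(y) for y in Y_train])  # -> [0, 3, 4, 2]

n_classes = len(order)
logits = torch.randn(len(map_Y_train), n_classes)
loss = nn.CrossEntropyLoss()(logits, torch.from_numpy(map_Y_train))  # OK

# Feeding the raw class IDs instead (e.g. 68 with n_classes == 5) violates the
# kernel assertion `t >= 0 && t < n_classes` and triggers the device-side
# assert shown in the log above (or an "out of bounds" IndexError on CPU).
```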


Finally, you need to modify the corresponding test-set loading and evaluation code in baseline.py in the same way (and make similar modifications in mnemonics.py); a sketch follows below.
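
A sketch of the analogous test-set change, in the same style as the snippets above (X_valid and map_Y_valid are placeholder names; the exact variable names and line numbers in baseline.py may differ):

```python
# Placeholder fragment: X_valid holds the test images, map_Y_valid the
# remapped test labels (positions in the class order, not original class IDs).
if '0.4.0' in str(torch_version):
    self.testset.test_data = X_valid.astype('uint8')
    self.testset.test_labels = map_Y_valid
else:
    self.testset.data = X_valid.astype('uint8')
    self.testset.targets = map_Y_valid
```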
