Looking for help while debugging #2
I'm sorry for giving an incomplete training instruction. Please try:
python train.py --name ASCON(experiment_name) --model ASCON --netG ESAU --dataroot /data/zhchen/Mayo2016_2d(path to images) --nce_layers 1,4 --layer_weight 1,1 --num_patches 32,512 --k_size 3,7 --lr 0.0002 --gpu_ids 6,7 --print_freq 25 --batch_size 8 --lr_policy cosine
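For reference, the comma-separated options in this command (--nce_layers 1,4, --layer_weight 1,1, --num_patches 32,512, --k_size 3,7) each carry one value per selected NCE layer. Below is a generic sketch of how such strings are usually split into integer lists before use; it is only an illustration, not the repository's actual option-parsing code:

import argparse

# Generic sketch: split comma-separated option strings into per-layer lists.
# The option names mirror the command above; the parsing itself is assumed,
# not copied from ASCON.
parser = argparse.ArgumentParser()
parser.add_argument('--nce_layers', type=str, default='1,4')
parser.add_argument('--num_patches', type=str, default='32,512')
parser.add_argument('--k_size', type=str, default='3,7')
opt = parser.parse_args([])

nce_layers = [int(v) for v in opt.nce_layers.split(',')]
num_patches = [int(v) for v in opt.num_patches.split(',')]
k_size = [int(v) for v in opt.k_size.split(',')]

# Each list should end up with one entry per NCE layer.
assert len(num_patches) == len(nce_layers) == len(k_size)
print(nce_layers, num_patches, k_size)  # [1, 4] [32, 512] [3, 7]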
Thanks for your reply. Now I've met another problem: during training, I find that self.fake_B and self.real_B always seem to be identical. I subtracted the two tensors element-wise and the resulting tensor is all zeros, and the RMSE (root mean squared error) is also zero. However, loss_D and loss_G decrease gradually. I wonder what the problem is. The training log is as follows:
(epoch: 1, iters: 525, loss_D: 1.751378, loss_G: 0.174635, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 524/2167 [08:33<24:31, 1.12it/s]
(epoch: 1, iters: 550, loss_D: 1.712759, loss_G: 0.171654, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 549/2167 [08:56<23:45, 1.13it/s]
(epoch: 1, iters: 575, loss_D: 1.753926, loss_G: 0.174744, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 574/2167 [09:20<24:23, 1.09it/s]
(epoch: 1, iters: 600, loss_D: 1.735380, loss_G: 0.173059, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 599/2167 [09:43<22:16, 1.17it/s]
(epoch: 1, iters: 625, loss_D: 1.736408, loss_G: 0.170298, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 624/2167 [10:05<21:03, 1.22it/s]
(epoch: 1, iters: 650, loss_D: 1.700382, loss_G: 0.181855, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 649/2167 [10:30<29:49, 1.18s/it]
(epoch: 1, iters: 675, loss_D: 1.726071, loss_G: 0.174785, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 674/2167 [10:52<21:37, 1.15it/s]
(epoch: 1, iters: 700, loss_D: 1.677032, loss_G: 0.169842, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 699/2167 [11:17<22:23, 1.09it/s]
(epoch: 1, iters: 725, loss_D: 1.752134, loss_G: 0.172651, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 724/2167 [11:41<25:46, 1.07s/it]
(epoch: 1, iters: 750, loss_D: 1.749419, loss_G: 0.169668, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000) | 749/2167 [12:05<22:34, 1.05it/s]
(epoch: 1, iters: 775, loss_D: 1.707352, loss_G: 0.167989, train_psnr: inf, train_ssim: 1.0000, train_rmse: 0.00000000)
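The element-wise check I do looks roughly like the sketch below (compare_tensors is just my own illustration for this issue, not ASCON code); if both the RMSE and the maximum absolute difference come out exactly zero, fake_B really is bit-identical to real_B:

import torch

def compare_tensors(real_B: torch.Tensor, fake_B: torch.Tensor) -> None:
    # Report how far fake_B is from real_B.
    diff = (real_B - fake_B).detach()
    rmse = torch.sqrt(torch.mean(diff ** 2))
    max_abs = diff.abs().max()
    print(f"rmse: {rmse.item():.8f}  max |diff|: {max_abs.item():.8f}")
    # torch.equal is True only if the tensors match element for element,
    # which would suggest the generator output is being overwritten by the
    # target somewhere rather than being "perfectly" denoised.
    print("identical:", torch.equal(real_B, fake_B))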
I do not use the same training dataset as yours. My training data is normalized to 0-1, and I recover the original range before computing SSIM and RMSE.
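The recovery step is roughly the following sketch; DATA_MIN and DATA_MAX are placeholders for the window used in my own preprocessing, not values taken from the repository:

import numpy as np

# Placeholder normalization window from my preprocessing (not ASCON values).
DATA_MIN, DATA_MAX = -1000.0, 2000.0

def denormalize(img01: np.ndarray) -> np.ndarray:
    # Map an image from [0, 1] back to its original intensity range.
    return img01 * (DATA_MAX - DATA_MIN) + DATA_MIN

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Stand-in arrays in [0, 1] just to make the sketch runnable.
pred01 = np.random.rand(64, 64).astype(np.float32)
target01 = np.random.rand(64, 64).astype(np.float32)
print("RMSE in recovered range:", rmse(denormalize(pred01), denormalize(target01)))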
Thanks for your novel work. I am trying to train the model on the Mayo dataset, following the training command you provided. While running, I hit an error in models/networks.py as follows:
Traceback (most recent call last):
File "/dg_hpc/CNG/lijf/ASCON-main/train.py", line 72, in
model.optimize_parameters() # calculate loss functions, get gradients, update network weights
File "/dg_hpc/CNG/lijf/ASCON-main/models/ASCON_model.py", line 128, in optimize_parameters
self.loss_D = self.compute_D_loss()
File "/dg_hpc/CNG/lijf/ASCON-main/models/ASCON_model.py", line 160, in compute_D_loss
self.loss_D = self.MAC_Net(self.real_B, self.fake_B.detach())
File "/dg_hpc/CNG/lijf/ASCON-main/models/ASCON_model.py", line 189, in MAC_Net
feat_k_pool_1, sample_ids, sample_local_ids, sample_top_idxs = self.netProjection_target(patch_size,feat_k_1, self.num_patches,None,None,None,pixweght=None)
File "/dg_workfs/CNG/lijf/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/dg_workfs/CNG/lijf/miniconda3/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/dg_workfs/CNG/lijf/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/dg_hpc/CNG/lijf/ASCON-main/models/networks.py", line 130, in forward
N_patches=num_patches[feat_id]
IndexError: list index out of range
I have checked the value of num_patches. It seems it should be an array (so that num_patches[feat_id] can be indexed), but I find that num_patches is a predefined int with the value 256. I wonder how to eliminate the error. Thanks!
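For context, the crash happens because num_patches is indexed as num_patches[feat_id] but holds a single int (256) instead of a per-layer list. Below is a minimal sketch of the kind of guard that would avoid the IndexError; the normalize_num_patches helper is a hypothetical workaround, not the repository's actual fix:

# Hypothetical guard: make num_patches a per-layer list before indexing it
# as num_patches[feat_id]. The default 256 and the two NCE layers mirror the
# discussion above; this function is not part of ASCON.
def normalize_num_patches(num_patches, nce_layers):
    if isinstance(num_patches, int):
        # A bare int (e.g. the default 256) is repeated for every layer.
        return [num_patches] * len(nce_layers)
    if isinstance(num_patches, str):
        # A comma-separated string such as "32,512" becomes [32, 512].
        return [int(v) for v in num_patches.split(',')]
    return list(num_patches)

nce_layers = [1, 4]
print(normalize_num_patches(256, nce_layers))       # [256, 256]
print(normalize_num_patches("32,512", nce_layers))  # [32, 512]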