Hello,

I was trying to use a pre-trained model for the embedding, but there are two bugs.

The first one is in sample.py, where the parameters are initialized. I think we shouldn't re-initialize the (frozen) embedding:

for param in seq2seq.parameters():
    param.data.uniform_(-0.08, 0.08)

I would recommend changing it to:

for param in [p for p in seq2seq.parameters() if p.requires_grad]:
    param.data.uniform_(-0.08, 0.08)
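For context, here is a minimal sketch of the workflow I mean. It assumes `seq2seq` is the model built in sample.py, that the encoder exposes its embedding as `encoder.embedding`, and that the pre-trained vectors are loaded from a saved tensor file; those names are my assumptions, not the library's documented API:

```python
import torch

# Assumed: pre-trained vectors saved as a (vocab_size, embed_dim) tensor.
pretrained_vectors = torch.load('embeddings.pt')

# Copy the vectors into the encoder embedding and freeze it
# (attribute name `encoder.embedding` is an assumption here).
seq2seq.encoder.embedding.weight.data.copy_(pretrained_vectors)
seq2seq.encoder.embedding.weight.requires_grad = False

# Randomly initialize only the parameters that are still trainable,
# leaving the frozen pre-trained embedding untouched.
for param in [p for p in seq2seq.parameters() if p.requires_grad]:
    param.data.uniform_(-0.08, 0.08)
```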
The second bug is with the optimizer in supervised_trainer.py: the optimizer will throw an error when it tries to optimize the frozen embedding.

optimizer = Optimizer(optim.Adam(model.parameters()), max_grad_norm=5)

I would recommend changing it to:

optimizer = Optimizer(optim.Adam(filter(lambda p: p.requires_grad, model.parameters())), max_grad_norm=5)
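Put together, a self-contained version of that fix would look roughly like this (the `Optimizer` import path is assumed to match the one used in the library's sample script):

```python
import torch.optim as optim
from seq2seq.optim import Optimizer

# Adam raises an error ("optimizing a parameter that doesn't require
# gradients") if a frozen embedding is passed in, so keep only the
# trainable parameters.
trainable_params = filter(lambda p: p.requires_grad, model.parameters())
optimizer = Optimizer(optim.Adam(trainable_params), max_grad_norm=5)
```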
I have a question: do you think it is important to have the pre-trained embedding in the decoder as well, or will the encoder alone be enough?
The same error occurs with the resume option. It should be:

self.optimizer.optimizer = resume_optim.__class__(filter(lambda p: p.requires_grad, model.parameters()), **defaults)
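For clarity, here is a sketch of how that line would sit inside the resume branch of supervised_trainer.py; the surrounding lines (`resume_optim`, `defaults`) are my reconstruction of that context and may not match the file exactly:

```python
# When resuming from a checkpoint, the optimizer is rebuilt from the
# restored one's class and defaults. Filter to trainable parameters so
# a frozen embedding does not trigger the same error as above.
resume_optim = self.optimizer.optimizer
defaults = resume_optim.param_groups[0]
defaults.pop('params', None)
self.optimizer.optimizer = resume_optim.__class__(
    filter(lambda p: p.requires_grad, model.parameters()),
    **defaults,
)
```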