Hi,
I see that this implementation is lacking masked attention over the encoder outputs. The input lengths should be passed to the decoder (not just the encoder) in order to compute this. OpenNMT already provides this in its sequence_mask function.
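For reference, such a mask can be built directly from the input lengths. A minimal sketch (my own illustration, not OpenNMT's exact code):

```python
import torch

def sequence_mask(lengths, max_len=None):
    """Boolean mask that is True at valid token positions and False at padding.

    lengths: (batch,) tensor with the number of real tokens per sequence.
    Returns a (batch, max_len) mask.
    """
    if max_len is None:
        max_len = int(lengths.max())
    positions = torch.arange(max_len, device=lengths.device)
    return positions.unsqueeze(0) < lengths.unsqueeze(1)
```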
Best,
I just noticed the same thing and landed here. The attention mechanism should only include those encoder outputs in the weighted sum that correspond to valid tokens in the input sequences. For example, if the input lengths in the batch are 23, 12, and 7, then for the third element in the batch the attention should compute the weighted sum over its 7 encoder outputs rather than all 23.
Normally your attention would learn to ignore the extra encoder outputs anyway, but this might pose a problem if you train and test with different maximum sentence sizes.
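To make this concrete, here is a hedged sketch of how the mask could be applied to the attention scores before the softmax, so that padded positions receive exactly zero weight (the function name masked_attention_weights is just for illustration):

```python
import torch
import torch.nn.functional as F

def masked_attention_weights(scores, lengths):
    """scores: (batch, max_src_len) raw attention energies for one decoder step.
    lengths: (batch,) number of valid source tokens per example.
    """
    max_len = scores.size(1)
    mask = torch.arange(max_len, device=scores.device).unsqueeze(0) < lengths.unsqueeze(1)
    # Set padded positions to -inf so softmax assigns them zero probability.
    scores = scores.masked_fill(~mask, float('-inf'))
    return F.softmax(scores, dim=1)

# With the lengths 23, 12, 7 from the example above, the third row of the
# returned weights is zero everywhere past position 7.
scores = torch.randn(3, 23)
lengths = torch.tensor([23, 12, 7])
weights = masked_attention_weights(scores, lengths)
print(weights[2, 7:].sum())  # tensor(0.)
```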
@valtsblukis thanks for explaining it. Yes, that was my understanding too, but I'd also assume the model would learn it anyway. I am also running an experiment with my model with and without masking to see the difference.