Scheduled Sampling #152
Comments
You might be right, but the teacher forcing here can really improve performance, by 1 to 2 points.
@AtmaHou Did you mean the kind of teacher forcing that is implemented here? I tried that, and it actually doesn't improve performance (in agreement with the scheduled sampling paper).
Yep. You could try tuning the teacher forcing rate (default 0); 0.5 is worth trying. I found that neither 0 nor 1 helps.
@AtmaHou My experience with this kind of teacher forcing on non-trivial tasks has not been good so far. It sometimes worsens my results. The scheduled sampling method works better, though. Since this repo has so many stars, and at one point I was using it as a reference implementation, I thought I should point it out.
@umgupta Ha, your post has also deepened my understanding of teacher forcing.
@AtmaHou Sure, do so and let me know :). Also, I am fairly new to sequence learning. Do you happen to know a toy problem for sanity-checking an algorithm (like MNIST for images)? The one in this repo, learning to reverse a sequence, is too trivial: any kind of teacher forcing works OK, and even code with a mistake in it gave good results.
@umgupta The machine translation problem in the PyTorch tutorial is quite simple and might suit you.
The scheduled sampling paper mentions that training performs worse if you toss a single coin per sequence to decide whether the whole sequence is fed the model's own predictions. Instead, one should decide at each time step whether to feed the correct token or the predicted one (see the footnote on p. 3 of the paper). Yet in the decoder here, teacher forcing is either enabled for the whole sequence or not at all, and I don't think that would work.
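To make the distinction concrete, here is a minimal, framework-free sketch of the two ways of flipping the coin. It is not this repo's decoder: `per_sequence_mask`, `per_token_mask`, `decode`, and the `predict_next` callback are hypothetical names made up for illustration, and the "model" is just a function from the previous token to the next one.

```python
import random


def per_sequence_mask(seq_len, teacher_forcing_ratio, rng):
    """One coin toss for the whole sequence (the behaviour questioned above)."""
    use_teacher = rng.random() < teacher_forcing_ratio
    return [use_teacher] * seq_len


def per_token_mask(seq_len, teacher_forcing_ratio, rng):
    """One independent coin toss per time step (per-step sampling)."""
    return [rng.random() < teacher_forcing_ratio for _ in range(seq_len)]


def decode(targets, predict_next, mask, sos):
    """Run a toy decoder loop.

    At step t the decoder predicts from the previous input token. The next
    input is the gold token if mask[t] is True, else the model's own output.
    `predict_next` is a hypothetical stand-in for a real decoder step.
    """
    prev = sos
    outputs = []
    for t, gold in enumerate(targets):
        pred = predict_next(prev)
        outputs.append(pred)
        prev = gold if mask[t] else pred
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    targets = [10, 20, 30, 40, 50]
    toy_model = lambda prev: prev + 1  # assumption: a trivial stand-in model

    seq_mask = per_sequence_mask(len(targets), 0.5, rng)
    tok_mask = per_token_mask(len(targets), 0.5, rng)
    print("per-sequence mask:", seq_mask)
    print("per-token mask:   ", tok_mask)
    print("outputs:", decode(targets, toy_model, tok_mask, sos=0))
```

With the per-sequence mask, every step in a given sequence sees either only gold tokens or only model outputs. With the per-token mask, a single sequence mixes both, which is the per-step behaviour the issue asks for. In scheduled sampling the probability of feeding the gold token would also be decayed over the course of training; that schedule is left out of this sketch.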