Unable to reproduce the PendulumSwingup results #3
Thanks for your interest in our work! The exact numbers vary depending on the seeds tested, but CCIL should almost always outperform BC on the pendulum task. Please ensure you're running with the hyperparameters specified in the corresponding config. Running the following command on my machine: `./scripts/train_ccil.sh "pendulum_cont pendulum_disc" "40 41 42 43 44 45 46 47 48 49" 0.0001` yields the following results:
As you can see, the exact numbers change due to the associated variance, but CCIL still outperforms standard BC.
Thanks for your prompt reply! I see, but from what I observe, especially on the discontinuous Pendulum, the performance of CCIL (-2912.906) and naive BC (-2978.408) is actually hard to distinguish on my machine, even with 10 random seeds ("40 41 42 43 44 45 46 47 48 49"). Any good suggestions?
Thanks for bringing this to our attention - it seems there is more variance on PendulumDiscontinuous than we initially realized. (We validated our config on 10 random seeds and 2 computing machines.) We might be able to tweak and update the params if we can first reproduce experiments that don't show the performance gap, and then sweep parameters from there. In the meantime, could you verify the performance on the other task suites? We just want to double-check whether this is stochasticity specific to PendulumDiscontinuous or something more.
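Given the seed-to-seed variance being discussed, one way to judge whether the CCIL-vs-BC gap is meaningful is to compare the per-seed returns with a Welch's t-test rather than means alone. A minimal pure-Python sketch is below; the per-seed return arrays are placeholders for illustration, not results from the paper or this repo:

```python
import math

def welch_t(a, b):
    """Welch's t-statistic and degrees of freedom for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    # Sample variances (Bessel-corrected).
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se2a, se2b = va / len(a), vb / len(b)
    t = (ma - mb) / math.sqrt(se2a + se2b)
    # Welch-Satterthwaite approximation for degrees of freedom.
    df = (se2a + se2b) ** 2 / (
        se2a ** 2 / (len(a) - 1) + se2b ** 2 / (len(b) - 1)
    )
    return t, df

# Hypothetical per-seed episode returns for seeds 40-49 (placeholders only).
ccil_returns = [-2850, -2900, -2950, -2880, -2990, -2920, -2870, -2960, -2930, -2890]
bc_returns = [-2950, -3000, -2980, -2940, -3050, -2970, -2920, -3010, -2990, -2960]

t, df = welch_t(ccil_returns, bc_returns)
print(f"t = {t:.2f}, df = {df:.1f}")
```

If |t| is well above ~2 at these degrees of freedom, the gap is unlikely to be seed noise; if not, more seeds (or a parameter sweep, as suggested above) would be needed to separate the two methods.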
Hi team,
Thanks for sharing the great work! I have tried reproducing the PendulumSwingup experiments, both continuous and discontinuous, using the scripts and code you provided without any modification. However, the results do not match the performance shown in Figure 3(c) of the paper CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning. Do any hyperparameters need to be tuned, or is there anything else I need to change to reproduce those results?
Thanks a lot!