Unable to reproduce the PendulumSwingup results #3
Thanks for your interest in our work! The exact numbers vary depending on the seeds tested, but CCIL should almost always outperform BC on the pendulum task. Please ensure you're running with the hyperparameters specified in the corresponding config. Running the following command on my machine: `./scripts/train_ccil.sh "pendulum_cont pendulum_disc" "40 41 42 43 44 45 46 47 48 49" 0.0001` yields the following results:
As you can see, the exact numbers change due to the associated variance, but CCIL still outperforms standard BC.
Thanks for your prompt reply! I see, but from what I observe, especially on the discontinuous Pendulum, the performance of CCIL (-2912.906) and naive BC (-2978.408) is actually hard to distinguish on my machine, even with 10 random seeds ("40 41 42 43 44 45 46 47 48 49"). Any good suggestions?
Thanks for bringing this to our attention - it seems there is more variance on PendulumDiscontinuous than we initially realized. (We validated our config on 10 random seeds and 2 computing machines.) We might be able to tweak and update the params if we can first reproduce experiments that don't show the performance gap, and then sweep parameters from there. In the meantime, could you verify the performance on the other task suites? We just want to double-check whether this is stochasticity specific to PendulumDiscontinuous or something more.
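Given the seed-to-seed variance being discussed, one way to judge whether the CCIL-vs-BC gap is meaningful is to compare the per-seed returns with a Welch's t-test rather than means alone. A minimal pure-Python sketch is below; the per-seed return arrays are placeholders for illustration, not results from the paper or this repo:

```python
import math

def welch_t(a, b):
    """Welch's t-statistic and degrees of freedom for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    # Sample variances (Bessel-corrected).
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se2a, se2b = va / len(a), vb / len(b)
    t = (ma - mb) / math.sqrt(se2a + se2b)
    # Welch-Satterthwaite approximation for degrees of freedom.
    df = (se2a + se2b) ** 2 / (
        se2a ** 2 / (len(a) - 1) + se2b ** 2 / (len(b) - 1)
    )
    return t, df

# Hypothetical per-seed episode returns for seeds 40-49 (placeholders only).
ccil_returns = [-2850, -2900, -2950, -2880, -2990, -2920, -2870, -2960, -2930, -2890]
bc_returns = [-2950, -3000, -2980, -2940, -3050, -2970, -2920, -3010, -2990, -2960]

t, df = welch_t(ccil_returns, bc_returns)
print(f"t = {t:.2f}, df = {df:.1f}")
```

If |t| is well above ~2 at these degrees of freedom, the gap is unlikely to be seed noise; if not, more seeds (or a parameter sweep, as suggested above) would be needed to separate the two methods.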
Hi team,
Thanks for sharing the great work! I have tried reproducing the PendulumSwingup experiments, both continuous and discontinuous, using the scripts and code you provided without any modification. However, the results do not match the performance shown in Figure 3(c) of the paper CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning. Do any hyperparameters need to be tuned, or is there anything else I need to change to reproduce those results?
Thanks a lot!