Hi, thanks for your great work.
I would like to run vanilla DPO (offline DPO) as a baseline to compare its performance with online DPO. Can I use this codebase to run that experiment, and if so, what command should I use? Thank you very much in advance.