Hey,
Great observations and work on disentangling format following from reasoning! Could you share details on the evaluation dataset you used and how to reproduce the results in the paper? I fine-tuned Llama 3 on the dataset and got worse performance on 30 questions curated from the HotpotQA dataset. If you could shed some light on this, it would be much appreciated! Thanks,
Jason