GRPO LoRA Single Device #2467
base: main
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2467

✅ No failures as of commit ecec2aa with merge base 1241231. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Updated this - I validated the recipe for single device, so I figured it was cleaner to make this a single-device diff. I reverted the checkpointing changes, since the LoRA adapter from SFT gets merged in and it's cleaner to just start it again. The recipe reaches a decent success rate during training - testing after 4 epochs, running the gsm8k task from lm-eval gives:
This is (clearly) eval on the training data, but for reference the base Llama 3B gets 0.2646 on flexible match (after SFT, 0.29; one epoch of GRPO gets to 0.48). I was a little dubious whether LoRA would work here, and it appears it does!
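For context on what the recipe is optimizing, the group-relative advantage at the heart of GRPO can be sketched as below. This is an illustrative sketch, not the recipe's actual code: rewards for several sampled completions of the same prompt are standardized within the group, so no learned value function is needed.

```python
# Sketch of GRPO's group-relative advantage (illustrative only).
# For each prompt, a group of completions is sampled and scored by a
# verifier; each completion's advantage is its reward standardized
# against the other completions in the same group.

def group_advantages(rewards, eps=1e-4):
    """Standardize rewards within one group of sampled completions."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps guards against a zero std when all rewards in the group tie
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 completions for one prompt, binary correctness rewards.
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Note that the group size here is exactly the "number of generations" knob mentioned below: a smaller group cuts generation cost and memory, at the price of a noisier advantage estimate.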
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main    #2467      +/-   ##
==========================================
- Coverage   66.98%   66.42%   -0.57%
==========================================
  Files         378      373       -5
  Lines       22527    21986     -541
==========================================
- Hits        15090    14604     -486
+ Misses       7437     7382      -55
```

View full report in Codecov by Sentry.
Context
What is the purpose of this PR?
#2421 - exploring a LoRA recipe.
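Since the PR explores a LoRA recipe, here is a minimal sketch of the low-rank update itself (illustrative only, not torchtune's implementation; `lora_forward` and `merge` are hypothetical names): the frozen weight `W` is augmented with a scaled product `A @ B` of two small trainable matrices, and the adapter can later be merged back into `W`, which is why the checkpointing changes mentioned above could be reverted.

```python
# Minimal pure-Python LoRA sketch (illustrative, not torchtune's code).
# W: (d_in, d_out) frozen weight; A: (d_in, r), B: (r, d_out) trainable,
# with r << min(d_in, d_out); the update is scaled by alpha / r.

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add_mats(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    return [[s * a for a in row] for row in X]

def lora_forward(x, W, A, B, alpha=16, r=2):
    # frozen base path plus the scaled low-rank update
    return add_mats(matmul(x, W), scale(matmul(matmul(x, A), B), alpha / r))

def merge(W, A, B, alpha=16, r=2):
    # fold the adapter into the base weight, as done before restarting GRPO
    return add_mats(W, scale(matmul(A, B), alpha / r))
```

A quick sanity check: running `x` through `lora_forward(x, W, A, B)` gives the same result as `matmul(x, merge(W, A, B))`, which is exactly why the merged checkpoint needs no adapter state.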
Changelog
What are the changes made in this PR?
Following the pattern in the original GRPO PR, I tried SFT then GRPO. The model reaches around 45% on the train set after 2 epochs.
By reducing the number of generations, I got the recipe running on a single 24GB card, which is a good baseline for ease of access!
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- `pre-commit install`
- `pytest tests`
- `pytest tests -m integration_test`
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.