GRPO LoRA Single Device #2467
base: main
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2467

✅ No failures as of commit ecec2aa with merge base 1241231. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Updated this - I validated the recipe for single device, so I figured it was cleaner to make this a single-device diff. I reverted the checkpointing changes, since the LoRA adapter from SFT gets merged in and it's cleaner to just start it again. The recipe reaches a decent success rate during training - testing after 4 epochs, running the gsm8k task from lm-eval gives:
This is (clearly) eval on the training data, but for reference the base Llama 3B gets 0.2646 on flexible match (after SFT, 0.29; one epoch of GRPO gets to 0.48). I was a little dubious whether LoRA would work here, and it appears it does!
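For context on what the recipe is optimizing, the group-relative advantage at the heart of GRPO can be sketched as below. This is an illustrative sketch, not the recipe's actual code: rewards for several sampled completions of the same prompt are standardized within the group, so no learned value function is needed.

```python
# Sketch of GRPO's group-relative advantage (illustrative only).
# For each prompt, a group of completions is sampled and scored by a
# verifier; each completion's advantage is its reward standardized
# against the other completions in the same group.

def group_advantages(rewards, eps=1e-4):
    """Standardize rewards within one group of sampled completions."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps guards against a zero std when all rewards in the group tie
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 completions for one prompt, binary correctness rewards.
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Note that the group size here is exactly the "number of generations" knob mentioned below: a smaller group cuts generation cost and memory, at the price of a noisier advantage estimate.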
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main    #2467      +/-   ##
==========================================
- Coverage   66.98%   66.42%   -0.57%
==========================================
  Files         378      373       -5
  Lines       22527    21986     -541
==========================================
- Hits        15090    14604     -486
+ Misses       7437     7382      -55
```

View full report in Codecov by Sentry.
Context
What is the purpose of this PR?
#2421 - exploring a LoRA recipe.
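Since the PR explores a LoRA recipe, here is a minimal sketch of the low-rank update itself (illustrative only, not torchtune's implementation; `lora_forward` and `merge` are hypothetical names): the frozen weight `W` is augmented with a scaled product `A @ B` of two small trainable matrices, and the adapter can later be merged back into `W`, which is why the checkpointing changes mentioned above could be reverted.

```python
# Minimal pure-Python LoRA sketch (illustrative, not torchtune's code).
# W: (d_in, d_out) frozen weight; A: (d_in, r), B: (r, d_out) trainable,
# with r << min(d_in, d_out); the update is scaled by alpha / r.

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add_mats(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    return [[s * a for a in row] for row in X]

def lora_forward(x, W, A, B, alpha=16, r=2):
    # frozen base path plus the scaled low-rank update
    return add_mats(matmul(x, W), scale(matmul(matmul(x, A), B), alpha / r))

def merge(W, A, B, alpha=16, r=2):
    # fold the adapter into the base weight, as done before restarting GRPO
    return add_mats(W, scale(matmul(A, B), alpha / r))
```

A quick sanity check: running `x` through `lora_forward(x, W, A, B)` gives the same result as `matmul(x, merge(W, A, B))`, which is exactly why the merged checkpoint needs no adapter state.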
Changelog
What are the changes made in this PR?
Following the pattern in the original GRPO PR, I tried SFT then GRPO. The model reaches around 45% on the train set after 2 epochs.
By reducing the number of generations, I got the recipe running on a single 24GB card, which is a good baseline for ease of access!
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- `pre-commit install`
- `pytest tests`
- `pytest tests -m integration_test`
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.