huggingface / trl Public

generated from fastai/nbdev_template

Pull requests: huggingface/trl

86 Open 1,783 Closed

Pull requests list

Fix reward_processing_classes validation in GRPOTrainer (#2839)

#3876 opened Aug 9, 2025 by chi2liu

4 tasks done

Optimize truncate_with_protected_tokens to use vectorized operations

#3875 opened Aug 9, 2025 by chi2liu

2 of 5 tasks

Optimize completion_ids list conversion in GRPO trainer

#3874 opened Aug 9, 2025 by chi2liu

2 of 5 tasks

Improve quickstart documentation with updated API examples

#3873 opened Aug 9, 2025 by behroozazarkhalili

vLLM rollout numerical differences causing off-policy RL.

#3867 opened Aug 7, 2025 by LeonEricsson • Draft

5 tasks

Implement DPOP

#3864 opened Aug 7, 2025 by 1485840691 • Draft

Replaced unittest.TestCase with TrlTestCase that handles tmp dir

#3863 opened Aug 7, 2025 by qgallouedec

🔮 Native VLM support for SFTTrainer

#3862 opened Aug 7, 2025 by qgallouedec

[GRPO]: Fix BOS duplication bug when using VLLM

#3856 opened Aug 6, 2025 by pramodith • Draft

3 of 5 tasks

[Callbacks] BEMA

#3855 opened Aug 6, 2025 by kashif

[#3647] Fix: Assign default values in the GKDTrainer's constructor only when …

#3851 opened Aug 5, 2025 by seungduk-yanolja

2 of 5 tasks

Update profiling.py: fix scoping problems for wandb and mlflow

#3845 opened Aug 4, 2025 by markshinyounglee

5 tasks done

dynamic temperature

#3844 opened Aug 4, 2025 by shirinyamani • Draft

5 tasks

Add py.typed

#3841 opened Aug 4, 2025 by cyyever

Optimize RLOO Trainer memory usage with string-level processing

#3837 opened Aug 2, 2025 by luckyvickyricky

2 of 5 tasks

[GSPO]: Refactor _compute_loss

#3835 opened Aug 1, 2025 by pramodith

2 of 5 tasks

[CPO] Add AlphaPO method via CPOTrainer

#3824 opened Jul 31, 2025 by kashif

Fix SFTTrainer token accuracy computation with PromptEncoder

#3821 opened Jul 31, 2025 by zk-quantum

5 tasks done

support GSPO-token

#3820 opened Jul 31, 2025 by hjh0119

GSPO docs - Sequence importance ratio and differences in relation to GRPO

#3816 opened Jul 31, 2025 by almeidava93

2 of 5 tasks

Adding support for different losses which are now supported by Liger

#3815 opened Jul 31, 2025 by Manan17

1 of 5 tasks

💇 Add soft overlong punishment reward function and update documentation

#3804 opened Jul 30, 2025 by qgallouedec

Rloo final

#3801 opened Jul 29, 2025 by shirinyamani

5 tasks

Add dataset mixer

#3791 opened Jul 28, 2025 by lewtun

1 of 5 tasks

[GRPO] update transformer version for CB

#3786 opened Jul 28, 2025 by kashif
