==============
torchtune.rlhf
==============

.. currentmodule:: torchtune.rlhf

Components and losses for RLHF algorithms like PPO and DPO.

.. autosummary::
   :toctree: generated/
   :nosignatures:

    estimate_advantages
    get_rewards_ppo
    truncate_sequence_at_first_stop_token
    loss.PPOLoss
    loss.DPOLoss
    loss.RSOLoss