feat: support top-k training #2600

@mikasenghaas

Description

Support top-k training: instead of training on all rollouts in a group, train only on the k rollouts with the highest reward in that group.

This is a common technique in GRPO-style training: it focuses the gradient signal on the highest-quality completions within each group, which can improve sample efficiency and training stability.
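To make the request concrete, here is a minimal sketch of the selection step, assuming each rollout carries a group id and a scalar reward. The function name `top_k_indices_per_group` and its parameters are hypothetical and do not refer to anything in this repository. Whether advantages should be computed before or after filtering is left open here, since the issue does not specify it.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def top_k_indices_per_group(
    group_ids: Sequence[Hashable],
    rewards: Sequence[float],
    k: int,
) -> list[int]:
    """Return indices of the rollouts to keep: the k highest-reward rollouts in each group.

    Hypothetical helper for illustration only. Groups with fewer than k
    rollouts are kept whole. Ties are broken by original position.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(group_ids) != len(rewards):
        raise ValueError("group_ids and rewards must have the same length")

    # Collect the rollout indices belonging to each group.
    by_group: dict[Hashable, list[int]] = defaultdict(list)
    for idx, gid in enumerate(group_ids):
        by_group[gid].append(idx)

    keep: list[int] = []
    for indices in by_group.values():
        # sorted() is stable, so equal rewards keep their original order.
        ranked = sorted(indices, key=lambda i: rewards[i], reverse=True)
        keep.extend(ranked[:k])

    # Return indices in their original batch order.
    return sorted(keep)


if __name__ == "__main__":
    group_ids = [0, 0, 0, 0, 1, 1, 1, 1]
    rewards = [0.1, 0.9, 0.5, 0.3, 1.0, 0.0, 0.2, 0.8]
    print(top_k_indices_per_group(group_ids, rewards, k=2))  # [1, 2, 4, 7]
```

The returned indices could then be used to subset the batch before the loss is computed, so only the selected rollouts contribute gradient.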
