
Design: make generation targets explicit for single- and multi-agent rollout #2538

@joanvelja


Problem

Current Prime-RL recipes are single-agent by convention: one rollout client/model/sampling triple is threaded through the orchestrator. Multi-agent support therefore has to treat that triple as the implicit learner target and add per-member fixed targets beside it. That works, but the design is asymmetric: single-agent recipes are not expressed as the same generation-plan object that multi-agent recipes use.

This is acceptable for a first multi-agent cut, but it is a code smell to fix before upstreaming: it keeps the default path invisible and forces bridge code to know which fields are legacy fallbacks.

Proposed transition

  1. Introduce a runtime-internal GenerationPlan as the canonical compiled object for all rollouts.
  2. Compile existing single-agent configs into a one-target plan at the ingestion boundary. Do not require old TOMLs to change in the first pass.
  3. Compile multi-agent configs into the same plan shape, keyed by Verifiers member_id.
  4. Keep user-facing config minimal: existing single-agent TOMLs stay as-is; multi-agent TOMLs only add role policy (train_one) and named fixed targets.
  5. Once this is stable, move code that reads legacy client/model/sampling fields behind the compiler and make scheduler/env code consume only the compiled plan.
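The transition above can be sketched as two compilers that feed one plan type. This is a minimal illustration under stated assumptions: GenerationTarget, GenerationPlan, compile_single_agent, compile_multi_agent, DEFAULT_MEMBER_ID, and the config keys (client, model, sampling, train_one, fixed_targets) are hypothetical names, not Prime-RL's actual API, and plain dicts stand in for parsed TOML.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical key used when a single-agent recipe carries no Verifiers member_id.
DEFAULT_MEMBER_ID = "default"


@dataclass(frozen=True)
class GenerationTarget:
    """Where and how one member's rollouts are generated (hypothetical shape)."""
    client: Mapping[str, Any]    # existing client config, e.g. a base URL
    model: str                   # model name served behind that client
    sampling: Mapping[str, Any]  # sampling parameters, e.g. temperature
    trainable: bool              # True for the learner, False for fixed targets


@dataclass(frozen=True)
class GenerationPlan:
    """Canonical compiled object: one target per Verifiers member_id."""
    targets: Mapping[str, GenerationTarget]

    def learner_ids(self) -> list:
        """Return the member_ids whose rollouts are trained on."""
        return [mid for mid, t in self.targets.items() if t.trainable]


def compile_single_agent(cfg: Mapping[str, Any]) -> GenerationPlan:
    """Lift legacy client/model/sampling fields into a one-target plan."""
    target = GenerationTarget(
        client=cfg["client"],
        model=cfg["model"],
        sampling=cfg.get("sampling", {}),
        trainable=True,
    )
    return GenerationPlan(targets={DEFAULT_MEMBER_ID: target})


def compile_multi_agent(cfg: Mapping[str, Any]) -> GenerationPlan:
    """Compile a train_one role policy plus named fixed targets into the same plan."""
    learner = GenerationTarget(
        client=cfg["client"],
        model=cfg["model"],
        sampling=cfg.get("sampling", {}),
        trainable=True,
    )
    targets = {cfg["train_one"]: learner}
    for member_id, fixed in cfg.get("fixed_targets", {}).items():
        if member_id in targets:
            raise ValueError(f"member_id {member_id!r} is both trained and fixed")
        targets[member_id] = GenerationTarget(
            # Fixed targets reuse the learner's client unless they name their own.
            client=fixed.get("client", cfg["client"]),
            model=fixed["model"],
            sampling=fixed.get("sampling", {}),
            trainable=False,
        )
    return GenerationPlan(targets=targets)
```

Because both compilers return GenerationPlan, downstream scheduler code can iterate plan.targets without checking which legacy fields were set, and an old single-agent config needs no new keys to compile.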

Non-goals

  • Do not add endpoint routing to Verifiers. Verifiers should carry member_id; Prime-RL owns deployment topology.
  • Do not require all existing recipes to grow multi-agent-style config blocks.
  • Do not introduce a separate transport abstraction when the existing client config plus model/sampling target is sufficient.
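The boundary in the first non-goal can be illustrated as a lookup: Verifiers supplies only a member_id, and Prime-RL resolves it against its compiled targets. This is a hypothetical sketch: GenerationTarget, GenerationRequest, route, and the prompt field are illustrative names rather than existing Prime-RL or Verifiers APIs, and the plan is represented here as a plain mapping from member_id to target. The request reuses the existing client config plus model/sampling instead of introducing a new transport type.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class GenerationTarget:
    """Compiled target owned by Prime-RL (hypothetical shape)."""
    client: Mapping[str, Any]
    model: str
    sampling: Mapping[str, Any]
    trainable: bool


@dataclass(frozen=True)
class GenerationRequest:
    """What the scheduler hands to the existing client: no endpoint info from Verifiers."""
    client: Mapping[str, Any]
    model: str
    sampling: Mapping[str, Any]
    prompt: str


def route(plan: Mapping[str, GenerationTarget], member_id: str, prompt: str) -> GenerationRequest:
    """Resolve a Verifiers member_id to a deployment target; topology stays in Prime-RL."""
    if member_id not in plan:
        raise KeyError(f"no generation target compiled for member_id {member_id!r}")
    target = plan[member_id]
    return GenerationRequest(
        client=target.client,
        model=target.model,
        sampling=target.sampling,
        prompt=prompt,
    )
```

Keeping the lookup on the Prime-RL side means moving a fixed judge to a different endpoint changes only the compiled plan, not the environment code that emits member_id.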

Acceptance criteria

  • Existing recipes run unchanged.
  • Multi-agent train-one/fixed-opponent/fixed-judge recipes compile to the same internal plan type.
  • Scheduler and env-server tests cover the compiled plan path end to end.
  • Runtime modules depend on cohesive config modules, not the monolithic orchestrator schema.
