Design: make generation targets explicit for single- and multi-agent rollout #2538

## Problem

Current Prime-RL recipes are single-agent by convention: one rollout client/model/sampling triple is threaded through the orchestrator, and multi-agent support has to treat that triple as the implicit learner target while adding per-member fixed targets beside it. That works, but the design is asymmetric: single-agent recipes are not represented as the same generation-plan object that multi-agent recipes use.

This is acceptable for a first multi-agent cut, but it is a design smell to fix before upstreaming: it hides the default path and forces bridge code to know which fields are legacy fallbacks.

## Proposed transition

1. Introduce a runtime-internal `GenerationPlan` as the canonical compiled object for all rollouts.
2. Compile existing single-agent configs into a one-target plan at the ingestion boundary. Do not require old TOMLs to change in the first pass.
3. Compile multi-agent configs into the same plan shape, keyed by Verifiers `member_id`.
4. Keep user-facing config minimal: existing single-agent TOMLs stay as-is; multi-agent TOMLs only add role policy (`train_one`) and named fixed targets.
5. Once this is stable, move code that reads legacy client/model/sampling fields behind the compiler and make scheduler/env code consume only the compiled plan.
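Steps 1 to 3 can be sketched as a small compiler, assuming parsed TOML arrives as plain dicts. Every name here (`GenerationTarget`, `GenerationPlan`, `compile_plan`, the `"default"` member id) and every config key (`client`, `model`, `sampling`, `train_one`, `fixed_targets`) is a hypothetical illustration, not Prime-RL's actual schema. The sketch also assumes that named fixed targets are keyed directly by Verifiers `member_id`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical names throughout; not Prime-RL's real config schema.
DEFAULT_MEMBER_ID = "default"  # hypothetical id for the lone single-agent member


@dataclass(frozen=True)
class GenerationTarget:
    """Existing client config plus model/sampling; no new transport layer."""
    client: Mapping[str, Any]
    model: str
    sampling: Mapping[str, Any]
    trainable: bool  # True for the learner, False for fixed targets


@dataclass(frozen=True)
class GenerationPlan:
    """Canonical compiled object for every rollout, keyed by Verifiers member_id."""
    targets: Mapping[str, GenerationTarget]

    def learner_ids(self) -> list[str]:
        return [member_id for member_id, t in self.targets.items() if t.trainable]


def _learner(cfg: Mapping[str, Any]) -> GenerationTarget:
    # The legacy top-level triple is read exactly as old TOMLs define it.
    return GenerationTarget(cfg["client"], cfg["model"], cfg["sampling"], trainable=True)


def compile_single_agent(cfg: Mapping[str, Any]) -> GenerationPlan:
    """Old single-agent TOML -> one-target plan."""
    return GenerationPlan(targets={DEFAULT_MEMBER_ID: _learner(cfg)})


def compile_multi_agent(cfg: Mapping[str, Any]) -> GenerationPlan:
    """Multi-agent TOML: the legacy triple drives the member named by train_one;
    every other member gets a named fixed target (assumed keyed by member_id)."""
    learner_id = cfg["train_one"]
    targets = {learner_id: _learner(cfg)}
    for member_id, fixed in cfg.get("fixed_targets", {}).items():
        if member_id == learner_id:
            raise ValueError(f"member {member_id!r} is both learner and fixed target")
        targets[member_id] = GenerationTarget(
            fixed["client"], fixed["model"], fixed.get("sampling", {}), trainable=False
        )
    return GenerationPlan(targets=targets)


def compile_plan(cfg: Mapping[str, Any]) -> GenerationPlan:
    """Ingestion boundary: both config shapes compile to the same plan type."""
    if "train_one" in cfg:
        return compile_multi_agent(cfg)
    return compile_single_agent(cfg)
```

Because both paths return the same `GenerationPlan` type, scheduler and env-server code can iterate `plan.targets` without checking which config shape produced it, which is the property step 5 depends on.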

## Non-goals

- Do not add endpoint routing to Verifiers. Verifiers should carry `member_id`; Prime-RL owns deployment topology.
- Do not require all existing recipes to grow multi-agent-style config blocks.
- Do not introduce a separate transport abstraction when the existing client config plus model/sampling target is sufficient.
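The Verifiers/Prime-RL split above can be sketched as a lookup that lives entirely on the Prime-RL side: Verifiers hands over only a `member_id`, and Prime-RL resolves it to the existing client config plus model/sampling target. A plain mapping stands in for the compiled plan. `Target`, `resolve_target`, `build_request`, and the assumption that the request body takes OpenAI-style chat-completions fields are all hypothetical; no real client library is called.

```python
from typing import Any, Dict, List, Mapping, NamedTuple, Tuple


class Target(NamedTuple):
    """Hypothetical: the existing client config plus model/sampling, nothing more."""
    client: Mapping[str, Any]  # e.g. base_url; deployment topology stays in Prime-RL
    model: str
    sampling: Mapping[str, Any]


def resolve_target(plan: Mapping[str, Target], member_id: str) -> Target:
    """Map a Verifiers member_id to a compiled target; fail loudly on unknown ids."""
    try:
        return plan[member_id]
    except KeyError:
        raise KeyError(
            f"no generation target compiled for member_id {member_id!r}; "
            f"known ids: {sorted(plan)}"
        ) from None


def build_request(
    plan: Mapping[str, Target], member_id: str, messages: List[Dict[str, str]]
) -> Tuple[Mapping[str, Any], Dict[str, Any]]:
    """Return (client config, request body). Sampling params are spread first so
    they cannot override the compiled model or the messages."""
    target = resolve_target(plan, member_id)
    body = {**target.sampling, "model": target.model, "messages": messages}
    return target.client, body
```

Keeping the lookup here means that adding or moving a deployment touches only Prime-RL's compiled plan, while Verifiers environments keep emitting the same `member_id` values.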

## Acceptance criteria

- Existing recipes run unchanged.
- Multi-agent train-one/fixed-opponent/fixed-judge recipes compile to the same internal plan type.
- Scheduler and env-server tests cover the compiled plan path end to end.
- Runtime modules depend on cohesive config modules, not the monolithic orchestrator schema.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design: make generation targets explicit for single- and multi-agent rollout #2538

Problem

Proposed transition

Non-goals

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Design: make generation targets explicit for single- and multi-agent rollout #2538

Description

Problem

Proposed transition

Non-goals

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions