WIP: Support per-agent rewards in multi-agent setups by nph4rd · Pull Request #2575 · PrimeIntellect-ai/prime-rl

nph4rd · 2026-05-20T17:29:45Z

Adds support for per-agent rewards and advantages in multi-agent environments. This is a companion change to PrimeIntellect-ai/verifiers#965, which adds abstractions for multi-agent setups and heterogeneous reward functions.
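
To make "per-agent advantages" concrete, here is a minimal sketch of one way it could work. It is not taken from this PR's diff. It assumes each rollout in a group carries a dict of rewards keyed by agent id and that the advantage is the reward minus the group mean for that same agent (a mean-baseline, GRPO-style estimate without std normalization). The function name `per_agent_advantages` and the agent ids `proposer` and `solver` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_agent_advantages(
    group_rewards: list[dict[str, float]],
) -> list[dict[str, float]]:
    """Compute mean-baseline advantages separately for each agent.

    group_rewards[i] maps agent id -> reward for rollout i of one group
    (all rollouts in the group share the same prompt/example).
    """
    # Collect every reward each agent received across the group.
    rewards_by_agent: dict[str, list[float]] = defaultdict(list)
    for rollout in group_rewards:
        for agent, reward in rollout.items():
            rewards_by_agent[agent].append(reward)

    # One baseline per agent, so agents with different reward
    # functions are never compared against each other.
    baselines = {agent: mean(rs) for agent, rs in rewards_by_agent.items()}

    return [
        {agent: reward - baselines[agent] for agent, reward in rollout.items()}
        for rollout in group_rewards
    ]


# Hypothetical group of four rollouts with two agents.
group = [
    {"proposer": 1.0, "solver": 0.0},
    {"proposer": 0.0, "solver": 1.0},
    {"proposer": 1.0, "solver": 1.0},
    {"proposer": 0.0, "solver": 0.0},
]
advantages = per_agent_advantages(group)
```

Keeping a separate baseline per agent means a rollout can carry a positive advantage for one agent and a negative one for another, which a single scalar reward per rollout cannot express.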

nph4rd added 14 commits May 20, 2026 12:29

support per-agent rewards from multi-agent environments

f6483fa

point verifiers to multiagent-heterogeneous-rewards branch

a471966

log per-agent rewards to wandb for multi-agent environments

0a1bc6c

revert agent_rewards logging, now handled via metrics

4710803

compute per-agent grpo advantages for multi-agent environments

792a461

respect per-step is_trainable flag in interleave_rollout

bef068f

add multi-agent lora support for per-agent policy training

33b827b

split merged multi-agent samples by agent for per-agent lora training

e821d1b

fix multi-agent lora orch.toml and add pack_full_step

7177db3

enforce async level in multi-actor policy updates

b321108

auto-set max_concurrent_runs and fix packer timeout for multi-agent lora

e3a6cc0

use upstream dedup pattern for multi-actor policy updates

78ff0ac

honor per-step advantages in ZeroAdvantageFilter for multi-agent rollouts

1ffd1ef

emit per-agent trainer metrics alongside per-env breakdowns

4937e10

nph4rd force-pushed the multiagent-heterogeneous-rewards branch from ef8659c to 4937e10 on May 20, 2026 20:16

nph4rd added 2 commits May 20, 2026 15:19

drop deps/verifiers from workspace members; conflicts with git source override

aa649af

re-leaf orchestrator output_dir to outputs/orchestrator under multi_agent_lora

42796a7

nph4rd force-pushed the multiagent-heterogeneous-rewards branch from 1a981df to 42796a7 on May 20, 2026 22:18

nph4rd wants to merge 16 commits into PrimeIntellect-ai:main from nph4rd:multiagent-heterogeneous-rewards
