EnvGroup nested in outer EnvGroup silently misroutes all tasks to envs[0] #1008

@swpo

Bug: Nested EnvGroup task routing is silently broken

When an EnvGroup is wrapped inside another EnvGroup (as prime-rl does when it wraps user envs), the outer group overwrites every value in the task column with a single name. The inner EnvGroup can then no longer route, because its env_map is keyed by the original task names: rollouts silently fall back to envs[0], and scoring returns reward=0.0.

This means only the first sub-environment's rubric ever executes, and the user gets no error or warning that the rest of their tasks are being silently misrouted or scored as zero.

Impact

If a user builds a multi-task environment using EnvGroup (e.g., 17 tasks with different rubrics) and registers it as a single [[orchestrator.env]] in prime-rl, only the first task type gets meaningful reward signal. The model trains with zero reward on all other task types. No error is raised; the only visible symptom is that wandb metrics show only the first rubric's reward name.

Problematic code path

1. Task name overwriting (EnvGroup.__init__, lines 172-182)

for env, name in zip(self.envs, self.env_names):
    add_task = make_add_task_fn(name)
    env_dataset = env.build_dataset()
    if env_dataset is not None:
        if "task" in env_dataset.column_names:
            env_dataset = env_dataset.remove_columns(["task"])  # destroys inner task names
        env_dataset = env_dataset.map(add_task, **map_kwargs)   # overwrites with outer name

When the sub-env is itself an EnvGroup, its dataset already has meaningful per-task names. The outer EnvGroup unconditionally destroys these and replaces them all with a single name.
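The effect of this relabelling can be illustrated with a toy model. This is only a sketch: plain dicts stand in for dataset rows, and `relabel` is a hypothetical helper that mimics the remove_columns/map pair above, not the library's code.

```python
# Toy model of the relabelling step; plain dicts stand in for dataset rows.
def relabel(rows, outer_name):
    """Mimic remove_columns(["task"]) followed by map(add_task)."""
    return [
        {**{k: v for k, v in row.items() if k != "task"}, "task": outer_name}
        for row in rows
    ]


# Rows as the inner EnvGroup would produce them: one task name per sub-env.
inner_rows = [
    {"question": "q1", "task": "math"},
    {"question": "q2", "task": "code"},
]

# The outer group relabels every row with its own single name.
outer_rows = relabel(inner_rows, "my_env")

inner_tasks = sorted({row["task"] for row in inner_rows})
outer_tasks = sorted({row["task"] for row in outer_rows})
print(inner_tasks)  # ['code', 'math']
print(outer_tasks)  # ['my_env']
```

After relabelling, nothing in the rows records which inner sub-env each example came from.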

2. Silent fallback in get_env_for_task (lines 318-319)

def get_env_for_task(self, task: str) -> vf.Environment:
    return self.env_map.get(task, self.envs[0])  # silent fallback!

When the inner EnvGroup receives the outer name (which doesn't exist in its env_map), it silently falls back to envs[0]. All rollouts go to the first sub-environment. No warning is logged.
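A minimal stand-alone sketch of this lookup behaviour, assuming nothing about the real classes: `StubEnv` is a hypothetical stand-in, and the module-level `get_env_for_task` copies only the `dict.get` fallback shown above.

```python
class StubEnv:
    """Hypothetical stand-in for a sub-environment; only its name matters here."""

    def __init__(self, name: str) -> None:
        self.name = name


envs = [StubEnv("math"), StubEnv("code")]
env_map = {env.name: env for env in envs}


def get_env_for_task(task: str) -> StubEnv:
    # Same shape as the current lookup: unknown keys fall back to envs[0].
    return env_map.get(task, envs[0])


# After the outer group relabels the dataset, every row carries the outer name.
incoming_tasks = ["my_env", "my_env", "my_env"]
routed = [get_env_for_task(task).name for task in incoming_tasks]
print(routed)  # ['math', 'math', 'math']
```

Every lookup succeeds from the caller's point of view, which is why the misrouting goes unnoticed.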

3. Scoring returns zeros for unknown tasks (EnvGroupRubric.score_group, lines 86-94)

task = states[0].get("task", "default")
env = self.env_map.get(task)
if env is None:
    self.logger.warning(f"No environment found for task '{task}'")
    for state in states:
        state["reward"] = 0.0  # all tasks scored as zero

The inner EnvGroupRubric can't find the outer name in its env_map, so all states get reward=0.0.

Proposed fix

A. Preserve inner task names when sub-env is an EnvGroup

In EnvGroup.__init__, when a sub-env is itself an EnvGroup, preserve its internal task names instead of overwriting them. Register each inner task name in the outer env_map pointing to the sub-env:

for env, name in zip(self.envs, self.env_names):
    add_task = make_add_task_fn(name)
    env_dataset = env.build_dataset()
    if env_dataset is not None:
        if isinstance(env, EnvGroup) and "task" in env_dataset.column_names:
            # Preserve inner EnvGroup's task names for correct routing
            for inner_name in env.env_names:
                self.env_map[inner_name] = env
            # Don't overwrite task column — inner tasks are already set
        else:
            if "task" in env_dataset.column_names:
                env_dataset = env_dataset.remove_columns(["task"])
            env_dataset = env_dataset.map(add_task, **map_kwargs)
        datasets.append(env_dataset)

The same pattern applies to the eval_dataset handling.

This way:

  • Inner task names are preserved in the dataset
  • Outer env_map maps each inner task name to the inner EnvGroup
  • Routing works: outer gets a task name, routes to inner EnvGroup, inner routes to correct sub-env
  • EnvGroupRubric also gets the updated env_map, so scoring routes correctly
  • results_df.task.nunique() > 1, enabling per-task logging in orchestrators like prime-rl
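The two-level routing described in the list above can be sketched with simplified stand-ins. `LeafEnv`, `Group`, and `resolve` are hypothetical names used for illustration only; the sketch models just the routing table (with the explicit error from fix B), not datasets or rubrics.

```python
class LeafEnv:
    """Hypothetical leaf environment."""

    def __init__(self, name: str) -> None:
        self.name = name


class Group:
    """Hypothetical stand-in for EnvGroup: only the routing table is modelled."""

    def __init__(self, envs, env_names):
        self.envs = envs
        self.env_names = env_names
        self.env_map = dict(zip(env_names, envs))
        for env in envs:
            if isinstance(env, Group):
                # Fix A: register each inner task name, pointing at the inner group.
                for inner_name in env.env_names:
                    self.env_map[inner_name] = env

    def get_env_for_task(self, task):
        # Fix B: raise instead of silently falling back to envs[0].
        env = self.env_map.get(task)
        if env is None:
            raise ValueError(
                f"No environment found for task '{task}'. "
                f"Available tasks: {list(self.env_map)}"
            )
        return env

    def resolve(self, task):
        """Follow the routing tables down to a leaf environment."""
        env = self.get_env_for_task(task)
        return env.resolve(task) if isinstance(env, Group) else env


math_env, code_env = LeafEnv("math"), LeafEnv("code")
inner = Group([math_env, code_env], ["math", "code"])
outer = Group([inner], ["my_env"])

print(outer.get_env_for_task("code") is inner)  # True
print(outer.resolve("code").name)  # code
```

With both changes in place, an unknown task name such as `"unknown"` raises a `ValueError` at the outer level rather than reaching the first sub-env.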

B. Remove silent fallback in get_env_for_task

def get_env_for_task(self, task: str) -> vf.Environment:
    env = self.env_map.get(task)
    if env is None:
        available = list(self.env_map.keys())
        raise ValueError(
            f"No environment found for task '{task}'. "
            f"Available tasks: {available}"
        )
    return env

The current fallback to envs[0] silently masks routing failures. An explicit error makes misconfigurations immediately visible.

Test changes

1. Update test_get_env_for_task: an unknown task should raise, not fall back

def test_get_env_for_task(self, mock_openai_client):
    # ... (same setup as current) ...
    env_group = EnvGroup(envs=[env1, env2], env_names=["math", "code"])

    assert env_group.get_env_for_task("math") == env1
    assert env_group.get_env_for_task("code") == env2
    # Unknown task should raise, not silently fall back
    with pytest.raises(ValueError, match="No environment found for task"):
        env_group.get_env_for_task("unknown")

2. Add test_nested_env_group_preserves_inner_tasks

def test_nested_env_group_preserves_inner_tasks(self, mock_openai_client):
    """Test that wrapping an EnvGroup in another EnvGroup preserves inner task names."""
    env1 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q1"], "answer": ["a1"]}),
        rubric=Rubric(),
    )
    env2 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q2"], "answer": ["a2"]}),
        rubric=Rubric(),
    )

    inner_group = EnvGroup(envs=[env1, env2], env_names=["math", "code"])
    outer_group = EnvGroup(envs=[inner_group], env_names=["my_env"])

    # Inner task names should be preserved in the dataset
    dataset = outer_group.get_dataset()
    tasks = dataset["task"]
    assert "math" in tasks
    assert "code" in tasks
    assert "my_env" not in tasks

    # Routing should work through both levels
    assert outer_group.get_env_for_task("math") == inner_group
    assert outer_group.get_env_for_task("code") == inner_group

3. Add test_nested_env_group_rubric_scoring

@pytest.mark.asyncio
async def test_nested_env_group_rubric_scoring(self, mock_openai_client, make_input):
    """Test that scoring routes correctly through nested EnvGroups."""
    def math_reward(completion, **kwargs):
        return 0.8

    def code_reward(completion, **kwargs):
        return 0.6

    env1 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q1"], "answer": ["a1"]}),
        rubric=Rubric(funcs=[math_reward], weights=[1.0]),
    )
    env2 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q2"], "answer": ["a2"]}),
        rubric=Rubric(funcs=[code_reward], weights=[1.0]),
    )

    inner_group = EnvGroup(envs=[env1, env2], env_names=["math", "code"])
    outer_group = EnvGroup(envs=[inner_group], env_names=["my_env"])

    # Score a "code" task — should route through outer -> inner -> env2's rubric
    state = State(input=make_input(prompt="Test", answer="ans", task="code"))
    state["completion"] = "Test completion"
    state["trajectory"] = []
    state["timing"] = {"generation_ms": 0.0, "scoring_ms": 0.0, "total_ms": 0.0, "start_time": 0.0}
    state["is_completed"] = False
    state["stop_condition"] = None
    state["oai_tools"] = []
    state["reward"] = None
    state["metrics"] = None

    await outer_group.rubric.score_rollout(state)

    assert state["reward"] == 0.6  # code_reward, not math_reward
    assert state["metrics"]["code_reward"] == 0.6
    assert state["metrics"]["math_reward"] == 0.0

Reproduction

Any environment that returns an EnvGroup from its load_environment() and is registered as a single [[orchestrator.env]] in prime-rl will hit this. The inner task names get overwritten, routing breaks silently, and only the first sub-env's rubric ever scores anything.
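One way to check whether a given setup is affected is to compare the task names that reach the inner group against the keys of its env_map. The helper below is a hypothetical sketch over plain Python sequences, not part of any library; with real objects, the inputs would presumably come from the outer dataset's task column and the inner group's env_map.

```python
from collections import Counter


def unroutable_tasks(task_column, env_map_keys):
    """Count rows whose task name has no entry in the routing table."""
    known = set(env_map_keys)
    return Counter(task for task in task_column if task not in known)


# Task column as the inner group sees it after the outer group relabels rows.
relabelled = ["my_env", "my_env", "my_env"]
inner_keys = ["math", "code"]

missing = unroutable_tasks(relabelled, inner_keys)
print(dict(missing))  # {'my_env': 3}

# With inner task names preserved, nothing is unroutable.
healthy = unroutable_tasks(["math", "code", "math"], inner_keys)
print(dict(healthy))  # {}
```

A non-empty result means at least some rows would hit the envs[0] fallback or be scored as zero.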
