
log all trajectory repetitions per training step (not just first)#1388

Open
arteemg wants to merge 2 commits into NovaSky-AI:main from arteemg:main

arteemg commented Mar 25, 2026

Problem

During training, `log_example` was called only on `response_ids[0]` and `rewards[0]`, so only the first of `n_samples_per_prompt` trajectories was ever logged per training step. With `n_samples_per_prompt=8`, 7 out of 8 rollouts were invisible, making it impossible to debug agent behavior across repetitions.

Change

Loop over all trajectories in the batch and log each one, labeled with its `instance_id` and `repetition_id` from `TrajectoryID` (when available), or with a fallback index otherwise.
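A minimal sketch of the loop described above, not the code in this PR. `TrajectoryID` is modeled here as a stand-in dataclass carrying only the two fields the description mentions. `log_all_examples`, `trajectory_label`, and the `log_example(label, response_ids, reward)` callback signature are hypothetical names chosen for illustration. The demo data are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class TrajectoryID:
    # Hypothetical stand-in for the project's TrajectoryID; only the two
    # fields named in the PR description are modeled.
    instance_id: str
    repetition_id: int


def trajectory_label(idx: int, traj_id: Optional[TrajectoryID]) -> str:
    """Label one trajectory by instance/repetition if known, else by batch index."""
    if traj_id is not None:
        return f"instance={traj_id.instance_id} rep={traj_id.repetition_id}"
    return f"trajectory={idx}"


def log_all_examples(
    response_ids: Sequence[List[int]],
    rewards: Sequence[float],
    trajectory_ids: Optional[Sequence[Optional[TrajectoryID]]],
    log_example: Callable[[str, List[int], float], None],
) -> int:
    """Log every trajectory in the batch (not just index 0); return how many were logged."""
    if len(response_ids) != len(rewards):
        raise ValueError("response_ids and rewards must have the same length")
    if trajectory_ids is not None and len(trajectory_ids) != len(response_ids):
        raise ValueError("trajectory_ids must match the batch size")
    for idx, (resp, reward) in enumerate(zip(response_ids, rewards)):
        traj_id = trajectory_ids[idx] if trajectory_ids is not None else None
        log_example(trajectory_label(idx, traj_id), resp, reward)
    return len(response_ids)


if __name__ == "__main__":
    # Made-up batch: three repetitions of one instance.
    ids = [TrajectoryID("4316", r) for r in range(3)]
    log_all_examples(
        response_ids=[[101, 102], [103], [104, 105, 106]],
        rewards=[0.0, 1.0, 0.0],
        trajectory_ids=ids,
        log_example=lambda label, resp, reward: print(f"{label} len={len(resp)} reward={reward}"),
    )
```

The fallback label keeps the loop usable when a batch carries no trajectory IDs. Each entry is still logged, but it can only be told apart by its position in the batch.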

Example

A run after the fix, with all rollouts for a single contest (4316) logged correctly:

code_contests_4316_all_reps.log

gemini-code-assist[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.


arteemg commented Mar 26, 2026

@SumanthRH, could you take a look? Thank you in advance!
