
add partner seconds_stopped observation feature #471

Merged
eugenevinitsky merged 14 commits into emerge/temp_training from ev/stopped_feature on Jun 3, 2026

@eugenevinitsky commented Jun 2, 2026

WIP. Adds a per-partner "how long has this agent been stopped" signal to the
partner observation block.

…FEATURES 8->9)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 2, 2026 20:39

Copilot AI left a comment


Pull request overview

Adds a new per-partner observation feature to expose “how long this partner agent has been stopped” (normalized and capped), aligning partner observations with the existing ego seconds_stopped signal.

Changes:

  • Bump PARTNER_FEATURES from 8 → 9 to reflect the expanded partner observation vector.
  • Extend write_partner_obs to append fminf(1.0f, other->seconds_stopped / MAX_STOPPED_SECONDS) as the 9th partner feature.


The interactive obs viewer hardcoded the partner block stride as 8, so with
PARTNER_FEATURES=9 it mis-parsed partners and shifted every subsequent obs
block (lanes/boundaries/traffic). Add partner_features to the replay header and
use H.partner_features (mirroring how target_features is already handled).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vcharraut (Collaborator) left a comment


Update the notebooks with the new partner features.

eugenevinitsky and others added 6 commits June 3, 2026 07:08
The smoke golden is only bit-reproducible inside the QEMU/Haswell smoke
image, so it cannot be regenerated on an arbitrary dev box. This adds a
push-triggered (marker-gated) CI job that builds the image, runs the
train smoke test with SMOKE_UPDATE_GOLDEN=1, uploads the result as an
artifact, and commits it back to the branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Regenerate smoke golden in the pinned QEMU image (via CI workflow); it
  now reflects the 9th partner feature plus the obs/reward_components
  metrics the current pipeline logs.
- Mark partner seconds_stopped obs as a temporary hack in drive.h.
- 05_inference.ipynb: add seconds_stopped to partner_labels, drive the
  per-feature loop off len(partner_labels) instead of a literal 8, and
  fix stale shape comments + the markdown obs spec. Also corrects a
  pre-existing length/width label swap to match the C write order.
- Workflow: git add -f the golden (tests/smoke_tests/data is gitignored).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The partner heatmap sets xticks to range(env.partner_features) (now 9) but
xticklabels to partner_labels; without the 9th label the ticks/labels
mismatch. Append seconds_stopped to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Removed comments regarding the workflow trigger and its usage.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@eugenevinitsky eugenevinitsky changed the title WIP: add partner seconds_stopped observation feature add partner seconds_stopped observation feature Jun 3, 2026
eugenevinitsky and others added 6 commits June 3, 2026 08:02
seconds_stopped was incremented inside compute_rewards, which only runs
for policy-controlled (active) agents. In control_sdc_only mode the ego
is the sole active agent, so every other car's seconds_stopped stayed
pinned at 0 and the partner observation was dead in that mode.

Move the update into a dedicated loop over all agents (active + static/
replayed) right after the move step, mirroring the active-then-static
index resolution used when gathering partner observations. Verified in
control_sdc_only replay: partner seconds_stopped now populates (was 0).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The regen workflow's json.dump writes no trailing newline, so the
bot-committed golden trips pre-commit's end-of-file-fixer. Add the
newline to the committed golden and make the workflow append one itself.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Refactor agent index resolution for stopped-duration updates.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@eugenevinitsky eugenevinitsky merged commit 70e6e8b into emerge/temp_training Jun 3, 2026
12 checks passed

4 participants