Improved reward conditioning and waypoint following support by daphne-cornelisse · Pull Request #391 · Emerge-Lab/gpudrive

daphne-cornelisse · 2025-03-29T15:12:03Z

Acknowledging that this should have been multiple PRs, but leaving as is since I'm just merging to dev for now.

@eugenevinitsky I know this is impossible to review, but maybe you can briefly skim the PR? I just self-reviewed it too

* Improve formatting tutorial 8
* Visualize rollouts with different rewards.
* Make human-replay agents slightly darker
* Add option for storing behavioral metrics
* WIP
* Analyze agent diversity
* Analyze agent diversity v2
* Full diversity analysis
* Merge in main
* wip
* Sync
* Implement waypoint following agent
* WIP
* Set defaults
* fix network
* Small setting updates
* Improve and extend options for waypoint following rewards
* Eval new model
* Formatting
* Set default agent_type for fixed condition mode
* minor
* Apply reward weight sharing across environments for memory efficiency
* Add condition mode to wrapper
* Reduce max road points for sim speed up
* Add agent with separate actor and critic network
* Bug fix: checkpointing
* wip
* Set training defaults to best params
* Add separate waypoint following agent
* Merge dev into branch
* Fix waypoint following implementation
* Sbatch
* Can successfully learn waypoint following agent
* Add goal state to ego state by default, so that agents know when the goal is reached.
* Increase log window size
* Remove the goal reward when following waypoints
* Set roadpoints to default to avoid switching
* Implement reference path in reward and observation
* Bug fixes
* Working kinematic metrics
* Minor
* Make logging realism metrics optional
* WIP
* Add simple agent
* Update settings
* Add condition in dones such that agents are not allowed to terminate before the log end
* Update realism metrics and support for adding the reference speed
* Update realism metrics and support for adding the reference speed
* Settings
* Add average displacement error
* Set reward for reaching the goal
* Minor
* New defaults
* Bug fix: Zero-out the waypoint distance computations for time steps where the reference logs are invalid.
* More stable realism metrics by averaging over larger batches
* Control all agent types by default
* Add option for jerk penalties
* Add option for jerk penalties
* Change: Agents cannot be terminated before end of episode length
* Remove distance to last expert position from ego state
* Update number of ego state constants accordingly
* Update visualizer to match new conditions
* Batch global -> local reference frame transformation
* typo
* Minor logging fix
* New defaults
* Replace jerk with single param
* Condition on previous action if present
* Name change
* Faster resets
* Cleanup
* Integrate fb
* Fix all reference-path-related bugs
* Formatting
* Useful debug notebook
* Better default
* Decrease steering angle ub from pi to pi/3
* Add agent obs to logging
* Linting
* Set group
* Fix config
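Two of the changes listed above (the batched global -> local reference frame transformation, and the average displacement error with invalid reference steps zeroed out) can be illustrated with a minimal NumPy sketch. This is not the code from this PR: the function names `global_to_local` and `masked_ade`, the array shapes, and the use of NumPy rather than the repository's own tensor library are all assumptions made for illustration.

```python
import numpy as np


def global_to_local(points_xy, ego_xy, ego_yaw):
    """Transform global 2D points into each agent's ego frame, batched.

    Hypothetical helper, not the gpudrive implementation.
    points_xy: (N, T, 2) global positions (e.g. reference path points)
    ego_xy:    (N, 2)    global ego positions
    ego_yaw:   (N,)      ego headings in radians
    """
    delta = points_xy - ego_xy[:, None, :]
    cos = np.cos(ego_yaw)[:, None]
    sin = np.sin(ego_yaw)[:, None]
    # Rotate by -yaw so that the ego heading maps onto the local +x axis.
    local_x = cos * delta[..., 0] + sin * delta[..., 1]
    local_y = -sin * delta[..., 0] + cos * delta[..., 1]
    return np.stack([local_x, local_y], axis=-1)


def masked_ade(agent_xy, ref_xy, valid):
    """Average displacement error over valid reference steps only.

    Hypothetical helper, not the gpudrive implementation.
    agent_xy, ref_xy: (N, T, 2) positions; valid: (N, T) boolean mask.
    Distances at invalid steps are zeroed out before averaging, and agents
    with no valid steps get an error of 0 instead of a division by zero.
    """
    dist = np.linalg.norm(agent_xy - ref_xy, axis=-1)
    dist = np.where(valid, dist, 0.0)
    n_valid = np.maximum(valid.sum(axis=-1), 1)
    return dist.sum(axis=-1) / n_valid
```

Both functions operate on whole batches at once, so no Python loop over agents is needed. Masking with `np.where` also discards NaN placeholders at invalid steps, which is one reason to zero out distances rather than multiply by the mask.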

* initial commit for wosac eval
* Force data processing to be on CPU
* Add readme
* fix the data extraction script
* Force jax run on cpu only
* Fixes and add data processing file
* add wosac original eval script
* agent init fix?
* Feat/vbd amortize (#409)
* raw untested changes
* cleanup artifact
* tensor conversion fix
* working amortization script
* amortization fixes
* replace examples with vbd
* fixes
* revert to 32 agents
* variable agent count
* amortized womd agents
* Improved reward conditioning and waypoint following support (#391)
* partial fix
* inverse action fix
* warmup impl
* womd init fix
* cleanup

---------

Co-authored-by: Zixu Zhang <zixuz@princeton.edu>
Co-authored-by: Daphne Cornelisse <cor.daphne@gmail.com>
Co-authored-by: Zixu Zhang <zixu@umich.edu>
Co-authored-by: Daphne Cornelisse <33460159+daphne-cornelisse@users.noreply.github.com>

daphne-cornelisse added 10 commits March 20, 2025 10:01

Improve formatting tutorial 8

c07d855

Visualize rollouts with different rewards.

f1dedc0

Make human-replay agents slightly darker

e993164

Add option for storing behavioral metrics

1806498

WIP

0c00526

Analyze agent diversity

2a08793

Analyze agent diversity v2

a3d0543

Full diversity analysis

d1fab12

Merge in main

3871209

Merge in main

6dcd690

daphne-cornelisse changed the base branch from main to dev March 31, 2025 15:38

daphne-cornelisse added 19 commits April 1, 2025 11:45

wip

d211756

Sync

feacdf7

Merge branch 'main' of https://github.com/Emerge-Lab/gpudrive into dc/reward_conditioning

3e6d0f2

Merge branch 'dc/reward_conditioning' of https://github.com/Emerge-Lab/gpudrive into dc/reward_conditioning

8dae37f

Implement waypoint following agent

122e0f8

WIP

e8c1bd2

Merge dev into dc/reward_conditioning

8c62a2b

Set defaults

24ce94e

fix network

1175706

Small setting updates

20500ef

Improve and extend options for waypoint following rewards

41235ca

Eval new model

f283975

Formatting

4939079

Set default agent_type for fixed condition mode

f0d66d5

minor

d63f370

Apply reward weight sharing across environments for memory efficiency

d59826a

Add condition mode to wrapper

3803945

Reduce max road points for sim speed up

a95e41f

Add agent with separate actor and critic network

b1a2c53

daphne-cornelisse added 6 commits April 19, 2025 11:10

Minor logging fix

1f0c2b9

New defaults

7266b5d

Replace jerk with single param

325432d

Condition on previous action if present

012a547

Name change

8bac16f

Faster resets

670d221

daphne-cornelisse changed the title ~~Reward conditioning experiments~~ Improved reward conditioning and waypoint following support Apr 21, 2025

daphne-cornelisse marked this pull request as ready for review April 21, 2025 15:30

Cleanup

da8a001

daphne-cornelisse requested a review from eugenevinitsky April 21, 2025 15:51