
Improved reward conditioning and waypoint following support #391

Merged: daphne-cornelisse merged 89 commits into dev from dc/reward_conditioning on Apr 23, 2025

Conversation


@daphne-cornelisse daphne-cornelisse commented Mar 29, 2025

Acknowledging that this should have been multiple PRs, but leaving as is since I'm just merging to dev for now.

@eugenevinitsky I know this is impossible to review, but maybe you can briefly skim the PR? I just self-reviewed it too.

@daphne-cornelisse daphne-cornelisse changed the base branch from main to dev March 31, 2025 15:38
@daphne-cornelisse daphne-cornelisse changed the title Reward conditioning experiments Improved reward conditioning and waypoint following support Apr 21, 2025
@daphne-cornelisse daphne-cornelisse marked this pull request as ready for review April 21, 2025 15:30
Comment thread baselines/ppo/config/ppo_base_puffer.yaml
Comment thread baselines/ppo/config/ppo_waypoint.yaml
Comment thread gpudrive/datatypes/trajectory.py
Comment thread gpudrive/datatypes/trajectory.py
Comment thread gpudrive/datatypes/trajectory.py
@daphne-cornelisse daphne-cornelisse merged commit b9c5496 into dev Apr 23, 2025
1 check failed
@daphne-cornelisse daphne-cornelisse deleted the dc/reward_conditioning branch April 23, 2025 14:11
nadarenator pushed a commit that referenced this pull request Apr 24, 2025
* Improve formatting tutorial 8

* Visualize rollouts with different rewards.

* Make human-replay agents slightly darker

* Add option for storing behavioral metrics

* WIP

* Analyze agent diversity

* Analyze agent diversity v2

* Full diversity analysis

* Merge in main

* wip

* Sync

* Implement waypoint following agent

* WIP

* Set defaults

* fix network

* Small setting updates

* Improve and extend options for waypoint following rewards

* Eval new model

* Formatting

* Set default agent_type for fixed condition mode

* minor

* Apply reward weight sharing across environments for memory efficiency

* Add condition mode to wrapper

* Reduce max road points for sim speed up

* Add agent with separate actor and critic network

* Bug fix: checkpointing

* wip

* Set training defaults to best params

* Add separate waypoint following agent

* Merge dev into branch

* Fix waypoint following implementation

* Sbatch

* Can successfully learn waypoint following agent

* Add goal state to ego state by default, so that agents know when the goal is reached.

* Increase log window size

* Remove the goal reward when following waypoints

* Set roadpoints to default to avoid switching

* Implement reference path in reward and observation

* Bug fixes

* Working kinematic metrics

* Minor

* Make logging realism metrics optional

* WIP

* Add simple agent

* Update settings

* Add condition in dones such that agents are not allowed to terminate before the log end

* Update realism metrics and support for adding the reference speed

* Update realism metrics and support for adding the reference speed

* Settings

* Add average displacement error

* Set reward for reaching the goal

* Minor

* New defaults

* Bug fix: Zero-out the waypoint distance computations for time steps where the reference logs are invalid.

* More stable realism metrics by averaging over larger batches

* Control all agent types by default

* Add option for jerk penalties

* Add option for jerk penalties

* Change: Agents cannot be terminated before end of episode length

* Remove distance to last expert position from ego state

* Update number of ego state constants accordingly

* Update visualizer to match new conditions

* Batch global -> local reference frame transformation

* typo

* Minor logging fix

* New defaults

* Replace jerk with single param

* Condition on previous action if present

* Name change

* Faster resets

* Cleanup

* Integrate fb

* Fix all reference-path-related bugs

* Formatting

* Useful debug notebook

* Better default

* Decrease steering angle ub from pi to pi/3

* Add agent obs to logging

* Linting

* Set group

* Fix config
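
Two items in the commit list above, "Add average displacement error" and "Bug fix: Zero-out the waypoint distance computations for time steps where the reference logs are invalid", describe a masked distance metric. The following is only a minimal plain-Python sketch of that idea, not the code merged in this PR: the function name `masked_ade`, the list-of-tuples layout, and the choice to return 0.0 when no step is valid are all assumptions made for illustration.

```python
import math


def masked_ade(agent_xy, reference_xy, valid):
    """Average displacement error over valid reference steps only.

    Hypothetical helper for illustration; not part of the gpudrive API.
    agent_xy, reference_xy: lists of (x, y) positions, one per time step.
    valid: list of booleans, True where the reference log is valid.
    """
    if not (len(agent_xy) == len(reference_xy) == len(valid)):
        raise ValueError("all inputs must have the same number of time steps")

    total, count = 0.0, 0
    for (ax, ay), (rx, ry), is_valid in zip(agent_xy, reference_xy, valid):
        if not is_valid:
            # Invalid reference step: contributes no distance at all.
            continue
        total += math.hypot(ax - rx, ay - ry)
        count += 1

    # Assumption: report 0.0 when the reference has no valid steps.
    return total / count if count else 0.0
```

Averaging over the number of valid steps, rather than over all steps, keeps agents with short valid logs from looking artificially accurate. Whether the merged code normalizes this way is not stated in the commit messages.
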
daphne-cornelisse added a commit that referenced this pull request Apr 26, 2025
* initial commit for wosac eval

* Force data processing to be on CPU

* Add readme

* fix the data extraction script

* Force jax run on cpu only

* Fixes and add data processing file

* add wosac original eval script

* agent init fix?

* Feat/vbd amortize (#409)

* raw untested changes

* cleanup artifact

* tensor conversion fix

* working amortization script

* amortization fixes

* replace examples with vbd

* fixes

* revert to 32 agents

* variable agent count

* amortized womd agents

* Improved reward conditioning and waypoint following support (#391)

* Improve formatting tutorial 8

* Visualize rollouts with different rewards.

* Make human-replay agents slightly darker

* Add option for storing behavioral metrics

* WIP

* Analyze agent diversity

* Analyze agent diversity v2

* Full diversity analysis

* Merge in main

* wip

* Sync

* Implement waypoint following agent

* WIP

* Set defaults

* fix network

* Small setting updates

* Improve and extend options for waypoint following rewards

* Eval new model

* Formatting

* Set default agent_type for fixed condition mode

* minor

* Apply reward weight sharing across environments for memory efficiency

* Add condition mode to wrapper

* Reduce max road points for sim speed up

* Add agent with separate actor and critic network

* Bug fix: checkpointing

* wip

* Set training defaults to best params

* Add separate waypoint following agent

* Merge dev into branch

* Fix waypoint following implementation

* Sbatch

* Can successfully learn waypoint following agent

* Add goal state to ego state by default, so that agents know when the goal is reached.

* Increase log window size

* Remove the goal reward when following waypoints

* Set roadpoints to default to avoid switching

* Implement reference path in reward and observation

* Bug fixes

* Working kinematic metrics

* Minor

* Make logging realism metrics optional

* WIP

* Add simple agent

* Update settings

* Add condition in dones such that agents are not allowed to terminate before the log end

* Update realism metrics and support for adding the reference speed

* Update realism metrics and support for adding the reference speed

* Settings

* Add average displacement error

* Set reward for reaching the goal

* Minor

* New defaults

* Bug fix: Zero-out the waypoint distance computations for time steps where the reference logs are invalid.

* More stable realism metrics by averaging over larger batches

* Control all agent types by default

* Add option for jerk penalties

* Add option for jerk penalties

* Change: Agents cannot be terminated before end of episode length

* Remove distance to last expert position from ego state

* Update number of ego state constants accordingly

* Update visualizer to match new conditions

* Batch global -> local reference frame transformation

* typo

* Minor logging fix

* New defaults

* Replace jerk with single param

* Condition on previous action if present

* Name change

* Faster resets

* Cleanup

* Integrate fb

* Fix all reference-path-related bugs

* Formatting

* Useful debug notebook

* Better default

* Decrease steering angle ub from pi to pi/3

* Add agent obs to logging

* Linting

* Set group

* Fix config

* partial fix

* inverse action fix

* warmup impl

* womd init fix

* cleanup

---------

Co-authored-by: Zixu Zhang <zixuz@princeton.edu>
Co-authored-by: Daphne Cornelisse <cor.daphne@gmail.com>
Co-authored-by: Zixu Zhang <zixu@umich.edu>
Co-authored-by: Daphne Cornelisse <33460159+daphne-cornelisse@users.noreply.github.com>
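
The commit "Batch global -> local reference frame transformation" listed above refers to expressing reference-path points relative to the ego agent. As a rough sketch of the underlying geometry only, assuming a 2D pose given by a position and a yaw angle in radians: translate each point by the ego position, then rotate by the negative yaw. The function name `global_to_local` and its arguments are hypothetical, and the merged code presumably applies this to batched tensors rather than looping in Python.

```python
import math


def global_to_local(points_xy, ego_xy, ego_yaw):
    """Express global (x, y) points in the ego agent's local frame.

    Hypothetical helper for illustration; not part of the gpudrive API.
    points_xy: list of (x, y) points in the global frame.
    ego_xy: (x, y) position of the ego agent in the global frame.
    ego_yaw: ego heading in radians, measured from the global x-axis.
    """
    cos_y, sin_y = math.cos(ego_yaw), math.sin(ego_yaw)
    ex, ey = ego_xy
    local = []
    for px, py in points_xy:
        dx, dy = px - ex, py - ey  # translate so the ego is at the origin
        # Rotate by -yaw so the ego heading aligns with the local x-axis.
        local.append((dx * cos_y + dy * sin_y, -dx * sin_y + dy * cos_y))
    return local
```

With this convention, a point directly ahead of the agent has a positive local x and a local y near zero, regardless of the agent's global heading.
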

2 participants