Improved reward conditioning and waypoint following support #391
Merged
…/reward_conditioning
…b/gpudrive into dc/reward_conditioning
nadarenator pushed a commit that referenced this pull request Apr 24, 2025
* Improve formatting tutorial 8
* Visualize rollouts with different rewards.
* Make human-replay agents slightly darker
* Add option for storing behavioral metrics
* WIP
* Analyze agent diversity
* Analyze agent diversity v2
* Full diversity analysis
* Merge in main
* wip
* Sync
* Implement waypoint following agent
* WIP
* Set defaults
* fix network
* Small setting updates
* Improve and extend options for waypoint following rewards
* Eval new model
* Formatting
* Set default agent_type for fixed condition mode
* minor
* Apply reward weight sharing across environments for memory efficiency
* Add condition mode to wrapper
* Reduce max road points for sim speed up
* Add agent with separate actor and critic network
* Bug fix: checkpointing
* wip
* Set training defaults to best params
* Add separate waypoint following agent
* Merge dev into branch
* Fix waypoint following implementation
* Sbatch
* Can successfully learn waypoint following agent
* Add goal state to ego state by default, so that agents know when the goal is reached.
* Increase log window size
* Remove the goal reward when following waypoints
* Set roadpoints to default to avoid switching
* Implement reference path in reward and observation
* Bug fixes
* Working kinematic metrics
* Minor
* Make logging realism metrics optional
* WIP
* Add simple agent
* Update settings
* Add condition in dones such that agents are not allowed to terminate before the log end
* Update realism metrics and support for adding the reference speed
* Update realism metrics and support for adding the reference speed
* Settings
* Add average displacement error
* Set reward for reaching the goal
* Minor
* New defaults
* Bug fix: Zero-out the waypoint distance computations for time steps where the reference logs are invalid.
* More stable realism metrics by averaging over larger batches
* Control all agent types by default
* Add option for jerk penalties
* Add option for jerk penalties
* Change: Agents cannot be terminated before end of episode length
* Remove distance to last expert position from ego state
* Update number of ego state constants accordingly
* Update visualizer to match new conditions
* Batch global -> local reference frame transformation
* typo
* Minor logging fix
* New defaults
* Replace jerk with single param
* Condition on previous action if present
* Name change
* Faster resets
* Cleanup
* Integrate fb
* Fix all reference-path-related bugs
* Formatting
* Useful debug notebook
* Better default
* Decrease steering angle ub from pi to pi/3
* Add agent obs to logging
* Linting
* Set group
* Fix config
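Two of the commits above ("Add average displacement error" and the bug fix that zeroes out waypoint distances where the reference logs are invalid) describe a masked distance metric. The snippet below is only a minimal sketch of that idea in plain Python, not the code from this PR: the function name `masked_ade`, its arguments, and the list-based layout are all hypothetical, and the actual implementation presumably operates on batched tensors.

```python
import math


def masked_ade(pred_xy, ref_xy, valid):
    """Average displacement error over valid time steps only.

    Hypothetical helper for illustration: `pred_xy` and `ref_xy` are lists of
    (x, y) positions per time step, and `valid` flags the steps where the
    reference log holds a real observation.
    """
    if not (len(pred_xy) == len(ref_xy) == len(valid)):
        raise ValueError("inputs must have the same number of time steps")

    total, count = 0.0, 0
    for (px, py), (rx, ry), ok in zip(pred_xy, ref_xy, valid):
        if ok:  # invalid steps contribute nothing to the sum or the count
            total += math.hypot(px - rx, py - ry)
            count += 1
    # Assumption: return 0.0 when no step is valid, to avoid dividing by zero.
    return total / count if count else 0.0


pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ref = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0)]  # last entry is an invalid placeholder
valid = [True, True, False]
print(masked_ade(pred, ref, valid))  # 0.5
```

Without the mask, the placeholder position at the last step would dominate the average, which is the kind of error the bug fix appears to target.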
daphne-cornelisse added a commit that referenced this pull request Apr 26, 2025
* initial commit for wosac eval
* Force data processing to be on CPU
* Add readme
* fix the data extraction script
* Force jax run on cpu only
* Fixes and add data processing file
* add wosac original eval script
* agent init fix?
* Feat/vbd amortize (#409)
* raw untested changes
* cleanup artifact
* tensor conversion fix
* working amortization script
* amortization fixes
* replace examples with vbd
* fixes
* revert to 32 agents
* variable agent count
* amortized womd agents
* Improved reward conditioning and waypoint following support (#391)
* partial fix
* inverse action fix
* warmup impl
* womd init fix
* cleanup

---------

Co-authored-by: Zixu Zhang <zixuz@princeton.edu>
Co-authored-by: Daphne Cornelisse <cor.daphne@gmail.com>
Co-authored-by: Zixu Zhang <zixu@umich.edu>
Co-authored-by: Daphne Cornelisse <33460159+daphne-cornelisse@users.noreply.github.com>
Acknowledging that this should have been multiple PRs, but leaving as is since I'm just merging to dev for now.
@eugenevinitsky I know this is impossible to review, but maybe you can briefly skim the PR? I just self-reviewed it too