Improved reward conditioning and waypoint following support #391
Merged
89 commits
c07d855
Improve formatting tutorial 8
daphne-cornelisse f1dedc0
Visualize rollouts with different rewards.
daphne-cornelisse e993164
Make human-replay agents slightly darker
daphne-cornelisse 1806498
Add option for storing behavioral metrics
daphne-cornelisse 0c00526
WIP
daphne-cornelisse 2a08793
Analyze agent diversity
daphne-cornelisse a3d0543
Analyze agent diversity v2
daphne-cornelisse d1fab12
Full diversity analysis
daphne-cornelisse 3871209
Merge in main
daphne-cornelisse 6dcd690
Merge in main
daphne-cornelisse d211756
wip
daphne-cornelisse feacdf7
Sync
daphne-cornelisse 3e6d0f2
Merge branch 'main' of https://github.com/Emerge-Lab/gpudrive into dc…
daphne-cornelisse 8dae37f
Merge branch 'dc/reward_conditioning' of https://github.com/Emerge-La…
daphne-cornelisse 122e0f8
Implement waypoint following agent
daphne-cornelisse e8c1bd2
WIP
daphne-cornelisse 8c62a2b
Merge dev into dc/reward_conditioning
daphne-cornelisse 24ce94e
Set defaults
daphne-cornelisse 1175706
fix network
daphne-cornelisse 20500ef
Small setting updates
daphne-cornelisse 41235ca
Improve and extend options for waypoint following rewards
daphne-cornelisse f283975
Eval new model
daphne-cornelisse 4939079
Formatting
daphne-cornelisse f0d66d5
Set default agent_type for fixed condition mode
daphne-cornelisse d63f370
minor
daphne-cornelisse d59826a
Apply reward weight sharing across environments for memory efficiency
daphne-cornelisse 3803945
Add condition mode to wrapper
daphne-cornelisse a95e41f
Reduce max road points for sim speed up
daphne-cornelisse b1a2c53
Add agent with separate actor and critic network
daphne-cornelisse 6e3f257
Bug fix: checkpointing
daphne-cornelisse 57b3ac4
wip
daphne-cornelisse e283c29
Set training defaults to best params
daphne-cornelisse 469c380
Merge branch 'dc/reward_conditioning' of https://github.com/Emerge-La…
daphne-cornelisse 6a23e2e
Add separate waypoint following agent
daphne-cornelisse 0eefe13
Merge dev into branch
daphne-cornelisse 196fb5f
Merge dev into branch
daphne-cornelisse d15363d
Fix waypoint following implementation
daphne-cornelisse 08513ec
Sbatch
daphne-cornelisse cfa2081
Can successfully learn waypoint following agent
daphne-cornelisse b3dda6a
Merge branch 'dc/reward_conditioning' of https://github.com/Emerge-La…
daphne-cornelisse 6ba22ae
Add goal state to ego state by default, so that agents know when the …
daphne-cornelisse 3acdec8
Increase log window size
daphne-cornelisse 4e2a256
Remove the goal reward when following waypoints
daphne-cornelisse 84a139b
Set roadpoints to default to avoid switching
daphne-cornelisse 37ebb56
Implement reference path in reward and observation
daphne-cornelisse ddce91c
Bug fixes
daphne-cornelisse 3248fc9
Working kinematic metrics
daphne-cornelisse 58c2ef0
Minor
daphne-cornelisse 565b63e
Make logging realism metrics optional
daphne-cornelisse 97aa5a1
WIP
daphne-cornelisse 78a363d
Add simple agent
daphne-cornelisse 1886bc0
Update settings
daphne-cornelisse c844ec6
Add condition in dones such that agents are not allowed to terminate…
daphne-cornelisse f50589a
Update realism metrics and support for adding the reference speed
daphne-cornelisse bae9eef
Update realism metrics and support for adding the reference speed
daphne-cornelisse d936dea
Settings
daphne-cornelisse caa953c
Add average displacement error
daphne-cornelisse cb0a26b
Set reward for reaching the goal
daphne-cornelisse 61486d0
Minor
daphne-cornelisse b4b5de3
New defaults
daphne-cornelisse 8ce9df0
Bug fix: Zero-out the waypoint distance computations for time steps w…
daphne-cornelisse 03e8d2b
More stable realism metrics by averaging over larger batches
daphne-cornelisse 8f785fb
Control all agent types by default
daphne-cornelisse 298fbdb
Add option for jerk penalties
daphne-cornelisse ccc3ded
Add option for jerk penalties
daphne-cornelisse 9517126
Change: Agents cannot be terminated before end of episode length
daphne-cornelisse eeecb2c
Remove distance to last expert position from ego state
daphne-cornelisse 180fe76
Update number of ego state constants accordingly
daphne-cornelisse 0da58ff
Update visualizer to match new conditions
daphne-cornelisse 56032d2
Batch global -> local reference frame transformation
daphne-cornelisse 2eccb0a
typo
daphne-cornelisse 1f0c2b9
Minor logging fix
daphne-cornelisse 7266b5d
New defaults
daphne-cornelisse 325432d
Replace jerk with single param
daphne-cornelisse 012a547
Condition on previous action if present
daphne-cornelisse 8bac16f
Name change
daphne-cornelisse 670d221
Faster resets
daphne-cornelisse da8a001
Cleanup
daphne-cornelisse 685d95f
Integrate fb
daphne-cornelisse 0960be8
Fix all reference-path-related bugs
daphne-cornelisse 1c68294
Formatting
daphne-cornelisse f0f3314
Useful debug notebook
daphne-cornelisse 2af59ae
Better default
daphne-cornelisse 8fe1c0e
Decrease steering angle ub from pi to pi/3
daphne-cornelisse 641cfee
Add agent obs to logging
daphne-cornelisse 252343e
Merge remote-tracking branch 'origin/dev' into dc/reward_conditioning
daphne-cornelisse 53697ba
Linting
daphne-cornelisse 99bc10b
Set group
daphne-cornelisse 0b03ae7
Fix config
daphne-cornelisse
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,119 @@ | ||
| mode: "train" | ||
| use_rnn: false | ||
| eval_model_path: null | ||
| baseline: false | ||
| data_dir: data/processed/pop_play | ||
| continue_training: false | ||
| model_cpt: null | ||
| | ||
| environment: # Overrides default environment configs (see pygpudrive/env/config.py) | ||
| name: "gpudrive" | ||
| num_worlds: 100 # Number of parallel environments | ||
| k_unique_scenes: 100 # Number of unique scenes to sample from | ||
| max_controlled_agents: 64 # Maximum number of agents controlled by the model. Make sure this aligns with the variable kMaxAgentCount in src/consts.hpp | ||
| ego_state: true | ||
| road_map_obs: true | ||
| partner_obs: true | ||
| norm_obs: true | ||
| remove_non_vehicles: false # If false, all agents are included (vehicles, pedestrians, cyclists) | ||
| lidar_obs: false # NOTE: Setting this to true currently turns off the other observation types | ||
| reward_type: "reward_conditioned" # Options: "weighted_combination", "reward_conditioned", "follow_waypoints" | ||
| collision_weight: -0.75 | ||
| off_road_weight: -0.75 | ||
| goal_achieved_weight: 1.0 | ||
| init_mode: all_non_trivial | ||
| | ||
| # If reward_type is "reward_conditioned", the following parameters are used | ||
| randomize_rewards: true | ||
| condition_mode: random # Options: random, fixed | ||
| collision_weight_lb: -3.0 | ||
| collision_weight_ub: 0.0 | ||
| goal_achieved_weight_lb: 1.0 | ||
| goal_achieved_weight_ub: 3.0 | ||
| off_road_weight_lb: -3.0 | ||
| off_road_weight_ub: 0.0 | ||
| | ||
| dynamics_model: "classic" | ||
| collision_behavior: "ignore" # Options: "remove", "stop", "ignore" | ||
| dist_to_goal_threshold: 2.0 | ||
| polyline_reduction_threshold: 0.1 # Rate at which to sample points from the polyline (0 is use all closest points, 1 maximum sparsity), needs to be balanced with kMaxAgentMapObservationsCount | ||
| sampling_seed: 42 # If given, the set of scenes to sample from will be deterministic; if None, the set of scenes will be random | ||
| obs_radius: 50.0 # Visibility radius of the agents | ||
| action_space_steer_disc: 13 | ||
| action_space_accel_disc: 7 | ||
| init_steps: 0 # Warmup steps | ||
| # Versatile Behavior Diffusion (VBD): This will slow down training | ||
| use_vbd: false | ||
| vbd_model_path: "gpudrive/integrations/vbd/weights/epoch=18.ckpt" | ||
| vbd_trajectory_weight: 0.1 # Importance of distance to the vbd trajectories in the reward function | ||
| vbd_in_obs: false | ||
| | ||
| wandb: | ||
| entity: "" | ||
| project: "kshotagents" | ||
| group: "separate_actor_critic" | ||
| mode: "online" # Options: online, offline, disabled | ||
| tags: ["ppo", "ff"] | ||
| | ||
| train: | ||
| exp_id: # Set dynamically in the script if needed | ||
| seed: 42 | ||
| cpu_offload: false | ||
| device: "cuda" # Dynamically set to cuda if available, else cpu | ||
| bptt_horizon: 1 | ||
| compile: false | ||
| compile_mode: "reduce-overhead" | ||
| | ||
| # # # Data sampling # # # | ||
| resample_scenes: false | ||
| resample_dataset_size: 500 # Number of unique scenes to sample from | ||
| resample_interval: 2_000_000 | ||
| sample_with_replacement: false | ||
| shuffle_dataset: false | ||
| | ||
| # # # PPO # # # | ||
| torch_deterministic: false | ||
| total_timesteps: 2_000_000_000 | ||
| batch_size: 131072 | ||
| minibatch_size: 8192 | ||
| learning_rate: 3e-4 | ||
| anneal_lr: true | ||
| gamma: 0.99 | ||
| gae_lambda: 0.95 | ||
| update_epochs: 4 | ||
| norm_adv: true | ||
| clip_coef: 0.2 | ||
| clip_vloss: false | ||
| vf_clip_coef: 0.2 | ||
| ent_coef: 0.001 | ||
| vf_coef: 0.5 | ||
| max_grad_norm: 0.5 | ||
| target_kl: null | ||
| log_window: 1000 | ||
| | ||
| # # # Network # # # | ||
| network: | ||
| embed_dim: 64 # Embedding of the input features | ||
| dropout: 0.01 | ||
| class_name: "Agent" | ||
| num_parameters: 0 # Total trainable parameters, to be filled at runtime | ||
| | ||
| # # # Checkpointing # # # | ||
| checkpoint_interval: 250 # Save policy every k iterations | ||
| checkpoint_path: "./runs" | ||
| | ||
| # # # Rendering # # # | ||
| render: false # Determines whether to render the environment (note: will slow down training) | ||
| render_3d: false # Render simulator state in 3d or 2d | ||
| render_interval: 50 # Render every k iterations | ||
| render_k_scenarios: 1 # Number of scenarios to render | ||
| render_format: "mp4" # Options: gif, mp4 | ||
| render_fps: 20 # Frames per second | ||
| zoom_radius: 100 | ||
| plot_waypoints: true | ||
| | ||
| vec: | ||
| backend: "native" # Only native is currently supported | ||
| num_workers: 1 | ||
| env_batch_size: 1 | ||
| zero_copy: false | ||
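
When `reward_type` is `"reward_conditioned"` and `condition_mode` is `random`, the `*_lb` / `*_ub` entries in the config above bound the reward weights. The snippet below is a minimal sketch of how such weights might be drawn and combined. It assumes uniform sampling within each range and assumes that `fixed` mode falls back to the plain `collision_weight`, `off_road_weight` and `goal_achieved_weight` values. The names `sample_reward_weights` and `conditioned_reward` are hypothetical and are not taken from the GPUDrive code base.

```python
import random

# Bounds copied from the config above: (lower, upper) per reward component.
WEIGHT_BOUNDS = {
    "collision": (-3.0, 0.0),
    "goal_achieved": (1.0, 3.0),
    "off_road": (-3.0, 0.0),
}

# Plain weights from the same config; using them for "fixed" mode is an assumption.
FIXED_WEIGHTS = {"collision": -0.75, "goal_achieved": 1.0, "off_road": -0.75}


def sample_reward_weights(condition_mode="random", rng=None):
    """Return one weight per reward component.

    Assumption: "random" draws uniformly within [lb, ub]; the sampling
    scheme used in the PR may differ.
    """
    if rng is None:
        rng = random.Random()
    if condition_mode == "fixed":
        return dict(FIXED_WEIGHTS)
    return {name: rng.uniform(lb, ub) for name, (lb, ub) in WEIGHT_BOUNDS.items()}


def conditioned_reward(weights, collided, off_road, goal_achieved):
    """Weighted sum of binary event indicators for a single step."""
    return (
        weights["collision"] * float(collided)
        + weights["off_road"] * float(off_road)
        + weights["goal_achieved"] * float(goal_achieved)
    )


# Draw one set of weights and score a step in which the agent left the road.
weights = sample_reward_weights("random", rng=random.Random(42))
print(weights)
print(conditioned_reward(weights, collided=False, off_road=True, goal_achieved=False))
```

Passing a seeded `random.Random` keeps the draw reproducible, which mirrors the intent of `sampling_seed` in the config without claiming that the two are connected in the actual code.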
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,122 @@ | ||
| mode: "train" | ||
| use_rnn: false | ||
| eval_model_path: null | ||
| baseline: false | ||
| data_dir: data/processed/wosac/validation_json_100 | ||
| continue_training: false | ||
| model_cpt: null | ||
| | ||
| environment: # Overrides default environment configs (see pygpudrive/env/config.py) | ||
| name: "gpudrive" | ||
| num_worlds: 100 # Number of parallel environments | ||
| k_unique_scenes: 100 # Number of unique scenes to sample from | ||
| max_controlled_agents: 64 # Maximum number of agents controlled by the model. Make sure this aligns with the variable kMaxAgentCount in src/consts.hpp | ||
| ego_state: true | ||
| road_map_obs: true | ||
| partner_obs: true | ||
| norm_obs: true | ||
| remove_non_vehicles: false | ||
| collision_behavior: "ignore" | ||
| goal_behavior: "ignore" | ||
| reward_type: "follow_waypoints" | ||
| waypoint_distance_scale: 0.01 | ||
| speed_distance_scale: 0.01 | ||
| jerk_smoothness_scale: 0.001 | ||
| | ||
| init_mode: all_non_trivial #womd_tracks_to_predict | ||
| dynamics_model: "classic" | ||
| polyline_reduction_threshold: 0.1 # Rate at which to sample points from the polyline (0 is use all closest points, 1 maximum sparsity), needs to be balanced with kMaxAgentMapObservationsCount | ||
| sampling_seed: 42 # If given, the set of scenes to sample from will be deterministic; if None, the set of scenes will be random | ||
| obs_radius: 50.0 # Visibility radius of the agents | ||
| action_space_steer_disc: 15 | ||
| action_space_accel_disc: 11 | ||
| init_steps: 0 # Warmup steps | ||
| goal_achieved_weight: 0.0 | ||
| collision_weight: -0.2 | ||
| off_road_weight: -0.2 | ||
| | ||
| # Versatile Behavior Diffusion (VBD) | ||
| use_vbd: false | ||
| init_steps: 0 | ||
| vbd_trajectory_weight: 0.1 # Importance of distance to the vbd trajectories in the reward function | ||
| vbd_in_obs: false | ||
| | ||
| # Planning guidance | ||
| add_reference_path: true # If true, a reference path is added to the ego observation | ||
| add_reference_speed: true # If true, the reference speed (scalar) is added to the ego observation | ||
| prob_reference_dropout: 0.0 # Value between 0 and 1, probability of a reference point to be zeroed out | ||
| | ||
| wandb: | ||
| entity: "" | ||
| project: "humanlike" | ||
| group: "debug" | ||
| mode: "online" # Options: online, offline, disabled | ||
| tags: ["ppo", "ff"] | ||
| | ||
| train: | ||
| exp_id: waypoint_rs # Set dynamically in the script if needed | ||
| seed: 42 | ||
| cpu_offload: false | ||
| device: "cuda" # Dynamically set to cuda if available, else cpu | ||
| bptt_horizon: 1 | ||
| compile: false | ||
| compile_mode: "reduce-overhead" | ||
| | ||
| # # # Data sampling # # # | ||
| resample_scenes: false | ||
| resample_dataset_size: 500 # Number of unique scenes to sample from | ||
| resample_interval: 2_000_000 | ||
| sample_with_replacement: true | ||
| shuffle_dataset: true | ||
| file_prefix: "" | ||
| | ||
| # # # PPO # # # | ||
| torch_deterministic: false | ||
| total_timesteps: 2_000_000_000 | ||
| batch_size: 131072 | ||
| minibatch_size: 8192 | ||
| learning_rate: 3e-4 | ||
| anneal_lr: true | ||
| gamma: 0.99 | ||
| gae_lambda: 0.95 | ||
| update_epochs: 4 | ||
| norm_adv: true | ||
| clip_coef: 0.2 | ||
| clip_vloss: false | ||
| vf_clip_coef: 0.2 | ||
| ent_coef: 0.001 | ||
| vf_coef: 0.5 | ||
| max_grad_norm: 0.5 | ||
| target_kl: null | ||
| | ||
| # # # Logging # # # | ||
| log_window: 500 | ||
| track_realism_metrics: true # Log human-like metrics | ||
| track_n_worlds: 3 # Number of worlds to track | ||
| | ||
| # # # Network # # # | ||
| network: | ||
| embed_dim: 64 # Embedding of the input features | ||
| dropout: 0.01 | ||
| class_name: "Agent" | ||
| num_parameters: 0 # Total trainable parameters, to be filled at runtime | ||
| | ||
| # # # Checkpointing # # # | ||
| checkpoint_interval: 500 # Save policy every k iterations | ||
| checkpoint_path: "./runs" | ||
| | ||
| # # # Rendering # # # | ||
| render: false # Determines whether to render the environment (note: will slow down training) | ||
| render_3d: false # Render simulator state in 3d or 2d | ||
| render_interval: 150 # Render every k iterations | ||
| render_k_scenarios: 2 # Number of scenarios to render | ||
| render_format: "mp4" # Options: gif, mp4 | ||
| render_fps: 20 # Frames per second | ||
| zoom_radius: 100 | ||
| plot_waypoints: true | ||
| | ||
| vec: | ||
| backend: "native" # Only native is currently supported | ||
| num_workers: 1 | ||
| env_batch_size: 1 | ||
| zero_copy: false | ||
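
The `follow_waypoints` config above exposes three scale factors (`waypoint_distance_scale`, `speed_distance_scale`, `jerk_smoothness_scale`) alongside the collision and off-road weights. The snippet below is a minimal sketch of how these terms could combine into a per-step reward. It assumes each scale multiplies a non-negative penalty: distance to the reference waypoint, absolute speed error, and the change in action between consecutive steps as a jerk proxy. The function `waypoint_following_reward` and its functional form are illustrative assumptions, not the PR's implementation.

```python
import math

# Values copied from the config above.
CFG = {
    "waypoint_distance_scale": 0.01,
    "speed_distance_scale": 0.01,
    "jerk_smoothness_scale": 0.001,
    "collision_weight": -0.2,
    "off_road_weight": -0.2,
}


def waypoint_following_reward(pos, ref_pos, speed, ref_speed, action, prev_action,
                              collided=False, off_road=False, cfg=CFG):
    """Hypothetical per-step reward: event penalties minus scaled tracking errors."""
    # Euclidean distance between the agent and its reference waypoint.
    waypoint_dist = math.hypot(pos[0] - ref_pos[0], pos[1] - ref_pos[1])
    # Absolute deviation from the reference speed.
    speed_err = abs(speed - ref_speed)
    # Change in (acceleration, steering) between consecutive steps, used as a jerk proxy.
    jerk = math.hypot(action[0] - prev_action[0], action[1] - prev_action[1])

    reward = cfg["collision_weight"] * float(collided)
    reward += cfg["off_road_weight"] * float(off_road)
    reward -= cfg["waypoint_distance_scale"] * waypoint_dist
    reward -= cfg["speed_distance_scale"] * speed_err
    reward -= cfg["jerk_smoothness_scale"] * jerk
    return reward


# Agent is 5 m from its waypoint and 2 m/s below the reference speed.
r = waypoint_following_reward(
    pos=(3.0, 4.0), ref_pos=(0.0, 0.0),
    speed=8.0, ref_speed=10.0,
    action=(1.0, 0.0), prev_action=(1.0, 0.0),
)
print(r)
```

With `goal_achieved_weight: 0.0` in this config, the sketch leaves out a goal bonus entirely, so under these assumptions the reward is never positive and perfect tracking yields zero.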