Merged
89 commits
c07d855
Improve formatting tutorial 8
daphne-cornelisse Mar 20, 2025
f1dedc0
Visualize rollouts with different rewards.
daphne-cornelisse Mar 20, 2025
e993164
Make human-replay agents slightly darker
daphne-cornelisse Mar 20, 2025
1806498
Add option for storing behavioral metrics
daphne-cornelisse Mar 20, 2025
0c00526
WIP
daphne-cornelisse Mar 20, 2025
2a08793
Analyze agent diversity
daphne-cornelisse Mar 21, 2025
a3d0543
Analyze agent diversity v2
daphne-cornelisse Mar 21, 2025
d1fab12
Full diversity analysis
daphne-cornelisse Mar 21, 2025
3871209
Merge in main
daphne-cornelisse Mar 31, 2025
6dcd690
Merge in main
daphne-cornelisse Mar 31, 2025
d211756
wip
daphne-cornelisse Apr 1, 2025
feacdf7
Sync
daphne-cornelisse Apr 1, 2025
3e6d0f2
Merge branch 'main' of https://github.com/Emerge-Lab/gpudrive into dc…
daphne-cornelisse Apr 1, 2025
8dae37f
Merge branch 'dc/reward_conditioning' of https://github.com/Emerge-La…
daphne-cornelisse Apr 1, 2025
122e0f8
Implement waypoint following agent
daphne-cornelisse Apr 1, 2025
e8c1bd2
WIP
daphne-cornelisse Apr 3, 2025
8c62a2b
Merge dev into dc/reward_conditioning
daphne-cornelisse Apr 4, 2025
24ce94e
Set defaults
daphne-cornelisse Apr 4, 2025
1175706
fix network
daphne-cornelisse Apr 4, 2025
20500ef
Small setting updates
daphne-cornelisse Apr 5, 2025
41235ca
Improve and extend options for waypoint following rewards
daphne-cornelisse Apr 5, 2025
f283975
Eval new model
daphne-cornelisse Apr 7, 2025
4939079
Formatting
daphne-cornelisse Apr 7, 2025
f0d66d5
Set default agent_type for fixed condition mode
daphne-cornelisse Apr 7, 2025
d63f370
minor
daphne-cornelisse Apr 7, 2025
d59826a
Apply reward weight sharing across environments for memory efficiency
daphne-cornelisse Apr 7, 2025
3803945
Add condition mode to wrapper
daphne-cornelisse Apr 7, 2025
a95e41f
Reduce max road points for sim speed up
daphne-cornelisse Apr 8, 2025
b1a2c53
Add agent with separate actor and critic network
daphne-cornelisse Apr 8, 2025
6e3f257
Bug fix: checkpointing
daphne-cornelisse Apr 8, 2025
57b3ac4
wip
daphne-cornelisse Apr 8, 2025
e283c29
Set training defaults to best params
daphne-cornelisse Apr 9, 2025
469c380
Merge branch 'dc/reward_conditioning' of https://github.com/Emerge-La…
daphne-cornelisse Apr 9, 2025
6a23e2e
Add separate waypoint following agent
daphne-cornelisse Apr 9, 2025
0eefe13
Merge dev into branch
daphne-cornelisse Apr 9, 2025
196fb5f
Merge dev into branch
daphne-cornelisse Apr 9, 2025
d15363d
Fix waypoint following implementation
daphne-cornelisse Apr 10, 2025
08513ec
Sbatch
daphne-cornelisse Apr 10, 2025
cfa2081
Can successfully learn waypoint following agent
daphne-cornelisse Apr 10, 2025
b3dda6a
Merge branch 'dc/reward_conditioning' of https://github.com/Emerge-La…
daphne-cornelisse Apr 10, 2025
6ba22ae
Add goal state to ego state by default, so that agents know when the …
daphne-cornelisse Apr 10, 2025
3acdec8
Increase log window size
daphne-cornelisse Apr 10, 2025
4e2a256
Remove the goal reward when following waypoints
daphne-cornelisse Apr 12, 2025
84a139b
Set roadpoints to default to avoid switching
daphne-cornelisse Apr 12, 2025
37ebb56
Implement reference path in reward and observation
daphne-cornelisse Apr 14, 2025
ddce91c
Bug fixes
daphne-cornelisse Apr 14, 2025
3248fc9
Working kinematic metrics
daphne-cornelisse Apr 15, 2025
58c2ef0
Minor
daphne-cornelisse Apr 15, 2025
565b63e
Make logging realism metrics optional
daphne-cornelisse Apr 16, 2025
97aa5a1
WIP
daphne-cornelisse Apr 16, 2025
78a363d
Add simple agent
daphne-cornelisse Apr 16, 2025
1886bc0
Update settings
daphne-cornelisse Apr 16, 2025
c844ec6
Add condition in dones such that agents are not allowed to terminate…
daphne-cornelisse Apr 16, 2025
f50589a
Update realism metrics and support for adding the reference speed
daphne-cornelisse Apr 17, 2025
bae9eef
Update realism metrics and support for adding the reference speed
daphne-cornelisse Apr 17, 2025
d936dea
Settings
daphne-cornelisse Apr 17, 2025
caa953c
Add average displacement error
daphne-cornelisse Apr 17, 2025
cb0a26b
Set reward for reaching the goal
daphne-cornelisse Apr 17, 2025
61486d0
Minor
daphne-cornelisse Apr 17, 2025
b4b5de3
New defaults
daphne-cornelisse Apr 17, 2025
8ce9df0
Bug fix: Zero-out the waypoint distance computations for time steps w…
daphne-cornelisse Apr 18, 2025
03e8d2b
More stable realism metrics by averaging over larger batches
daphne-cornelisse Apr 18, 2025
8f785fb
Control all agent types by default
daphne-cornelisse Apr 18, 2025
298fbdb
Add option for jerk penalties
daphne-cornelisse Apr 18, 2025
ccc3ded
Add option for jerk penalties
daphne-cornelisse Apr 18, 2025
9517126
Change: Agents cannot be terminated before end of episode length
daphne-cornelisse Apr 19, 2025
eeecb2c
Remove distance to last expert position from ego state
daphne-cornelisse Apr 19, 2025
180fe76
Update number of ego state constants accordingly
daphne-cornelisse Apr 19, 2025
0da58ff
Update visualizer to match new conditions
daphne-cornelisse Apr 19, 2025
56032d2
Batch global -> local reference frame transformation
daphne-cornelisse Apr 19, 2025
2eccb0a
typo
daphne-cornelisse Apr 19, 2025
1f0c2b9
Minor logging fix
daphne-cornelisse Apr 19, 2025
7266b5d
New defaults
daphne-cornelisse Apr 19, 2025
325432d
Replace jerk with single param
daphne-cornelisse Apr 19, 2025
012a547
Condition on previous action if present
daphne-cornelisse Apr 19, 2025
8bac16f
Name change
daphne-cornelisse Apr 19, 2025
670d221
Faster resets
daphne-cornelisse Apr 19, 2025
da8a001
Cleanup
daphne-cornelisse Apr 21, 2025
685d95f
Integrate fb
daphne-cornelisse Apr 21, 2025
0960be8
Fix all reference-path-related bugs
daphne-cornelisse Apr 22, 2025
1c68294
Formatting
daphne-cornelisse Apr 22, 2025
f0f3314
Useful debug notebook
daphne-cornelisse Apr 22, 2025
2af59ae
Better default
daphne-cornelisse Apr 22, 2025
8fe1c0e
Decrease steering angle ub from pi to pi/3
daphne-cornelisse Apr 22, 2025
641cfee
Add agent obs to logging
daphne-cornelisse Apr 23, 2025
252343e
Merge remote-tracking branch 'origin/dev' into dc/reward_conditioning
daphne-cornelisse Apr 23, 2025
53697ba
Linting
daphne-cornelisse Apr 23, 2025
99bc10b
Set group
daphne-cornelisse Apr 23, 2025
0b03ae7
Fix config
daphne-cornelisse Apr 23, 2025
13 changes: 0 additions & 13 deletions .env.template

This file was deleted.

4 changes: 3 additions & 1 deletion .gitignore
@@ -27,9 +27,11 @@ data/raw/*
data/processed/validation/*
data/processed/training/*
data/processed/testing/*
data/processed/sampled/*
data/processed/pop_play/*
data/processed/hand_designed/*
analyze/figures/*
data/other/*
wosac/

# Logging
/wandb
31 changes: 21 additions & 10 deletions baselines/ppo/config/ppo_base_puffer.yaml
@@ -8,20 +8,31 @@ model_cpt: null

environment: # Overrides default environment configs (see pygpudrive/env/config.py)
name: "gpudrive"
num_worlds: 75 # Number of parallel environments
k_unique_scenes: 75 # Number of unique scenes to sample from
num_worlds: 100 # Number of parallel environments
k_unique_scenes: 100 # Number of unique scenes to sample from
max_controlled_agents: 64 # Maximum number of agents controlled by the model. Make sure this aligns with the variable kMaxAgentCount in src/consts.hpp
ego_state: true
road_map_obs: true
partner_obs: true
norm_obs: true
add_goal_state: true # If true, the goal state is added to the ego observation
add_reference_path: false
remove_non_vehicles: false # If false, all agents are included (vehicles, pedestrians, cyclists)
lidar_obs: false # NOTE: Setting this to true currently turns off the other observation types
reward_type: "weighted_combination"
reward_type: "weighted_combination" # Options: "weighted_combination", "reward_conditioned"
collision_weight: -0.75
off_road_weight: -0.75
goal_achieved_weight: 1.0
init_mode: all_non_trivial

# If reward_type is "reward_conditioned", the following parameters are used
condition_mode: random
collision_weight_lb: -3.0
collision_weight_ub: 0.01
goal_achieved_weight_lb: 1.0
goal_achieved_weight_ub: 3.0
off_road_weight_lb: -3.0
off_road_weight_ub: 0.0

dynamics_model: "classic"
collision_behavior: "ignore" # Options: "remove", "stop", "ignore"
goal_behavior: "remove" # Options: "remove", "stop", "ignore"
@@ -39,8 +50,8 @@ environment: # Overrides default environment configs (see pygpudrive/env/config.

wandb:
entity: ""
project: "clean_tests"
group: " "
project: "gpudrive"
group: ""
mode: "online" # Options: online, offline, disabled
tags: ["ppo", "ff"]

@@ -54,16 +65,16 @@ train:
compile_mode: "reduce-overhead"

# # # Data sampling # # #
resample_scenes: false
resample_scenes: true
resample_dataset_size: 10_000 # Number of unique scenes to sample from
resample_interval: 2_000_000
sample_with_replacement: true
shuffle_dataset: false

# # # PPO # # #
torch_deterministic: false
total_timesteps: 1_000_000_000
batch_size: 131_072
total_timesteps: 2_000_000_000
batch_size: 262_144
minibatch_size: 8192
learning_rate: 3e-4
anneal_lr: false
@@ -89,7 +100,7 @@ train:
num_parameters: 0 # Total trainable parameters, to be filled at runtime

# # # Checkpointing # # #
checkpoint_interval: 400 # Save policy every k iterations
checkpoint_interval: 500 # Save policy every k iterations
checkpoint_path: "./runs"

# # # Rendering # # #
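When `reward_type` is `"reward_conditioned"`, the config above supplies a lower and upper bound for each reward weight, but the diff does not show how these bounds are consumed. The sketch below assumes that in `random` mode each agent's weights are drawn independently and uniformly from `[lb, ub]`, and that in `fixed` mode every agent reuses the scalar `collision_weight`, `goal_achieved_weight` and `off_road_weight`. The helper `sample_reward_weights` and the constants `BOUNDS`/`FIXED` are hypothetical illustrations, not gpudrive's API.

```python
import numpy as np

# Bounds copied from the reward_conditioned section of ppo_base_puffer.yaml.
BOUNDS = {
    "collision_weight": (-3.0, 0.01),
    "goal_achieved_weight": (1.0, 3.0),
    "off_road_weight": (-3.0, 0.0),
}
# Scalar weights from the same file; ASSUMED to be what "fixed" mode uses.
FIXED = {
    "collision_weight": -0.75,
    "goal_achieved_weight": 1.0,
    "off_road_weight": -0.75,
}


def sample_reward_weights(num_agents, condition_mode, rng):
    """Hypothetical helper: return a (num_agents, 3) array of reward weights.

    Column order follows BOUNDS: collision, goal_achieved, off_road.
    """
    names = list(BOUNDS)
    if condition_mode == "random":
        lb = np.array([BOUNDS[n][0] for n in names])
        ub = np.array([BOUNDS[n][1] for n in names])
        # One independent uniform draw per agent and per weight;
        # lb/ub broadcast across the agent dimension.
        return rng.uniform(lb, ub, size=(num_agents, len(names)))
    if condition_mode == "fixed":
        # Every agent receives the same weight vector.
        row = np.array([FIXED[n] for n in names])
        return np.tile(row, (num_agents, 1))
    raise ValueError(f"unknown condition_mode: {condition_mode!r}")


rng = np.random.default_rng(42)
weights = sample_reward_weights(num_agents=4, condition_mode="random", rng=rng)
print(weights.shape)  # (4, 3)
```

Under this sampling scheme, note that `collision_weight_ub` is `0.01` in this file but `0.0` in `ppo_population.yaml`, so roughly 0.3% of agents (0.01 / 3.01) in the base config would receive a slightly positive collision weight.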
119 changes: 119 additions & 0 deletions baselines/ppo/config/ppo_population.yaml
@@ -0,0 +1,119 @@
mode: "train"
use_rnn: false
eval_model_path: null
baseline: false
data_dir: data/processed/pop_play
continue_training: false
model_cpt: null

environment: # Overrides default environment configs (see pygpudrive/env/config.py)
name: "gpudrive"
num_worlds: 100 # Number of parallel environments
k_unique_scenes: 100 # Number of unique scenes to sample from
max_controlled_agents: 64 # Maximum number of agents controlled by the model. Make sure this aligns with the variable kMaxAgentCount in src/consts.hpp
ego_state: true
road_map_obs: true
partner_obs: true
norm_obs: true
remove_non_vehicles: false # If false, all agents are included (vehicles, pedestrians, cyclists)
lidar_obs: false # NOTE: Setting this to true currently turns off the other observation types
reward_type: "reward_conditioned" # Options: "weighted_combination", "reward_conditioned", "follow_waypoints"
collision_weight: -0.75
off_road_weight: -0.75
goal_achieved_weight: 1.0
init_mode: all_non_trivial

# If reward_type is "reward_conditioned", the following parameters are used
randomize_rewards: true
condition_mode: random # Options: random, fixed
collision_weight_lb: -3.0
collision_weight_ub: 0.0
goal_achieved_weight_lb: 1.0
goal_achieved_weight_ub: 3.0
off_road_weight_lb: -3.0
off_road_weight_ub: 0.0

dynamics_model: "classic"
collision_behavior: "ignore" # Options: "remove", "stop", "ignore"
dist_to_goal_threshold: 2.0
polyline_reduction_threshold: 0.1 # Rate at which to sample points from the polyline (0 uses all closest points, 1 gives maximum sparsity); needs to be balanced with kMaxAgentMapObservationsCount
sampling_seed: 42 # If given, the set of scenes to sample from will be deterministic, if None, the set of scenes will be random
obs_radius: 50.0 # Visibility radius of the agents
action_space_steer_disc: 13
action_space_accel_disc: 7
init_steps: 0 # Warmup steps
# Versatile Behavior Diffusion (VBD): This will slow down training
use_vbd: false
vbd_model_path: "gpudrive/integrations/vbd/weights/epoch=18.ckpt"
vbd_trajectory_weight: 0.1 # Importance of distance to the vbd trajectories in the reward function
vbd_in_obs: false

wandb:
entity: ""
project: "kshotagents"
group: "separate_actor_critic"
mode: "online" # Options: online, offline, disabled
tags: ["ppo", "ff"]

train:
exp_id: # Set dynamically in the script if needed
seed: 42
cpu_offload: false
device: "cuda" # Dynamically set to cuda if available, else cpu
bptt_horizon: 1
compile: false
compile_mode: "reduce-overhead"

# # # Data sampling # # #
resample_scenes: false
resample_dataset_size: 500 # Number of unique scenes to sample from
resample_interval: 2_000_000
sample_with_replacement: false
shuffle_dataset: false

# # # PPO # # #
torch_deterministic: false
total_timesteps: 2_000_000_000
batch_size: 131072
minibatch_size: 8192
learning_rate: 3e-4
anneal_lr: true
gamma: 0.99
gae_lambda: 0.95
update_epochs: 4
norm_adv: true
clip_coef: 0.2
clip_vloss: false
vf_clip_coef: 0.2
ent_coef: 0.001
vf_coef: 0.5
max_grad_norm: 0.5
target_kl: null
log_window: 1000

# # # Network # # #
network:
embed_dim: 64 # Embedding of the input features
dropout: 0.01
class_name: "Agent"
num_parameters: 0 # Total trainable parameters, to be filled at runtime

# # # Checkpointing # # #
checkpoint_interval: 250 # Save policy every k iterations
checkpoint_path: "./runs"

# # # Rendering # # #
render: false # Determines whether to render the environment (note: will slow down training)
render_3d: false # Render simulator state in 3d or 2d
render_interval: 50 # Render every k iterations
render_k_scenarios: 1 # Number of scenarios to render
render_format: "mp4" # Options: gif, mp4
render_fps: 20 # Frames per second
zoom_radius: 100
plot_waypoints: true

vec:
backend: "native" # Only native is currently supported
num_workers: 1
env_batch_size: 1
zero_copy: false
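The training schedule implied by the `train` block of `ppo_population.yaml` can be worked out directly. This sketch assumes that one training iteration consumes exactly one `batch_size` worth of transitions (leftover timesteps that do not fill a batch are dropped), that `checkpoint_interval` counts iterations as its comment says, and that the steering and acceleration bins combine as a Cartesian product into one discrete action space; the diff confirms none of these.

```python
# Values copied from ppo_population.yaml.
total_timesteps = 2_000_000_000
batch_size = 131_072
minibatch_size = 8192
update_epochs = 4
checkpoint_interval = 250  # iterations between checkpoints
action_space_steer_disc = 13
action_space_accel_disc = 7

# A batch must split evenly into minibatches.
assert batch_size % minibatch_size == 0

# ASSUMPTION: one iteration = one full batch; the remainder is dropped.
num_iterations = total_timesteps // batch_size
minibatches_per_epoch = batch_size // minibatch_size
grad_steps_per_iteration = minibatches_per_epoch * update_epochs
num_checkpoints = num_iterations // checkpoint_interval

# ASSUMPTION: joint action = (steering bin, acceleration bin).
num_actions = action_space_steer_disc * action_space_accel_disc

print(num_iterations)            # 15258
print(grad_steps_per_iteration)  # 64
print(num_checkpoints)           # 61
print(num_actions)               # 91
```

Applying the same arithmetic to `ppo_base_puffer.yaml` after this PR (`total_timesteps: 2_000_000_000`, `batch_size: 262_144`, `checkpoint_interval: 500`) gives 7,629 iterations and 15 checkpoints.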
122 changes: 122 additions & 0 deletions baselines/ppo/config/ppo_waypoint.yaml
@@ -0,0 +1,122 @@
mode: "train"
use_rnn: false
eval_model_path: null
baseline: false
data_dir: data/processed/wosac/validation_json_100
continue_training: false
model_cpt: null

environment: # Overrides default environment configs (see pygpudrive/env/config.py)
name: "gpudrive"
num_worlds: 100 # Number of parallel environments
k_unique_scenes: 100 # Number of unique scenes to sample from
max_controlled_agents: 64 # Maximum number of agents controlled by the model. Make sure this aligns with the variable kMaxAgentCount in src/consts.hpp
ego_state: true
road_map_obs: true
partner_obs: true
norm_obs: true
remove_non_vehicles: false
collision_behavior: "ignore"
goal_behavior: "ignore"
reward_type: "follow_waypoints"
waypoint_distance_scale: 0.01
speed_distance_scale: 0.01
jerk_smoothness_scale: 0.001

init_mode: all_non_trivial #womd_tracks_to_predict
dynamics_model: "classic"
polyline_reduction_threshold: 0.1 # Rate at which to sample points from the polyline (0 uses all closest points, 1 gives maximum sparsity); needs to be balanced with kMaxAgentMapObservationsCount
sampling_seed: 42 # If given, the set of scenes to sample from will be deterministic, if None, the set of scenes will be random
obs_radius: 50.0 # Visibility radius of the agents
action_space_steer_disc: 15
action_space_accel_disc: 11
init_steps: 0 # Warmup steps
goal_achieved_weight: 0.0
collision_weight: -0.2
off_road_weight: -0.2

# Versatile Behavior Diffusion (VBD)
use_vbd: false
init_steps: 0
vbd_trajectory_weight: 0.1 # Importance of distance to the vbd trajectories in the reward function
vbd_in_obs: false

# Planning guidance
add_reference_path: true # If true, a reference path is added to the ego observation
add_reference_speed: true # If true, the reference speed (scalar) is added to the ego observation
prob_reference_dropout: 0.0 # Value between 0 and 1, probability of a reference point to be zeroed out

wandb:
entity: ""
project: "humanlike"
group: "debug"
mode: "online" # Options: online, offline, disabled
tags: ["ppo", "ff"]

train:
exp_id: waypoint_rs # Set dynamically in the script if needed
seed: 42
cpu_offload: false
device: "cuda" # Dynamically set to cuda if available, else cpu
bptt_horizon: 1
compile: false
compile_mode: "reduce-overhead"

# # # Data sampling # # #
resample_scenes: false
resample_dataset_size: 500 # Number of unique scenes to sample from
resample_interval: 2_000_000
sample_with_replacement: true
shuffle_dataset: true
file_prefix: ""

# # # PPO # # #
torch_deterministic: false
total_timesteps: 2_000_000_000
batch_size: 131072
minibatch_size: 8192
learning_rate: 3e-4
anneal_lr: true
gamma: 0.99
gae_lambda: 0.95
update_epochs: 4
norm_adv: true
clip_coef: 0.2
clip_vloss: false
vf_clip_coef: 0.2
ent_coef: 0.001
vf_coef: 0.5
max_grad_norm: 0.5
target_kl: null

# # # Logging # # #
log_window: 500
track_realism_metrics: true # Log human-like metrics
track_n_worlds: 3 # Number of worlds to track

# # # Network # # #
network:
embed_dim: 64 # Embedding of the input features
dropout: 0.01
class_name: "Agent"
num_parameters: 0 # Total trainable parameters, to be filled at runtime

# # # Checkpointing # # #
checkpoint_interval: 500 # Save policy every k iterations
checkpoint_path: "./runs"

# # # Rendering # # #
render: false # Determines whether to render the environment (note: will slow down training)
render_3d: false # Render simulator state in 3d or 2d
render_interval: 150 # Render every k iterations
render_k_scenarios: 2 # Number of scenarios to render
render_format: "mp4" # Options: gif, mp4
render_fps: 20 # Frames per second
zoom_radius: 100
plot_waypoints: true

vec:
backend: "native" # Only native is currently supported
num_workers: 1
env_batch_size: 1
zero_copy: false
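`ppo_waypoint.yaml` configures the `follow_waypoints` reward through `waypoint_distance_scale`, `speed_distance_scale` and `jerk_smoothness_scale`, but the reward function itself is not shown on this page. The sketch below is one plausible reading, assuming each scale multiplies a linear penalty (distance to the current reference point, absolute deviation from the reference speed, magnitude of jerk) and that `collision_weight` and `off_road_weight` are added when those events occur; the goal term is left out, consistent with `goal_achieved_weight: 0.0`. The function name, argument layout and the `valid` mask are hypothetical. Commit 8ce9df0 zeroes out the waypoint distance computations for some time steps, and `valid` stands in for that here. The actual implementation may differ, for example by using an exponential kernel instead of a linear penalty.

```python
import numpy as np


def follow_waypoints_reward(
    pos, ref_pos, speed, ref_speed, jerk, collided, off_road, valid,
    waypoint_distance_scale=0.01,
    speed_distance_scale=0.01,
    jerk_smoothness_scale=0.001,
    collision_weight=-0.2,
    off_road_weight=-0.2,
):
    """Hypothetical per-step reward for N agents (defaults from ppo_waypoint.yaml).

    pos, ref_pos: (N, 2) current and reference (x, y) positions.
    speed, ref_speed, jerk: (N,) scalars.
    collided, off_road, valid: (N,) booleans; `valid` masks out steps
    that have no usable reference point (assumption).
    """
    waypoint_dist = np.linalg.norm(pos - ref_pos, axis=-1)
    speed_dist = np.abs(speed - ref_speed)
    # ASSUMPTION: linear penalties on tracking error.
    tracking = -(waypoint_distance_scale * waypoint_dist
                 + speed_distance_scale * speed_dist)
    # Zero the tracking terms where no valid reference exists.
    tracking = np.where(valid, tracking, 0.0)
    smoothness = -jerk_smoothness_scale * np.abs(jerk)
    # Boolean event flags scale the (negative) event weights.
    events = collision_weight * collided + off_road_weight * off_road
    return tracking + smoothness + events


r = follow_waypoints_reward(
    pos=np.array([[0.0, 0.0], [3.0, 4.0]]),
    ref_pos=np.array([[0.0, 0.0], [0.0, 0.0]]),
    speed=np.array([5.0, 5.0]),
    ref_speed=np.array([5.0, 3.0]),
    jerk=np.array([0.0, 0.0]),
    collided=np.array([False, True]),
    off_road=np.array([False, False]),
    valid=np.array([True, True]),
)
```

In this example the first agent tracks its reference exactly and gets zero reward, while the second agent, 5 units from its reference point, 2 units per step off the reference speed and in a collision, gets -0.05 - 0.02 - 0.2 = -0.27.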
13 changes: 13 additions & 0 deletions baselines/ppo/ppo_pufferlib.py
@@ -161,11 +161,13 @@ def run(
# fmt: off
# Environment options
num_worlds: Annotated[Optional[int], typer.Option(help="Number of parallel envs")] = None,
max_controlled_agents: Annotated[Optional[int], typer.Option(help="Number of controlled agents")] = None,
k_unique_scenes: Annotated[Optional[int], typer.Option(help="The number of unique scenes to sample")] = None,
collision_weight: Annotated[Optional[float], typer.Option(help="The weight for collision penalty")] = None,
off_road_weight: Annotated[Optional[float], typer.Option(help="The weight for off-road penalty")] = None,
goal_achieved_weight: Annotated[Optional[float], typer.Option(help="The weight for goal-achieved reward")] = None,
dist_to_goal_threshold: Annotated[Optional[float], typer.Option(help="The distance threshold for goal-achieved")] = None,
randomize_rewards: Annotated[Optional[int], typer.Option(help="If reward_type == reward_conditioned: 1 sets condition_mode to 'random', 0 sets it to 'fixed'")] = 0,
sampling_seed: Annotated[Optional[int], typer.Option(help="The seed for sampling scenes")] = None,
obs_radius: Annotated[Optional[float], typer.Option(help="The radius for the observation")] = None,
collision_behavior: Annotated[Optional[str], typer.Option(help="The collision behavior; 'ignore' or 'remove'")] = None,
@@ -200,9 +202,20 @@ def run(
# Load default configs
config = load_config(config_path)

if config.environment.reward_type == "reward_conditioned":
if bool(randomize_rewards):
config.environment.condition_mode = "random"
config.train.exp_id = "random_weights"
else:
config.environment.condition_mode = (
"fixed" # Use the same type for every agent
)
config.train.exp_id = "fixed_weights"

# Override configs with command-line arguments
env_config = {
"num_worlds": num_worlds,
"max_controlled_agents": max_controlled_agents,
"k_unique_scenes": k_unique_scenes,
"collision_weight": collision_weight,
"off_road_weight": off_road_weight,
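The branch added to `run()` maps the integer `randomize_rewards` flag onto `condition_mode` and `exp_id`. Isolated from Typer and `load_config`, the same logic can be exercised on its own. In this sketch, `apply_condition_mode`, `make_config` and the `SimpleNamespace` config object are hypothetical stand-ins for illustration, not part of the PR.

```python
from types import SimpleNamespace


def apply_condition_mode(config, randomize_rewards):
    """Mirror of the reward_conditioned branch in run(); mutates config."""
    if config.environment.reward_type != "reward_conditioned":
        return config  # Other reward types ignore the flag.
    if bool(randomize_rewards):
        config.environment.condition_mode = "random"
        config.train.exp_id = "random_weights"
    else:
        # Use the same type for every agent.
        config.environment.condition_mode = "fixed"
        config.train.exp_id = "fixed_weights"
    return config


def make_config(reward_type):
    """Hypothetical minimal config with only the fields the branch touches."""
    return SimpleNamespace(
        environment=SimpleNamespace(reward_type=reward_type, condition_mode="random"),
        train=SimpleNamespace(exp_id=None),
    )


cfg = apply_condition_mode(make_config("reward_conditioned"), randomize_rewards=0)
print(cfg.environment.condition_mode, cfg.train.exp_id)  # fixed fixed_weights
```

Because `randomize_rewards` defaults to `0`, a run with `reward_type: "reward_conditioned"` and no explicit flag switches `condition_mode` to `"fixed"`, overriding the `condition_mode: random` set in `ppo_population.yaml`, and also overwrites `train.exp_id`.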