
goal_on_lane knob — scatter goals within waypoint-spacing range #463

Open
eugenevinitsky wants to merge 6 commits into emerge/temp_training from ev/goal-terminate-mode


eugenevinitsky commented May 31, 2026

Summary

New goal_on_lane env knob, defaulting to True (existing behavior: goals are placed along the agent's precomputed route).

When False, each goal is placed at a random drivable point on the map whose Euclidean distance from the previous anchor lies in [min_waypoint_spacing, max_waypoint_spacing]. The anchor is the agent's position for goal 0 and the previous goal for each subsequent goal (when num_target_waypoints > 1).
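The anchor chaining described above can be sketched in Python as follows. This is an illustration, not the code in drive.h: scatter_goals and the sampler callback are hypothetical names, and stopping early when the sampler finds no candidate is an assumption about failure handling that the PR does not specify.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
# Hypothetical sampler signature: returns a drivable point whose distance from
# (ref_x, ref_y) lies in [min_dist, max_dist], or None if no candidate exists.
Sampler = Callable[[float, float, float, float], Optional[Point]]


def scatter_goals(agent_xy: Point, num_target_waypoints: int,
                  min_spacing: float, max_spacing: float,
                  sampler: Sampler) -> List[Point]:
    """Chain goals so each one is sampled relative to the previous anchor."""
    goals: List[Point] = []
    anchor = agent_xy  # goal 0 is anchored at the agent's position
    for _ in range(num_target_waypoints):
        goal = sampler(anchor[0], anchor[1], min_spacing, max_spacing)
        if goal is None:
            break  # assumption: stop early when nothing lies in range
        goals.append(goal)
        anchor = goal  # subsequent goals are anchored at the previous goal
    return goals


# Toy sampler for illustration only: always steps max_dist along +x.
def step_right(x: float, y: float, lo: float, hi: float) -> Optional[Point]:
    return (x + hi, y)


print(scatter_goals((0.0, 0.0), 3, 5.0, 10.0, step_right))
# [(10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
```

Because each goal is anchored at the previous one, the spacing range constrains consecutive gaps rather than distance from the agent; by the triangle inequality, goal k can be up to (k + 1) * max_waypoint_spacing away from the agent.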

Sampler

pick_random_drivable_position(env, ref_x, ref_y, min_dist, max_dist, *out) does a bounded grid-cell scan around the reference position rather than global rejection sampling:

  • Builds a bounding box around the reference position, sized to max_dist + half the cell diagonal.
  • For each cell in the box whose center lies within that radius, iterates over the drivable polyline segments stored in the cell.
  • For each segment, samples a point at a uniform t ∈ [0, 1] along it (segments are stored as (start_vertex, end_vertex), and the grid files each segment under the cell containing its midpoint).
  • Reservoir-samples uniformly across all candidate points whose distance falls in range: work proportional to the cells in the box plus the segments they hold, O(1) extra memory.
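A Python sketch of the bounded scan plus reservoir sampling, to make the selection rule concrete. It is not the drive.h implementation: the dict-of-cells grid keyed by (floor(x / cell_size), floor(y / cell_size)), the build_grid helper, and the argument order are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Grid = Dict[Tuple[int, int], List[Segment]]


def cell_of(x: float, y: float, cell_size: float) -> Tuple[int, int]:
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def build_grid(segments: List[Segment], cell_size: float) -> Grid:
    """Hypothetical layout: file each segment under the cell holding its midpoint."""
    grid: Grid = {}
    for (x0, y0), (x1, y1) in segments:
        key = cell_of((x0 + x1) / 2, (y0 + y1) / 2, cell_size)
        grid.setdefault(key, []).append(((x0, y0), (x1, y1)))
    return grid


def pick_random_drivable_position(grid: Grid, cell_size: float, ref: Point,
                                  min_dist: float, max_dist: float,
                                  rng: random.Random) -> Optional[Point]:
    rx, ry = ref
    reach = max_dist + cell_size * math.sqrt(2) / 2  # max_dist + half-cell-diagonal
    i_lo, j_lo = cell_of(rx - reach, ry - reach, cell_size)
    i_hi, j_hi = cell_of(rx + reach, ry + reach, cell_size)
    chosen: Optional[Point] = None
    seen = 0  # number of in-range candidates encountered so far
    for i in range(i_lo, i_hi + 1):
        for j in range(j_lo, j_hi + 1):
            cx, cy = (i + 0.5) * cell_size, (j + 0.5) * cell_size
            if math.hypot(cx - rx, cy - ry) > reach:
                continue  # no midpoint filed here can be within max_dist of ref
            for (x0, y0), (x1, y1) in grid.get((i, j), []):
                t = rng.random()  # uniform position along the segment
                px, py = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
                if min_dist <= math.hypot(px - rx, py - ry) <= max_dist:
                    seen += 1
                    # Reservoir step: keep the k-th candidate with probability 1/k.
                    if rng.randrange(seen) == 0:
                        chosen = (px, py)
    return chosen
```

Each segment contributes at most one candidate per call, so a short segment and a long one get roughly equal weight; a distribution uniform over drivable arc length would need length-weighted reservoir sampling instead.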

Continuous-along-segment sampling avoids quantizing candidate positions to lane polyline vertices.

Copilot AI review requested due to automatic review settings May 31, 2026 00:11
@eugenevinitsky eugenevinitsky changed the title drive: add goal_mode and goal_on_lane knobs WIP: drive: add goal_mode and goal_on_lane knobs May 31, 2026

Copilot AI left a comment


Pull request overview

Adds two orthogonal Drive env knobs: goal_mode ("continue" by default, or "terminate"; controls whether reaching a goal ends the episode) and goal_on_lane (True by default, or False; controls whether goals are placed along the agent's route or scattered at uniformly random drivable points). Both default to the existing behavior.

Changes:

  • Define GOAL_MODE_CONTINUE/GOAL_MODE_TERMINATE, add fields to the Drive struct, branch compute_goals on goal_on_lane, and end the episode on the first reached goal in terminate mode.
  • Plumb the two new kwargs from Python through binding.c and export the new int constants from env_binding.h.
  • Validate the new string values in Drive.__init__ and add documented defaults to drive.ini.
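A minimal sketch of the string validation and int mapping described above, assuming the constants are 0 and 1 (the PR exports GOAL_MODE_CONTINUE/GOAL_MODE_TERMINATE from env_binding.h but does not state their values) and that invalid values raise ValueError. validate_goal_kwargs is a hypothetical name, not the Drive.__init__ code.

```python
from typing import Dict

# Assumed values; the real ones come from env_binding.h.
GOAL_MODE_CONTINUE = 0
GOAL_MODE_TERMINATE = 1
_GOAL_MODES = {"continue": GOAL_MODE_CONTINUE, "terminate": GOAL_MODE_TERMINATE}


def validate_goal_kwargs(goal_mode: str = "continue",
                         goal_on_lane: bool = True) -> Dict[str, int]:
    """Check the new knobs and convert them to ints for the C binding."""
    if goal_mode not in _GOAL_MODES:
        raise ValueError(
            f"goal_mode must be one of {sorted(_GOAL_MODES)}, got {goal_mode!r}")
    if not isinstance(goal_on_lane, bool):
        raise ValueError(f"goal_on_lane must be a bool, got {goal_on_lane!r}")
    return {"goal_mode": _GOAL_MODES[goal_mode], "goal_on_lane": int(goal_on_lane)}
```

If the INI loader hands over strings rather than Python bools, the isinstance check would need to parse them first.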

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • pufferlib/ocean/drive/drive.h: New mode constants, struct fields, pick_random_drivable_position, scattered branch in compute_goals, and terminate-on-reach block in c_step.
  • pufferlib/ocean/drive/binding.c: Unpacks goal_mode and goal_on_lane kwargs into the env.
  • pufferlib/ocean/env_binding.h: Exports GOAL_MODE_CONTINUE/GOAL_MODE_TERMINATE to Python.
  • pufferlib/ocean/drive/drive.py: New constructor args with validation; passed through _env_init_kwargs.
  • pufferlib/config/ocean/drive.ini: Adds goal_mode and goal_on_lane defaults under [env].


Comment on pufferlib/config/ocean/drive.ini, lines +79 to +84 (outdated):
; Episode end on goal reach - options: "continue" (default), "terminate"
goal_mode = "continue"
; True: place goals along the agent's route (existing behavior, on-lane and
; in front of the agent). False: scatter each goal at a uniformly random
; drivable point anywhere on the map.
goal_on_lane = True
@eugenevinitsky eugenevinitsky force-pushed the ev/goal-terminate-mode branch from 60bd5ab to 93cd69b Compare May 31, 2026 21:38
@eugenevinitsky eugenevinitsky changed the title WIP: drive: add goal_mode and goal_on_lane knobs WIP: drive: terminate-on-goal-reach + scattered-goal placement May 31, 2026
@eugenevinitsky eugenevinitsky force-pushed the ev/goal-terminate-mode branch 3 times, most recently from 40ecbcc to f5e3f27 Compare May 31, 2026 21:46
@eugenevinitsky eugenevinitsky changed the title WIP: drive: terminate-on-goal-reach + scattered-goal placement WIP: drive: goal_on_lane knob — scatter goals within waypoint-spacing range May 31, 2026
@eugenevinitsky eugenevinitsky force-pushed the ev/goal-terminate-mode branch 3 times, most recently from e6249d2 to 38e5cc4 Compare May 31, 2026 22:13
@eugenevinitsky eugenevinitsky changed the title WIP: drive: goal_on_lane knob — scatter goals within waypoint-spacing range goal_on_lane knob — scatter goals within waypoint-spacing range May 31, 2026
Eugene Vinitsky and others added 3 commits May 31, 2026 18:18
…e_agent yaml

wandb.init() previously did not set name=, so wandb assigned random names
like "flowing-wind-34". Adds a --run-name CLI flag (top-level config key)
that pufferl's WandbLogger renders against {date} (launch-time YYYY-MM-DD)
and {seed} (args.train.seed) placeholders before passing to wandb.init.

Default is None — wandb auto-name preserved. The single-agent speed-run
yaml opts in by setting run_name: "{date}_seed{seed}", giving identifiable
runs like 2026-05-31_seed0 without per-seed launcher logic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
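The {date}/{seed} substitution can be sketched with str.format. render_run_name is a hypothetical helper, and whether WandbLogger uses str.format or plain string replacement is an assumption; the result would then be passed as wandb.init(name=...), where name=None keeps wandb's auto-generated name.

```python
import datetime
from typing import Optional


def render_run_name(template: Optional[str], seed: int,
                    launch_date: datetime.date) -> Optional[str]:
    """Fill {date} (YYYY-MM-DD) and {seed}; None means 'let wandb choose'."""
    if template is None:
        return None
    return template.format(date=launch_date.strftime("%Y-%m-%d"), seed=seed)


print(render_run_name("{date}_seed{seed}", 0, datetime.date(2026, 5, 31)))
# 2026-05-31_seed0
```

With str.format, any placeholder other than {date} or {seed} raises KeyError, and literal braces in a run name would have to be doubled.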

Two orthogonal knobs covering goal placement and episode-end semantics on
goal reach, both defaulting to current behavior:

  goal_on_lane=True (default) / False
    True  -> existing route-based placement (on lane, in front of agent).
    False -> each goal at a uniformly random drivable point anywhere on the
             map, via the new pick_random_drivable_position helper (mirrors
             spawn_agent's lane+geometry pick, sans collision check).

  goal_mode="continue" (default) / "terminate"
    continue  -> existing behavior: reaching a goal advances current_goal_idx;
                 episode keeps running until scenario_length or the inactive
                 threshold trips.
    terminate -> reaching the goal sets terminals[i]=1 for that agent (no
                 truncation flag, so PPO does not bootstrap V); the env then
                 calls add_log and c_reset to advance to the next scenario.

target_type is unchanged -- it still controls obs format (static/dynamic) and
is orthogonal to both new knobs. compute_goals's existing route path is
untouched when goal_on_lane=True.

Files: drive.h struct + defines + compute_goals branch + c_step terminate
hook, env_binding.h exposes GOAL_MODE_* constants, binding.c unpacks both
new kwargs, drive.py validates strings + plumbs through _env_init_kwargs,
drive.ini gives the defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
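Why setting terminals[i]=1 without a truncation flag matters: a return computed backward through a rollout stops bootstrapping from the value estimate at a terminal step but keeps bootstrapping across a non-terminal cutoff. The sketch below is a generic discounted-return illustration with hypothetical names, not PufferLib's advantage code.

```python
from typing import List, Sequence


def bootstrapped_returns(rewards: Sequence[float], terminals: Sequence[int],
                         last_value: float, gamma: float) -> List[float]:
    """Discounted returns, bootstrapping from last_value unless a terminal intervenes."""
    out = [0.0] * len(rewards)
    ret = last_value  # V estimate for the state after the last rollout step
    for t in reversed(range(len(rewards))):
        # terminals[t] == 1 drops the return from step t + 1 onward, including V.
        ret = rewards[t] + gamma * ret * (1.0 - terminals[t])
        out[t] = ret
    return out


# Goal reached on the last step: terminal, so last_value is ignored.
print(bootstrapped_returns([0.0, 0.0, 1.0], [0, 0, 1], last_value=5.0, gamma=0.5))
# [0.25, 0.5, 1.0]
# Same rewards, but the rollout was merely cut off: V is bootstrapped.
print(bootstrapped_returns([0.0, 0.0, 1.0], [0, 0, 0], last_value=5.0, gamma=0.5))
# [0.875, 1.75, 3.5]
```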
@eugenevinitsky eugenevinitsky force-pushed the ev/goal-terminate-mode branch from 38e5cc4 to 0f5f922 Compare May 31, 2026 22:32
@eugenevinitsky eugenevinitsky force-pushed the ev/goal-terminate-mode branch from ce47759 to a7cec72 Compare June 1, 2026 00:17