Add NHL94 self-play: asymmetric drill modes, opponent snapshot pool, curriculum integration#11

Draft
Copilot wants to merge 2 commits into main from copilot/add-nhl94-self-play-implementation

Copilot AI commented Apr 1, 2026

Implements Milestones 1–4 of the NHL94 self-play plan: side-aware action processing, frozen opponent execution inside the wrapper, terminal zero-sum drill rewards, and checkpoint-pool rotation — all without touching PPO's single-agent interface.

nhl94_rf.py — Dedicated finetune reward modes

  • SelfPlayOffenseFinetune: attack-zone init; terminal +1 on team-1 score, -1 if team-2 secures ≥60 consecutive frames of possession or clears the puck, 0 on timeout
  • SelfPlayDefenseFinetune: defense-zone init; terminal +1 if team-1 clears past centre ice or holds possession ≥60 frames, -1 on team-2 goal, 0 on timeout
  • Per-episode frame-counting state (_selfplay_offense_ctrl / _selfplay_defense_ctrl) follows the existing _defensezone_carry pattern; both modes registered in _reward_function_map
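
The offense drill's terminal rule can be sketched as follows. This is a minimal illustration, not the code in nhl94_rf.py: the `OffenseDrillState` class, the function name, and the boolean inputs are hypothetical stand-ins for whatever the game state exposes; only the +1 / -1 / 0 outcomes and the 60-frame threshold come from the description above.

```python
from dataclasses import dataclass

POSSESSION_FRAMES = 60  # threshold stated in the PR description


@dataclass
class OffenseDrillState:
    """Hypothetical per-episode counter (the PR keeps its own state in _selfplay_offense_ctrl)."""
    opp_possession_frames: int = 0


def offense_terminal_reward(state, learner_scored, opponent_has_puck,
                            puck_cleared, timed_out):
    """Return (reward, done) for one frame of the offense drill."""
    if learner_scored:
        return 1.0, True
    # Count only consecutive frames: any frame without opponent possession resets the run.
    if opponent_has_puck:
        state.opp_possession_frames += 1
    else:
        state.opp_possession_frames = 0
    if puck_cleared or state.opp_possession_frames >= POSSESSION_FRAMES:
        return -1.0, True
    if timed_out:
        return 0.0, True
    # Non-terminal frames carry no reward, so the only signal is the terminal outcome.
    return 0.0, False
```

The defense drill would mirror this with the roles swapped: the counter tracks team-1 possession and the goal check watches team 2.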

nhl94_obs.py — Side-aware action state + self-play interface

Replaces the three shared button-debounce fields with a per-side dict so learner and opponent never corrupt each other's state:

self.action_state = {
    "learner": {"b_pressed": False, "c_pressed": False, "slapshot_frames": 0},
    "opponent": {"b_pressed": False, "c_pressed": False, "slapshot_frames": 0},
}
  • _process_filtered_action(ac, side) / _process_multidiscrete_action(ac, side) extracted as helpers
  • Removes the dead prev_state.Flip() call (it was unreachable in the reward/obs path)
  • New public methods: set_opponent_model(path), set_selfplay_role(role), compute_opponent_action(obs), combine_selfplay_actions(learner, opponent)
  • step() injects the frozen opponent's processed actions when selfplay_enabled=True; team 1 is fixed as the learner throughout Milestone 1
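
To illustrate why the per-side dict matters, here is a small sketch of an edge-triggered button debounce keyed by side. The dict layout matches the one shown above; `make_action_state` and `debounce_b` are hypothetical helpers, and the rising-edge rule is an assumption about what the debounce does rather than a copy of the logic in nhl94_obs.py.

```python
SIDES = ("learner", "opponent")


def make_action_state():
    """Build one independent debounce record per side, as in the PR's action_state."""
    return {
        side: {"b_pressed": False, "c_pressed": False, "slapshot_frames": 0}
        for side in SIDES
    }


def debounce_b(action_state, side, wants_b):
    """Emit a B press only on the rising edge for the given side (assumed rule)."""
    st = action_state[side]
    fire = wants_b and not st["b_pressed"]
    st["b_pressed"] = wants_b  # only this side's record is touched
    return fire


state = make_action_state()
debounce_b(state, "learner", True)   # learner presses B on this frame
debounce_b(state, "opponent", True)  # opponent's press is judged against its own history
```

With a single shared `b_pressed` flag, the opponent's press in the last line would have been swallowed because the learner had just set the flag.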

train_live.py — Opponent snapshot rotation

  • New args: --load_opponent_model, --selfplay_snapshot_freq, --selfplay_pool_size
  • OpponentSnapshotCallback saves a snapshot every N steps, maintains a bounded pool, and samples the next opponent with a 40% latest / 40% random historical / 20% oldest distribution (degrades gracefully for small pool sizes)
  • LiveTrainer seeds the initial opponent at startup via env_method("set_opponent_model", ...) and attaches the callback when --selfplay is active
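
The sampling rule can be sketched like this. It is an illustration under assumptions, not the callback's actual code: `sample_opponent` is a hypothetical name, the pool is assumed to be ordered oldest to latest, "random historical" is taken to mean any snapshot except the latest, and the single-snapshot fallback is one possible reading of "degrades gracefully".

```python
import random
from collections import deque


def sample_opponent(pool, rng=random):
    """Pick the next opponent snapshot from a pool ordered oldest -> latest."""
    snapshots = list(pool)
    if not snapshots:
        raise ValueError("opponent pool is empty")
    if len(snapshots) == 1:
        return snapshots[0]  # assumed fallback: nothing else to choose from
    r = rng.random()
    if r < 0.4:
        return snapshots[-1]               # 40%: latest snapshot
    if r < 0.8:
        return rng.choice(snapshots[:-1])  # 40%: random historical snapshot
    return snapshots[0]                    # 20%: oldest snapshot


# A deque with maxlen gives the bounded pool: appending past the limit drops the oldest entry.
pool = deque(maxlen=3)
for path in ["snap_1.zip", "snap_2.zip", "snap_3.zip", "snap_4.zip"]:
    pool.append(path)
next_opponent = sample_opponent(pool)
```

Keeping some probability mass on older snapshots is a common way to stop the learner from overfitting to its most recent self.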

train_curriculum.py

  • Added load_opponent_model to PATH_KEYS so relative paths in the curriculum JSON are resolved correctly
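
A rough sketch of what resolving a path key might look like. The `resolve_paths` helper and the choice of base directory are assumptions for illustration; the PR only states that load_opponent_model was added to PATH_KEYS, and the real list presumably contains other keys not shown here.

```python
import os

# Only this key is named in the PR; the real PATH_KEYS holds more entries.
PATH_KEYS = ("load_opponent_model",)


def resolve_paths(phase, base_dir):
    """Return a copy of a curriculum phase with relative path values made absolute.

    base_dir is an assumption: whichever directory the script treats as the root
    for relative paths in the curriculum JSON.
    """
    resolved = dict(phase)
    for key in PATH_KEYS:
        value = resolved.get(key)
        if value and not os.path.isabs(value):
            resolved[key] = os.path.normpath(os.path.join(base_dir, value))
    return resolved
```

Keys outside PATH_KEYS, and paths that are already absolute, pass through unchanged.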

curriculum/nhl94.json

  • Appended four alternating self-play phases after the existing subskill curriculum: Offense Finetune 1 → Defense Finetune 1 → Offense Finetune 2 → Defense Finetune 2, each with selfplay: true and snapshot/pool config

… pool rotation, curriculum integration)

Agent-Logs-Url: https://github.com/MatPoliquin/stable-retro-scripts/sessions/81585af8-7e24-4899-9d92-ce66beb56d32

Co-authored-by: MatPoliquin <7024551+MatPoliquin@users.noreply.github.com>
Copilot AI changed the title [WIP] Add self-play implementation for NHL94 training Add NHL94 self-play: asymmetric drill modes, opponent snapshot pool, curriculum integration Apr 1, 2026
Copilot AI requested a review from MatPoliquin April 1, 2026 13:17