Add NHL94 self-play: asymmetric drill modes, opponent snapshot pool, curriculum integration by Copilot · Pull Request #11 · MatPoliquin/stable-retro-scripts

Copilot · 2026-04-01T12:57:28Z

Implements Milestones 1–4 of the NHL94 self-play plan: side-aware action processing, frozen opponent execution inside the wrapper, terminal zero-sum drill rewards, and checkpoint-pool rotation — all without touching PPO's single-agent interface.

`nhl94_rf.py` — Dedicated finetune reward modes

- `SelfPlayOffenseFinetune`: attack-zone init; terminal +1 when team 1 scores, -1 if team 2 holds possession for ≥60 consecutive frames or clears the puck, 0 on timeout
- `SelfPlayDefenseFinetune`: defense-zone init; terminal +1 if team 1 clears past centre ice or holds possession for ≥60 frames, -1 when team 2 scores, 0 on timeout
- Per-episode frame-counting state (`_selfplay_offense_ctrl` / `_selfplay_defense_ctrl`) follows the existing `_defensezone_carry` pattern; both modes are registered in `_reward_function_map`
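The terminal logic of the offense drill can be sketched as a small per-episode state machine. This is a minimal illustration, not the code in `nhl94_rf.py`: the `FrameInfo` fields, the `OffenseFinetuneReward` class and its `step()` signature are hypothetical, and the counter simply plays the role that `_selfplay_offense_ctrl` plays in the PR.

```python
from dataclasses import dataclass


@dataclass
class FrameInfo:
    """Hypothetical per-frame summary; real field names in nhl94_rf.py may differ."""
    team1_scored: bool
    team2_has_puck: bool
    team2_cleared: bool
    timed_out: bool


POSSESSION_LIMIT = 60  # consecutive frames, as stated in the PR description


class OffenseFinetuneReward:
    """Sketch of a terminal, zero-sum offense-drill reward."""

    def __init__(self):
        # Per-episode counter, analogous in spirit to _selfplay_offense_ctrl.
        self.team2_possession_frames = 0

    def reset(self):
        self.team2_possession_frames = 0

    def step(self, info: FrameInfo):
        """Return (reward, done) for one frame."""
        if info.team1_scored:
            return 1.0, True
        # Count consecutive team-2 possession frames; any break resets the run.
        if info.team2_has_puck:
            self.team2_possession_frames += 1
        else:
            self.team2_possession_frames = 0
        if info.team2_cleared or self.team2_possession_frames >= POSSESSION_LIMIT:
            return -1.0, True
        if info.timed_out:
            return 0.0, True
        return 0.0, False
```

The defense drill would mirror this with the outcomes swapped (+1 on a team-1 clear or possession hold, -1 on a team-2 goal). Resetting the counter whenever possession breaks is what makes the offense threshold count *consecutive* frames.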

`nhl94_obs.py` — Side-aware action state + self-play interface

Replaces the three shared button-debounce fields with a per-side dict so learner and opponent never corrupt each other's state:

```python
self.action_state = {
    "learner": {"b_pressed": False, "c_pressed": False, "slapshot_frames": 0},
    "opponent": {"b_pressed": False, "c_pressed": False, "slapshot_frames": 0},
}
```
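The point of splitting `action_state` by side is that each debounce decision reads and writes only its own side's flags. A minimal sketch, assuming a rising-edge rule (a button fires only on the frame it goes from released to held); the actual debounce rule in `nhl94_obs.py` and the `SideAwareDebouncer` class are hypothetical, and `slapshot_frames` is carried in the state but not used here.

```python
# Per-side defaults, copied from the action_state layout in the PR.
_DEFAULT_SIDE_STATE = {"b_pressed": False, "c_pressed": False, "slapshot_frames": 0}


class SideAwareDebouncer:
    """Hypothetical holder for the per-side debounce state."""

    def __init__(self):
        # A separate dict per side, so one side's presses never leak into the other's.
        self.action_state = {
            "learner": dict(_DEFAULT_SIDE_STATE),
            "opponent": dict(_DEFAULT_SIDE_STATE),
        }

    def debounce(self, side: str, button: str, held: bool) -> bool:
        """Return True only on the frame `button` goes from released to held
        (assumed rising-edge rule; the PR's helpers may differ)."""
        key = f"{button}_pressed"
        state = self.action_state[side]
        fire = held and not state[key]
        state[key] = held
        return fire
```

With a single shared set of flags, the learner holding B would suppress the opponent's first B press in the same frame; keying the state by side removes that coupling.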

- `_process_filtered_action(ac, side)` and `_process_multidiscrete_action(ac, side)` are extracted as helpers
- Removes the dead `prev_state.Flip()` call, which was unreachable in the reward/obs path
- New public methods: `set_opponent_model(path)`, `set_selfplay_role(role)`, `compute_opponent_action(obs)`, `combine_selfplay_actions(learner, opponent)`
- `step()` injects the frozen opponent's processed actions when `selfplay_enabled=True`; team 1 is fixed as the learner throughout Milestone 1
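Combining the two sides' actions can be sketched as follows, assuming stable-retro's two-player convention of concatenating per-player button arrays with player 1 first, and a 12-button Genesis pad. `selfplay_step`, `BUTTONS_PER_PLAYER`, and the signatures shown are hypothetical; in the PR, the opponent's raw output would first pass through `_process_filtered_action(ac, "opponent")` or `_process_multidiscrete_action(ac, "opponent")`.

```python
from typing import Any, Callable, List, Sequence

BUTTONS_PER_PLAYER = 12  # Genesis pad size in stable-retro (assumed for this sketch)


def combine_selfplay_actions(learner: Sequence[int], opponent: Sequence[int]) -> List[int]:
    """Build the two-player action: learner as player 1, opponent as player 2."""
    if len(learner) != BUTTONS_PER_PLAYER or len(opponent) != BUTTONS_PER_PLAYER:
        raise ValueError("each side must supply one full button array")
    return list(learner) + list(opponent)


def selfplay_step(env_step: Callable[[List[int]], Any],
                  opponent_policy: Callable[[Any], Sequence[int]],
                  last_obs: Any,
                  learner_buttons: Sequence[int]) -> Any:
    """Hypothetical step: PPO supplies only learner_buttons; the frozen
    opponent's buttons are computed here and never exposed to the learner."""
    opponent_buttons = opponent_policy(last_obs)
    return env_step(combine_selfplay_actions(learner_buttons, opponent_buttons))
```

Because the opponent's action is produced inside the wrapper, PPO still sees a single-agent action space of one pad's worth of buttons.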

`train_live.py` — Opponent snapshot rotation

- New args: `--load_opponent_model`, `--selfplay_snapshot_freq`, `--selfplay_pool_size`
- `OpponentSnapshotCallback` saves a snapshot every N steps, maintains a bounded pool, and samples the next opponent with a 40% latest / 40% random historical / 20% oldest distribution (degrading gracefully for small pool sizes)
- `LiveTrainer` seeds the initial opponent at startup via `env_method("set_opponent_model", ...)` and attaches the callback when `--selfplay` is active
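The 40/40/20 sampling rule and its fallback for small pools can be sketched with a bounded `deque`. Assumptions: `OpponentPool` is a hypothetical name; "historical" is taken to mean any snapshot other than the latest (so the oldest can also be drawn through that branch); and "oldest" means the oldest snapshot still in the bounded pool, not the first one ever saved.

```python
import random
from collections import deque


class OpponentPool:
    """Bounded snapshot pool with 40% latest / 40% historical / 20% oldest sampling (sketch)."""

    def __init__(self, max_size, rng=None):
        self.snapshots = deque(maxlen=max_size)  # oldest entries drop off the left
        self.rng = rng if rng is not None else random.Random()

    def add(self, path):
        self.snapshots.append(path)

    def sample(self):
        if not self.snapshots:
            raise ValueError("pool is empty")
        latest, oldest = self.snapshots[-1], self.snapshots[0]
        historical = list(self.snapshots)[:-1]  # everything except the latest
        u = self.rng.random()
        if u < 0.4 or not historical:  # a one-snapshot pool always returns it
            return latest
        if u < 0.8:
            return self.rng.choice(historical)
        return oldest
```

Each sampled path would then be pushed to the environments with `env_method("set_opponent_model", path)`, as `LiveTrainer` does for the initial opponent. With one snapshot every branch returns it, and with two the historical branch can only pick the older one, which is one way to read "degrades gracefully".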

`train_curriculum.py`

- Adds `load_opponent_model` to `PATH_KEYS` so that relative paths in the curriculum JSON are resolved correctly
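Resolving `PATH_KEYS` entries can be sketched as below. The base directory is an assumption (the real `train_curriculum.py` may resolve against the JSON file's directory, the working directory, or something else), `resolve_phase_paths` is a hypothetical helper, and only `load_opponent_model` is known from the PR to be in `PATH_KEYS`.

```python
import os

# Assumed contents: the PR adds "load_opponent_model"; other real entries are omitted.
PATH_KEYS = {"load_opponent_model"}


def resolve_phase_paths(phase: dict, base_dir: str) -> dict:
    """Return a copy of `phase` with relative PATH_KEYS values made absolute.

    Assumes relative paths are anchored at `base_dir`; absolute paths and
    missing keys are left untouched.
    """
    resolved = dict(phase)
    for key in PATH_KEYS:
        value = resolved.get(key)
        if value and not os.path.isabs(value):
            resolved[key] = os.path.normpath(os.path.join(base_dir, value))
    return resolved
```

Without the key in `PATH_KEYS`, a relative opponent path would be interpreted relative to wherever the trainer happens to be launched.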

`curriculum/nhl94.json`

- Appends four alternating self-play phases after the existing subskill curriculum: Offense Finetune 1 → Defense Finetune 1 → Offense Finetune 2 → Defense Finetune 2, each with `"selfplay": true` and snapshot/pool config
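The alternating phase order can be illustrated by generating the four entries programmatically. Only `selfplay: true` is stated in the PR; the `name`, `rf`, `selfplay_snapshot_freq` and `selfplay_pool_size` keys are hypothetical (the last two mirror the CLI flags), and the numeric values in the usage line are placeholders, not the PR's settings.

```python
import json


def build_selfplay_phases(rounds: int, snapshot_freq: int, pool_size: int) -> list:
    """Alternate offense/defense self-play phases (sketch; key names hypothetical)."""
    phases = []
    for i in range(1, rounds + 1):
        for drill in ("Offense", "Defense"):
            phases.append({
                "name": f"{drill} Finetune {i}",
                "rf": f"SelfPlay{drill}Finetune",        # hypothetical key name
                "selfplay": True,
                "selfplay_snapshot_freq": snapshot_freq,  # mirrors the CLI flag
                "selfplay_pool_size": pool_size,          # mirrors the CLI flag
            })
    return phases


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(json.dumps(build_selfplay_phases(2, 50_000, 8), indent=2))
```

Alternating drills means each side of the rink is revisited against a pool that has grown since the previous phase of the same type.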

… pool rotation, curriculum integration)

Agent-Logs-Url: https://github.com/MatPoliquin/stable-retro-scripts/sessions/81585af8-7e24-4899-9d92-ce66beb56d32
Co-authored-by: MatPoliquin <7024551+MatPoliquin@users.noreply.github.com>

Initial plan

128b466

Copilot AI assigned Copilot and MatPoliquin Apr 1, 2026

Copilot started work on behalf of MatPoliquin April 1, 2026 12:57

Copilot AI changed the title ~~[WIP] Add self-play implementation for NHL94 training~~ Add NHL94 self-play: asymmetric drill modes, opponent snapshot pool, curriculum integration Apr 1, 2026

Copilot AI requested a review from MatPoliquin April 1, 2026 13:17

Copilot finished work on behalf of MatPoliquin April 1, 2026 13:17

Copilot wants to merge 2 commits into `main` from `copilot/add-nhl94-self-play-implementation`
