Skip to content

Fix InPlaceRestartMember: add run() method and remove EnsembleManager from restart ensemble#3

Draft
Copilot wants to merge 2 commits into
copilot/replace-restart-member-to-inplacefrom
copilot/fix-inplace-restart-member
Draft

Fix InPlaceRestartMember: add run() method and remove EnsembleManager from restart ensemble#3
Copilot wants to merge 2 commits into
copilot/replace-restart-member-to-inplacefrom
copilot/fix-inplace-restart-member

Conversation

Copy link
Copy Markdown

Copilot AI commented Mar 27, 2026

InPlaceRestartMember had no run() method, so compass run on the restart ensemble called the inherited no-op Step.run() — MALI never started. Additionally, steps_to_run = ['ensemble_manager'] meant only EnsembleManager executed, bypassing the restart member steps entirely.

restart_member.py

  • Add from compass.model import run_model import
  • Call self.add_model_as_input() in setup() so compass registers the MALI executable
  • Add run() method that calls run_model(self) to execute MALI in the original run directory

test_case.py

  • Remove EnsembleManager import and self.add_step(EnsembleManager(...)) from __init__
  • Remove self.steps_to_run = ['ensemble_manager'] from configure() so all registered InPlaceRestartMember steps execute directly via compass run

Checklist

  • User's Guide has been updated
  • Developer's Guide has been updated
  • API documentation in the Developer's Guide (api.rst) has any new or modified class, method and/or functions listed
  • Documentation has been built locally and changes look as expected
  • The E3SM-Project submodule has been updated with relevant E3SM changes
  • The MALI-Dev submodule has been updated with relevant MALI changes
  • Document (in a comment titled Testing in this PR) any testing that was used to verify the changes
  • New tests have been added to a test suite
Original prompt

Problem

InPlaceRestartMember in compass/landice/tests/ensemble_generator/sgh_restart_ensemble/restart_member.py has no run() method. When compass run executes the restart ensemble, it calls the inherited no-op Step.run(), which does nothing — MALI never starts.

Additionally, test_case.py sets self.steps_to_run = ['ensemble_manager'] in configure(), which means compass only executes EnsembleManager (which tries to submit sbatch jobs), and never directly executes the InPlaceRestartMember steps. The desired behaviour is for compass run to directly run each InPlaceRestartMember step, which sets config_do_restart = .true. in namelist.landice and then executes MALI.


Changes required

1. compass/landice/tests/ensemble_generator/sgh_restart_ensemble/restart_member.py

Two additions:

In setup(), call self.add_model_as_input() after _set_restart_in_namelist(namelist_path) so that compass registers the MALI executable as an input for this step:

    def setup(self):
        run_dir = os.path.join(self.spinup_work_dir, f'run{self.run_num:03}')

        if not os.path.exists(run_dir):
            raise RuntimeError(
                f"Original run directory not found: {run_dir}")

        namelist_path = os.path.join(run_dir, 'namelist.landice')

        if not os.path.exists(namelist_path):
            raise RuntimeError(
                f"namelist.landice not found in {run_dir}")

        print(f'Setting config_do_restart = .true. in {namelist_path}')
        _set_restart_in_namelist(namelist_path)

        # Register MALI executable so compass knows this step needs the model
        self.add_model_as_input()

Add a run() method that calls run_model(self):

    def run(self):
        """
        Run MALI in the original run directory to continue the simulation
        from its last checkpoint.
        """
        from compass.model import run_model
        run_model(self)

The import of run_model should be added at the top of the file alongside the existing imports:

from compass.model import run_model

2. compass/landice/tests/ensemble_generator/sgh_restart_ensemble/test_case.py

Remove or replace the line:

        # Only run ensemble_manager; it submits individual restart jobs
        self.steps_to_run = ['ensemble_manager']

Instead, steps_to_run should include all restart member steps (plus optionally ensemble_manager if you still want sbatch submission as well, but the primary path is direct execution). The simplest fix is to remove the steps_to_run assignment entirely so that compass runs all registered steps in order (first ensemble_manager setup via __init__, then each InPlaceRestartMember).

However, EnsembleManager is added in __init__ and is not useful for the direct-execution path. The cleanest solution is:

  • Remove self.add_step(EnsembleManager(test_case=self)) from __init__
  • Remove self.steps_to_run = ['ensemble_manager'] from configure()
  • Remove the EnsembleManager import since it's no longer needed

This leaves steps_to_run unset (i.e., all registered steps run), which will be only the InPlaceRestartMember steps added in configure().

The resulting __init__ should be:

    def __init__(self, test_group):
        name = 'sgh_restart_ensemble'
        super().__init__(test_group=test_group, name=name)

And the end of configure() should be (remove the steps_to_run line):

        self.restart_run_numbers = restart_runs
        # steps_to_run is not set; compass will run all InPlaceRestartMember steps

Also remove the now-unused import:

from compass.landice.tests.ensemble_generator.ensemble_manager import (
    EnsembleManager,
)

File locations (on branch copilot/replace-restart-member-to-inplace)

  • compass/landice/tests/ensemble_generator/sgh_restart_ensemble/restart_member.py — BlobSha: e459d206918746d1f1891beb2099a3bb43677788
  • compass/landice/tests/ensemble_generator/sgh_restart_ensemble/test_case.py — BlobSha: 16388c1dd0557165679ffb8e573a5c6f49678677

Current content of restart_member.py:

"""
Step for restarting a single incomplete ensemble member in-place.
"""

import os

from compass.step import Step


def _set_restart_in_namelist(namelist_path):
    """Set config_do_restart = .true. in-place in namelist.landice."""
    with open(namelist_path, 'r') as f:
        lines = f.readlines()

    updated = False
    new_lines = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('config_do_restart'):
            new_lines.append(
                line[:line.index('config_do_restart')]
                + 'config_do_restart = .true.\n'
            )
            updated = True
        else:
            new_lines.append(line)

    if not updated:
        new_lines.append('\nconfig_do_restart = .true.\n')

    with open(namelist_pa...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*
>

<!-- START COPILOT CODING AGENT TIPS -->
---

🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. [Learn more about Advanced Security.](https://gh.io/cca-advanced-security)

…rom restart ensemble

Agent-Logs-Url: https://github.com/alexolinhager/compass/sessions/4376fadc-c216-46c7-a876-0734d8f3df11

Co-authored-by: alexolinhager <131483939+alexolinhager@users.noreply.github.com>
Copilot AI changed the title [WIP] Add run method to InPlaceRestartMember to enable MALI execution Fix InPlaceRestartMember: add run() method and remove EnsembleManager from restart ensemble Mar 27, 2026
Copilot AI requested a review from alexolinhager March 27, 2026 20:53
Copilot AI added a commit that referenced this pull request Apr 1, 2026
Bug #1: config.get('restart_ensemble', {}) crashes because MpasConfigParser.get()
expects (section, option) positional args, not a dict fallback.
Fixed: config['restart_ensemble'] returns a SectionProxy with proper
.get()/.getint()/.getfloat()/.getboolean() methods.

Bug #2: _should_restart_run() looked for per-run analysis_results.json files
that are never written.  AnalysisStep writes analysis_summary.json to its
own work dir containing an individual_results dict for all runs.
Fixed: add analysis_summary_file config option; configure() loads the file
and passes per-run dicts to _should_restart_run() via a new run_results param.
RestartScheduler.create_config_file() now includes analysis_summary_file in
generated configs.

Bug #3: restart_attempt_N/ tracking dirs were never created by
InPlaceRestartMember.setup(), so max_consecutive_restarts was effectively
disabled and all attempt counters read 0.
Fixed: setup() now creates restart_attempt_N/ dirs using a single os.listdir()
call to find the highest existing attempt number.

Bug #5: restart_scheduler.py docstring Examples section referenced a
non-existent module path. Fixed to the correct path.

Agent-Logs-Url: https://github.com/alexolinhager/compass/sessions/ca2d29bf-1246-415c-bf2c-9de7521fa55f

Co-authored-by: alexolinhager <131483939+alexolinhager@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants