Fix InPlaceRestartMember: add run() method and remove EnsembleManager from restart ensemble #3
Draft
Copilot wants to merge 2 commits into
…rom restart ensemble

Agent-Logs-Url: https://github.com/alexolinhager/compass/sessions/4376fadc-c216-46c7-a876-0734d8f3df11
Co-authored-by: alexolinhager <131483939+alexolinhager@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add run method to InPlaceRestartMember to enable MALI execution" to "Fix InPlaceRestartMember: add run() method and remove EnsembleManager from restart ensemble" on Mar 27, 2026
Copilot AI added a commit that referenced this pull request on Apr 1, 2026
Bug #1: config.get('restart_ensemble', {}) crashes because MpasConfigParser.get() expects (section, option) positional args, not a dict fallback. Fixed: config['restart_ensemble'] returns a SectionProxy with proper .get()/.getint()/.getfloat()/.getboolean() methods.

Bug #2: _should_restart_run() looked for per-run analysis_results.json files that are never written. AnalysisStep writes analysis_summary.json to its own work dir containing an individual_results dict for all runs. Fixed: add analysis_summary_file config option; configure() loads the file and passes per-run dicts to _should_restart_run() via a new run_results param. RestartScheduler.create_config_file() now includes analysis_summary_file in generated configs.

Bug #3: restart_attempt_N/ tracking dirs were never created by InPlaceRestartMember.setup(), so max_consecutive_restarts was effectively disabled and all attempt counters read 0. Fixed: setup() now creates restart_attempt_N/ dirs using a single os.listdir() call to find the highest existing attempt number.

Bug #5: restart_scheduler.py docstring Examples section referenced a non-existent module path. Fixed to the correct path.

Agent-Logs-Url: https://github.com/alexolinhager/compass/sessions/ca2d29bf-1246-415c-bf2c-9de7521fa55f
Co-authored-by: alexolinhager <131483939+alexolinhager@users.noreply.github.com>
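The fixes for Bug #1 and Bug #3 in the commit message above can be illustrated with a minimal, stand-alone sketch. It is not the code from the commit: the standard-library `configparser` stands in for `MpasConfigParser`, the helper names `read_restart_options` and `next_restart_attempt_dir` are hypothetical, and the fallback values and the choice to start numbering at 1 are assumptions.

```python
import configparser
import os
import re


def read_restart_options(config):
    """Read options through a section proxy instead of config.get(section, {})."""
    # Indexing by section name yields a SectionProxy, which offers typed
    # getters with an explicit fallback keyword.
    section = config['restart_ensemble']
    return {
        # Fallback values here are assumptions made for this sketch.
        'max_consecutive_restarts': section.getint(
            'max_consecutive_restarts', fallback=3),
        'analysis_summary_file': section.get(
            'analysis_summary_file', fallback=''),
    }


def next_restart_attempt_dir(run_dir):
    """Create restart_attempt_<N+1>/ where N is the highest existing attempt."""
    pattern = re.compile(r'^restart_attempt_(\d+)$')
    highest = 0
    # A single os.listdir() call finds the highest existing attempt number.
    for entry in os.listdir(run_dir):
        match = pattern.match(entry)
        if match:
            highest = max(highest, int(match.group(1)))
    new_dir = os.path.join(run_dir, f'restart_attempt_{highest + 1}')
    os.makedirs(new_dir)
    return new_dir
```

Matching entry names with an anchored regular expression keeps unrelated files in the run directory from being counted as attempts.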
5 tasks
`InPlaceRestartMember` had no `run()` method, so `compass run` on the restart ensemble called the inherited no-op `Step.run()` and MALI never started. Additionally, `steps_to_run = ['ensemble_manager']` meant only `EnsembleManager` executed, bypassing the restart member steps entirely.

`restart_member.py`
- Add the `from compass.model import run_model` import
- Call `self.add_model_as_input()` in `setup()` so compass registers the MALI executable
- Add a `run()` method that calls `run_model(self)` to execute MALI in the original run directory

`test_case.py`
- Remove the `EnsembleManager` import and `self.add_step(EnsembleManager(...))` from `__init__`
- Remove `self.steps_to_run = ['ensemble_manager']` from `configure()` so all registered `InPlaceRestartMember` steps execute directly via `compass run`

Checklist
- … `api.rst`) has any new or modified class, method and/or functions listed
- … `E3SM-Project` submodule has been updated with relevant E3SM changes
- … `MALI-Dev` submodule has been updated with relevant MALI changes
- … `Testing` in this PR) any testing that was used to verify the changes

Original prompt
Problem
`InPlaceRestartMember` in `compass/landice/tests/ensemble_generator/sgh_restart_ensemble/restart_member.py` has no `run()` method. When `compass run` executes the restart ensemble, it calls the inherited no-op `Step.run()`, which does nothing, so MALI never starts.

Additionally, `test_case.py` sets `self.steps_to_run = ['ensemble_manager']` in `configure()`, which means compass only executes `EnsembleManager` (which tries to submit `sbatch` jobs), and never directly executes the `InPlaceRestartMember` steps. The desired behaviour is for `compass run` to directly run each `InPlaceRestartMember` step, which sets `config_do_restart = .true.` in `namelist.landice` and then executes MALI.

Changes required
1. `compass/landice/tests/ensemble_generator/sgh_restart_ensemble/restart_member.py`

Two additions:

- In `setup()`, call `self.add_model_as_input()` after `_set_restart_in_namelist(namelist_path)` so that compass registers the MALI executable as an input for this step.
- Add a `run()` method that calls `run_model(self)`.

The import of `run_model` should be added at the top of the file alongside the existing imports.

2. `compass/landice/tests/ensemble_generator/sgh_restart_ensemble/test_case.py`

Remove or replace the line `self.steps_to_run = ['ensemble_manager']`.
Instead, `steps_to_run` should include all restart member steps (plus optionally `ensemble_manager` if you still want sbatch submission as well, but the primary path is direct execution). The simplest fix is to remove the `steps_to_run` assignment entirely so that compass runs all registered steps in order (first `ensemble_manager` setup via `__init__`, then each `InPlaceRestartMember`).

However, `EnsembleManager` is added in `__init__` and is not useful for the direct-execution path. The cleanest solution is:

- Remove `self.add_step(EnsembleManager(test_case=self))` from `__init__`
- Remove `self.steps_to_run = ['ensemble_manager']` from `configure()`
- Remove the `EnsembleManager` import since it's no longer needed

This leaves `steps_to_run` unset (i.e., all registered steps run), which will be only the `InPlaceRestartMember` steps added in `configure()`.

The resulting `__init__` should no longer add `EnsembleManager`, and the end of `configure()` should no longer contain the `steps_to_run` line. Also remove the now-unused `EnsembleManager` import.
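The two changes requested above can be sketched in a form that runs without compass installed. This is an illustration under stated assumptions, not the contents of the real files: `StubStep`, `StubTestCase`, the module-level `run_model` and the driver `run_all_steps` are stand-ins written for this sketch (loosely imitating `compass.step.Step`, the compass test case class, `compass.model.run_model` and `compass run`), and `RestartEnsemble` is a hypothetical name for the test case class.

```python
def run_model(step):
    # Stand-in for compass.model.run_model: records the call instead of
    # launching MALI.
    step.calls.append('run_model')


class StubStep:
    # Stand-in for the compass Step base class, reduced to what is needed here.
    def __init__(self, name):
        self.name = name
        self.calls = []

    def add_model_as_input(self):
        # The real method registers the model executable as a step input.
        self.calls.append('add_model_as_input')

    def run(self):
        # The inherited default does nothing, which is why MALI never started.
        pass


class InPlaceRestartMember(StubStep):
    def setup(self):
        # In the real step, the namelist is switched to restart mode first.
        self.add_model_as_input()

    def run(self):
        # Overriding run() is the core of change 1.
        run_model(self)


class StubTestCase:
    # Stand-in for the compass test case class: added steps run by default.
    def __init__(self):
        self.steps = {}
        self.steps_to_run = []

    def add_step(self, step):
        self.steps[step.name] = step
        self.steps_to_run.append(step.name)


class RestartEnsemble(StubTestCase):
    def configure(self, run_names):
        for name in run_names:
            self.add_step(InPlaceRestartMember(name))
        # Change 2: no steps_to_run override, so every member step runs.


def run_all_steps(test_case):
    # Hypothetical driver that mimics running each listed step in order.
    for name in test_case.steps_to_run:
        test_case.steps[name].run()
```

With no `EnsembleManager` step registered and no `steps_to_run` override, the only steps left to run are the restart members, which matches the direct-execution path described above.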
File locations (on branch `copilot/replace-restart-member-to-inplace`)

- `compass/landice/tests/ensemble_generator/sgh_restart_ensemble/restart_member.py` (BlobSha: `e459d206918746d1f1891beb2099a3bb43677788`)
- `compass/landice/tests/ensemble_generator/sgh_restart_ensemble/test_case.py` (BlobSha: `16388c1dd0557165679ffb8e573a5c6f49678677`)

Current content of `restart_member.py`: