Fix InPlaceRestartMember.setup() work_dir isolation #13
Draft
Copilot wants to merge 2 commits into
…ment Agent-Logs-Url: https://github.com/alexolinhager/compass/sessions/db26aabb-81b4-48b7-8fdf-906b46c00537 Co-authored-by: alexolinhager <131483939+alexolinhager@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix work directory assignment in InPlaceRestartMember" to "Fix InPlaceRestartMember.setup() work_dir isolation" on Apr 1, 2026
`InPlaceRestartMember.setup()` must never reassign `self.work_dir`: if it did, `compass/setup.py` would write a `step.pickle` containing `config_filename = sgh_restart_ensemble.cfg` into the spinup run directory, causing `compass run` invoked from that directory to crash with `FileNotFoundError: Config file does not exist: .../spinup_ensemble/run002/sgh_restart_ensemble.cfg`.
Changes
- `restart_member.py`, `setup()` docstring: Rewritten to explicitly document that the method operates exclusively on `self.spinup_run_dir` (set in `__init__`) and never touches `self.work_dir`, which compass owns as the step directory. Removed a verbose inline comment made redundant by the updated docstring.
- `__init__` has no `self.work_dir` override
- `ensemble_manager.py` uses `getattr(runStep, 'spinup_run_dir', runStep.work_dir)` for `os.chdir()`
Checklist
- …`api.rst`) has any new or modified class, method and/or functions listed
- `E3SM-Project` submodule has been updated with relevant E3SM changes
- `MALI-Dev` submodule has been updated with relevant MALI changes
- …`Testing` in this PR) any testing that was used to verify the changes
Original prompt
Problem
`compass/setup.py` sets `step.work_dir` to the correct compass step directory before calling `step.setup()`, then writes `step.pickle` to `step.work_dir` after `setup()` returns. However, `InPlaceRestartMember.setup()` reassigns `self.work_dir` to the spinup run directory at the very start, through a `self.work_dir = os.path.join(...)` assignment.
So by the time `setup.py` writes `step.pickle`, `step.work_dir` points to the spinup run dir, and the pickle is written there, containing an `InPlaceRestartMember` with `config_filename = sgh_restart_ensemble.cfg`. When the spinup `job_script.sh` runs `compass run` from that directory, it loads this pickle and crashes with `FileNotFoundError: Config file does not exist: .../spinup_ensemble/run002/sgh_restart_ensemble.cfg`.
Root cause
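The ordering problem described under Problem can be reproduced in miniature. The sketch below uses hypothetical stand-ins (`BuggyMember`, `framework_setup`) rather than compass's real `Step` class or `setup.py` code, and it pickles only the step's attributes so that it runs regardless of how the class is imported. It is meant to illustrate the sequence of events, not the actual implementation.

```python
import os
import pickle
import tempfile


class BuggyMember:
    """Hypothetical stand-in for a step whose setup() reassigns work_dir."""

    def __init__(self, spinup_run_dir):
        self.work_dir = None  # assigned by the framework before setup()
        self.spinup_run_dir = spinup_run_dir
        self.config_filename = 'sgh_restart_ensemble.cfg'

    def setup(self):
        # The problematic pattern: overwrite the framework-owned attribute.
        self.work_dir = self.spinup_run_dir


def framework_setup(step, step_dir):
    """Hypothetical mimic of the ordering described for compass/setup.py."""
    step.work_dir = step_dir
    step.setup()
    # The pickle goes wherever work_dir points *after* setup() returns.
    pickle_path = os.path.join(step.work_dir, 'step.pickle')
    with open(pickle_path, 'wb') as handle:
        pickle.dump(vars(step), handle)
    return pickle_path


with tempfile.TemporaryDirectory() as root:
    step_dir = os.path.join(root, 'restart_step')
    spinup_dir = os.path.join(root, 'spinup_ensemble', 'run002')
    os.makedirs(step_dir)
    os.makedirs(spinup_dir)

    written = framework_setup(BuggyMember(spinup_dir), step_dir)
    pickle_in_spinup_dir = os.path.dirname(written) == spinup_dir
    pickle_in_step_dir = os.path.exists(os.path.join(step_dir, 'step.pickle'))
    print(pickle_in_spinup_dir, pickle_in_step_dir)
```

Here the pickle ends up in the spinup directory and nothing is written to the step directory, which is the misplacement the prompt attributes to the leftover override.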
PR #12 removed the `self.work_dir` override from `__init__` but missed the identical override at the top of `setup()`. That override in `setup()` is what `setup.py` sees when it writes `step.pickle` after `setup()` returns.
Fix
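As a rough sketch of the shape this fix aims for, the example below shows a member whose `setup()` touches only `spinup_run_dir`, together with the `getattr` fallback quoted for `ensemble_manager.py`. The classes `FixedMember` and `PlainStep`, the helper `resolve_run_dir`, and the marker file name are all hypothetical and exist only for illustration; the real `setup()` body is not reproduced here.

```python
import os
import tempfile


class FixedMember:
    """Hypothetical stand-in: setup() uses spinup_run_dir, not work_dir."""

    def __init__(self, spinup_run_dir):
        self.work_dir = None  # owned by the framework
        self.spinup_run_dir = spinup_run_dir

    def setup(self):
        # Write only into the spinup directory; leave work_dir untouched.
        # 'restart_setup.done' is a made-up file name for this sketch.
        marker = os.path.join(self.spinup_run_dir, 'restart_setup.done')
        with open(marker, 'w') as handle:
            handle.write('ok\n')


class PlainStep:
    """Hypothetical step that has no spinup_run_dir attribute."""

    def __init__(self):
        self.work_dir = None


def resolve_run_dir(step):
    # Same getattr pattern quoted for ensemble_manager.py: prefer
    # spinup_run_dir when the step defines it, else fall back to work_dir.
    return getattr(step, 'spinup_run_dir', step.work_dir)


with tempfile.TemporaryDirectory() as root:
    step_dir = os.path.join(root, 'restart_step')
    spinup_dir = os.path.join(root, 'spinup_ensemble', 'run002')
    os.makedirs(step_dir)
    os.makedirs(spinup_dir)

    member = FixedMember(spinup_dir)
    member.work_dir = step_dir  # what the framework would assign
    member.setup()
    work_dir_preserved = member.work_dir == step_dir
    member_run_dir_ok = resolve_run_dir(member) == spinup_dir

    plain = PlainStep()
    plain.work_dir = step_dir
    plain_run_dir_ok = resolve_run_dir(plain) == step_dir
```

Keeping the two paths in separate attributes means the framework can keep writing its own files to `work_dir`, while code that needs the spinup location asks for it explicitly.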
In `compass/landice/tests/ensemble_generator/sgh_restart_ensemble/restart_member.py`:
- Delete lines 95–96 (the `self.work_dir = os.path.join(...)` assignment at the start of `setup()`).
- Replace all remaining uses of `self.work_dir` and `run_dir` in `setup()` with `self.spinup_run_dir`, which was added in `__init__` by PR #12 (Fix InPlaceRestartMember overwriting spinup job_script.sh) and already holds the correct path to the original spinup run directory.
The resulting `setup()` should refer only to `self.spinup_run_dir` and never assign `self.work_dir`.
Also verify that `__init__` does NOT contain any `self.work_dir` override (it should not after PR #12; just confirm and leave it as-is if correct).
Also verify that `ensemble_manager.py` uses `getattr(runStep, 'spinup_run_dir', runStep.work_dir)` for the `os.chdir()` call (added by PR #12; just confirm and leave as-is if correct).
Current file state
`compass/landice/tests/ensemble_generator/sgh_restart_ensemble/restart_member.py`
BlobSha: `f13e64c3a82e5f3cc122eed7b8175ab0a556abfb`
Constraints
- …`fallback=` in any configparser calls
- …`timeout=` in any subprocess calls
The following is the prior conversation context from the user's chat exploration (may be truncated):
User: can you relink the agent session? It's blank
Assistant: The latest PR is #12. It's still in WIP/draft; the agent is actively working on it. The checklist shows the two tasks (fix `restart_member.py` and fix `ensemble_manager.py`) are pending. You can track progress directly at #12.
User: These fixes are still not getting to the root of the problem:
Traceback (most recent call last):
  File "/global/cfs/cdirs/fanssie/users/ahager/mambaforge/envs/dev_compass_1.9.0-alpha.2/bin/compass", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "/global/cfs/cdirs/fanssie/users/ahager/MALIcode/compass/compass/main.py", line 63, in main
    commands[args.command]()
    ~~~~~~~~~~~~~~~~~~~~~~^^
  File "/global/cfs/cdirs/fanssie/users/ahager/MALIcode/compass/compass/run/serial.py", line 206, in main
    run_single_step(args.step_is_subprocess)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^...
This pull request was created from Copilot chat.