74 changes: 74 additions & 0 deletions .github/workflows/cime_machine_config_update.yml
@@ -0,0 +1,74 @@
name: Daily config_machines update check

on:
  schedule:
    - cron: '0 8 * * *'
  workflow_dispatch:

env:
  PIXI_ENV: py314
  ISSUE_TITLE: Daily config_machines drift detected
  PRIMARY_ASSIGNEE: xylar
  REPORT_JSON: cime_machine_config_report.json
  REPORT_MARKDOWN: cime_machine_config_report.md

jobs:
  check-config-machines:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - name: Checkout main
        uses: actions/checkout@v6
        with:
          ref: main

      - name: Set up Pixi
        uses: prefix-dev/setup-pixi@v0.9.5
        with:
          pixi-version: v0.62.2
          cache: ${{ hashFiles('pixi.lock') != '' }}
          environments: ${{ env.PIXI_ENV }}

      - name: Install mache from main
        run: |
          pixi run -e ${PIXI_ENV} python -m pip install --no-deps \
            --no-build-isolation -e .

      - name: Generate machine update report
        env:
          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        run: |
          pixi run -e ${PIXI_ENV} python utils/update_cime_machine_config.py \
            --json-output ${REPORT_JSON} \
            --markdown-output ${REPORT_MARKDOWN} \
            --run-url ${RUN_URL}

      - name: Upload machine update report
        uses: actions/upload-artifact@v4
        with:
          name: cime-machine-config-report
          path: |
            ${{ env.REPORT_JSON }}
            ${{ env.REPORT_MARKDOWN }}

      - name: Synchronize automation issue
        env:
          GH_CLI_TOKEN: ${{ secrets.GH_CLI_TOKEN }}
          DEFAULT_BRANCH: ${{ github.event.repository.default_branch || 'main' }}
        run: |
          if [ -z "${GH_CLI_TOKEN}" ]; then
            echo "GH_CLI_TOKEN is not configured; report generated" \
              "but no issue was synchronized."
            exit 0
          fi

          pixi run -e ${PIXI_ENV} python \
            utils/manage_cime_machine_config_issue.py \
            --report-json ${REPORT_JSON} \
            --report-markdown ${REPORT_MARKDOWN} \
            --repository ${GITHUB_REPOSITORY} \
            --token ${GH_CLI_TOKEN} \
            --issue-title "${ISSUE_TITLE}" \
            --base-branch ${DEFAULT_BRANCH} \
            --primary-assignee ${PRIMARY_ASSIGNEE}
41 changes: 41 additions & 0 deletions .github/workflows/copilot-setup-steps.yml
@@ -0,0 +1,41 @@
name: Copilot Setup Steps

on:
  workflow_dispatch:
  push:
    paths:
      - .github/workflows/copilot-setup-steps.yml
      - pixi.toml
      - pyproject.toml
  pull_request:
    paths:
      - .github/workflows/copilot-setup-steps.yml
      - pixi.toml
      - pyproject.toml

jobs:
  copilot-setup-steps:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    timeout-minutes: 20
    steps:
      - name: Checkout code
        uses: actions/checkout@v6

      - name: Set up Pixi
        uses: prefix-dev/setup-pixi@v0.9.5
        with:
          pixi-version: v0.62.2
          cache: ${{ hashFiles('pixi.lock') != '' }}
          environments: py314

      - name: Install mache in the Pixi environment
        run: |
          pixi run -e py314 python -m pip install --no-deps \
            --no-build-isolation -e .

      - name: Verify the agent environment
        run: |
          pixi run -e py314 python --version
          pixi run -e py314 mache --help
4 changes: 4 additions & 0 deletions docs/developers_guide/adding_new_machine.md
@@ -13,6 +13,10 @@ can be added to mache. This list is a *copy* of the
which we try to keep up-to-date. If you wish to add a machine that is not
included in this list, you must contact the E3SM-Project developers to add your
machine.

For details on the automated workflow that detects upstream drift in this file
and assigns follow-up work to Copilot, see
{doc}`config_machines_updates`.
:::

(dev-new-config-file)=
246 changes: 246 additions & 0 deletions docs/developers_guide/config_machines_updates.md
@@ -0,0 +1,246 @@
# Automated `config_machines.xml` updates

This page describes the automation that watches for upstream changes to
E3SM's `config_machines.xml` and opens or refreshes a Copilot task when drift
is detected. It also explains how maintainers are expected to review the
resulting pull request.

## Goal

`mache` keeps a repository-local copy of the upstream E3SM machine list in
`mache/cime_machine_config/config_machines.xml`.

The automation added here does **not** edit that file directly. Instead, it:

1. Compares the copy in `mache` against the current upstream E3SM source.
2. Produces a structured report describing any drift for supported machines.
3. Creates or updates one GitHub issue that assigns the work to Copilot.
4. Lets Copilot open a PR that updates `config_machines.xml` and any related
Spack configuration.

This keeps the source-of-truth update in a reviewed pull request rather than a
silent CI-side commit.

## Pieces of the automation

### Daily workflow

`.github/workflows/cime_machine_config_update.yml`
: Runs once a day at `0 8 * * *` and can also be started manually with
`workflow_dispatch`.

The job:

1. Checks out `main`.
2. Sets up the `py314` Pixi environment.
3. Installs `mache` from the checked-out repository.
4. Runs `utils/update_cime_machine_config.py`.
5. Uploads the generated JSON and Markdown report artifacts.
6. Runs `utils/manage_cime_machine_config_issue.py` when `GH_CLI_TOKEN` is
configured.
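The `0 8 * * *` expression fires once a day at 08:00. GitHub Actions evaluates
cron schedules in UTC, and scheduled runs may start later than the nominal time
when GitHub is under heavy load. The minimal sketch below computes the nominal
next run for a daily schedule of this form; `next_daily_run` is a hypothetical
helper for illustration and is not part of `mache`.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 8) -> datetime:
    """Return the nominal next firing time of a '0 <hour> * * *' schedule.

    ``now`` must be timezone-aware; GitHub Actions cron schedules use UTC.
    """
    now_utc = now.astimezone(timezone.utc)
    # Today's slot at <hour>:00:00 UTC.
    candidate = now_utc.replace(hour=hour, minute=0, second=0, microsecond=0)
    # If today's slot has already passed (or is exactly now), use tomorrow's.
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # 07:30 UTC on 1 Jan 2025 -> 08:00 UTC the same day.
    print(next_daily_run(datetime(2025, 1, 1, 7, 30, tzinfo=timezone.utc)))
```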

### Copilot environment workflow

`.github/workflows/copilot-setup-steps.yml`
: Defines setup steps that the Copilot cloud agent runs before it starts work,
so that it begins from a working Pixi environment with `mache` installed.
Copilot only uses this file when it is present on the default branch.

### Drift report builder

`utils/update_cime_machine_config.py`
: Downloads the current upstream E3SM `config_machines.xml`, compares it with
`mache/cime_machine_config/config_machines.xml`, prints a short console
summary, and optionally writes:

- a JSON report for machine-readable automation,
- a Markdown issue body for Copilot and human reviewers.

`mache/cime_machine_config/report.py`
: Contains the structured comparison logic. It determines which supported
machines changed, identifies module and environment-variable drift, infers
related package groups, and lists candidate Spack template files to review.
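To illustrate the kind of comparison `report.py` performs, here is a minimal
sketch that uses only the Python standard library. It assumes a simplified
CIME layout in which each `<machine MACH="...">` element loads modules through
`<command name="load">` and sets variables through `<env name="...">`. The
function names and the shape of the returned dictionary are hypothetical and
do not reflect the actual `report.py` API.

```python
import xml.etree.ElementTree as ET


def machine_summary(machine: ET.Element) -> dict:
    """Collect loaded modules and environment variables for one <machine>."""
    modules = sorted(
        cmd.text.strip()
        for cmd in machine.iter("command")
        if cmd.get("name") == "load" and cmd.text
    )
    # Simplification: variables that differ only by compiler/MPI attributes
    # collapse onto one key here; a real comparison would keep them apart.
    env_vars = {
        env.get("name"): (env.text or "").strip()
        for env in machine.iter("env")
    }
    return {"modules": modules, "env": env_vars}


def diff_machines(local_xml: str, upstream_xml: str, supported) -> dict:
    """Report module and env-var drift for the supported machines only."""
    local = {m.get("MACH"): m for m in ET.fromstring(local_xml).iter("machine")}
    upstream = {
        m.get("MACH"): m for m in ET.fromstring(upstream_xml).iter("machine")
    }
    drift = {}
    for name in sorted(supported):
        # Machines missing on either side are skipped in this sketch.
        if name not in local or name not in upstream:
            continue
        old = machine_summary(local[name])
        new = machine_summary(upstream[name])
        added = sorted(set(new["modules"]) - set(old["modules"]))
        removed = sorted(set(old["modules"]) - set(new["modules"]))
        changed_env = sorted(
            key
            for key in old["env"].keys() | new["env"].keys()
            if old["env"].get(key) != new["env"].get(key)
        )
        if added or removed or changed_env:
            drift[name] = {
                "modules_added": added,
                "modules_removed": removed,
                "env_changed": changed_env,
            }
    return drift
```

Only `load` commands are compared here. A fuller comparison would presumably
also consider `unload` commands and keep per-compiler `<modules>` blocks
separate.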

### Issue synchronization

`utils/manage_cime_machine_config_issue.py`
: Owns the GitHub-side lifecycle for the automation issue.

If drift exists, it creates or updates the issue.

If no drift exists, it closes the existing issue.

If Copilot assignment fails, it falls back to creating or updating the same
issue without Copilot assignment so the report is still visible.
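The create/update/close decision and the Copilot fallback can be sketched as
two small functions. This is a hypothetical illustration, not the code in
`utils/manage_cime_machine_config_issue.py`: the action names are invented,
and the callables stand in for whatever GitHub API calls the real script makes.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def plan_issue_action(has_drift: bool, open_issue: Optional[int]) -> str:
    """Decide what to do with the single automation issue."""
    if has_drift:
        # Refresh the existing issue in place, or open one if none is open.
        return "update" if open_issue is not None else "create"
    # No drift left: close a lingering issue, otherwise do nothing.
    return "close" if open_issue is not None else "noop"


def sync_with_fallback(
    with_copilot: Callable[[], T], without_copilot: Callable[[], T]
) -> T:
    """Try the Copilot-assigned sync; fall back to a plain sync on failure."""
    try:
        return with_copilot()
    except Exception:
        # Assumption: any error during Copilot assignment triggers the
        # fallback, so the drift report is still published as an issue.
        return without_copilot()
```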

### Tests

`tests/test_cime_machine_config_report.py`
: Verifies that the report builder detects relevant drift and that the rendered
issue body contains the required maintainer instructions.

## How `config_machines.xml` gets updated

The important point is that the scheduled workflow never edits
`mache/cime_machine_config/config_machines.xml` itself.

The update path is:

1. The workflow detects drift between the `mache` copy and upstream E3SM.
2. The workflow creates or refreshes a GitHub issue.
3. Copilot is assigned to that issue.
4. Copilot opens a pull request against `main`.
5. That PR updates `mache/cime_machine_config/config_machines.xml` first, then
any related Spack templates or version strings that the report indicates
should be reviewed.
6. A maintainer reviews and merges the PR.
7. The next daily run compares the merged repository state against upstream
again.

If the PR fully resolved the drift, the issue is closed automatically on the
next run.

If only part of the drift was resolved, the issue stays open and its body is
updated to reflect the remaining work.

## What Copilot is told to do

Copilot receives instructions from two places.

### Fixed API-level instructions

`utils/manage_cime_machine_config_issue.py` adds the following guidance in the
`agent_assignment` payload:

- Use the issue body as the task definition.
- Update `config_machines.xml` first.
- Then update related Spack templates and version strings.
- Add TODO comments in the PR when prefix or path changes need reviewer
confirmation.

### Generated issue-body instructions

`mache/cime_machine_config/report.py` renders the issue body for the current
drift and includes:

- the timestamp and upstream source URL,
- the workflow run URL,
- the list of affected supported machines,
- the required work list,
- per-machine details such as package groups, prefix or path variables, and
candidate Spack templates to inspect.

The required work section tells Copilot to:

- update `mache/cime_machine_config/config_machines.xml` for the affected
supported machines,
- update Spack templates and version strings when module or environment drift
implies different package versions,
- keep the PR focused when the change is only version or module drift,
- add a TODO in the PR instead of guessing when a new prefix or path is not
obvious.
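A minimal sketch of how such an issue body might be assembled follows. The
heading text, the `drift` dictionary shape (machine name mapped to named lists
of detail strings), and the function name are hypothetical; the actual
renderer in `mache/cime_machine_config/report.py` may differ.

```python
from datetime import datetime

REQUIRED_WORK = [
    "Update `mache/cime_machine_config/config_machines.xml` for the "
    "affected supported machines.",
    "Update Spack templates and version strings when module or "
    "environment drift implies different package versions.",
    "Keep the PR focused when the change is only version or module drift.",
    "Add a TODO in the PR instead of guessing when a new prefix or path "
    "is not obvious.",
]


def render_issue_body(
    drift: dict, upstream_url: str, run_url: str, generated_at: datetime
) -> str:
    """Render a Markdown issue body from a {machine: details} mapping."""
    lines = [
        f"Generated: {generated_at.isoformat()}",
        f"Upstream source: {upstream_url}",
        f"Workflow run: {run_url}",
        "",
        "## Affected supported machines",
        "",
    ]
    lines += [f"- `{name}`" for name in sorted(drift)]
    lines += ["", "## Required work", ""]
    lines += [f"- {item}" for item in REQUIRED_WORK]
    for name in sorted(drift):
        lines += ["", f"### `{name}`", ""]
        # Each detail is a list of strings, e.g. package groups or templates.
        for key, values in sorted(drift[name].items()):
            lines.append(f"- {key}: {', '.join(values) or 'none'}")
    return "\n".join(lines) + "\n"
```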

## Why this does not create a new issue every day

The workflow is designed to reuse one open issue rather than create a new one
for every scheduled run.

`utils/manage_cime_machine_config_issue.py` looks for an existing open issue
with the fixed title stored in the workflow environment:

- `ISSUE_TITLE: Daily config_machines drift detected`

The lifecycle is:

1. If no matching open issue exists and drift is detected, create one.
2. If a matching open issue already exists and drift is still present, update
that same issue.
3. If no drift remains and the issue exists, close it.

As a result, drift that stays unresolved (for example, while maintainers are
away) does **not** produce a fresh issue every day. The same issue remains
open and is refreshed in place.

A new issue would only be created if one of these is true:

- the existing automation issue was manually closed while drift still exists,
- the issue title configured in the workflow was changed,
- the existing issue was deleted or otherwise no longer appears as an open
issue in the repository.
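The title-based lookup can be sketched as a filter over issue objects such as
those returned by the GitHub REST endpoint
`GET /repos/{owner}/{repo}/issues?state=open`. That endpoint also lists pull
requests, which carry a `pull_request` key. The function below is a
hypothetical illustration, not the script's actual implementation, and it
ignores pagination.

```python
from typing import Iterable, Optional


def find_automation_issue(issues: Iterable[dict], title: str) -> Optional[int]:
    """Return the number of an open issue whose title matches exactly."""
    for issue in issues:
        # The REST issues endpoint also returns pull requests; skip them.
        if "pull_request" in issue:
            continue
        # Only an exact, still-open match is reused. A closed or renamed
        # issue is not found, which is why a new one would be created.
        if issue.get("state") == "open" and issue.get("title") == title:
            return issue["number"]
    return None
```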

## Reviewer workflow

When Copilot opens a PR from this issue, the reviewer should check the changes
in this order.

### 1. `config_machines.xml` changes

Verify that the PR updates
`mache/cime_machine_config/config_machines.xml` only for supported machines
reported by the workflow, and that those changes match the current upstream
E3SM machine definitions.

In practice, the easiest cross-check is to compare the PR against the report
artifact from the workflow run that opened or refreshed the issue.
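This cross-check reduces to a set comparison between the machines the PR edits
and the machines the report lists. The sketch below assumes a hypothetical
JSON layout with a top-level `"machines"` object keyed by machine name; check
the real artifact's schema before relying on it.

```python
import json
from typing import Iterable


def review_scope(report_json: str, touched: Iterable[str]) -> dict:
    """Compare machines edited in a PR with machines listed in the report."""
    # Hypothetical schema: {"machines": {"<name>": {...}, ...}}.
    reported = set(json.loads(report_json).get("machines", {}))
    edited = set(touched)
    return {
        # Edited in the PR but not reported as drifted: question these.
        "unexpected": sorted(edited - reported),
        # Reported as drifted but not edited: possibly unfinished work.
        "missing": sorted(reported - edited),
    }
```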

### 2. Related Spack updates

If the report lists package groups or candidate Spack templates, check that the
PR updated the relevant `mache/spack/*.yaml` inputs and any version strings
that should track the new module or environment values.

If the report does not indicate Spack-relevant drift, the PR should usually be
limited to `config_machines.xml`.
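One way to decide whether module drift is Spack-relevant is to map changed
`name/version` module strings to package groups and then look for templates
named after the machine. In the sketch below, the `PACKAGE_GROUPS` table, the
substring matching, and the `<machine>_*.yaml` naming pattern are all
assumptions made for illustration; they are not taken from `report.py`.

```python
import fnmatch
from typing import Iterable, List, Tuple

# Hypothetical mapping from substrings of module names to package groups.
PACKAGE_GROUPS = {
    "netcdf": "netcdf",
    "hdf5": "hdf5",
    "mpich": "mpi",
    "openmpi": "mpi",
}


def split_module(module: str) -> Tuple[str, str]:
    """Split 'name/version' into (name, version); version may be ''."""
    name, _, version = module.partition("/")
    return name, version


def changed_package_groups(
    removed: Iterable[str], added: Iterable[str]
) -> List[str]:
    """Package groups whose module versions differ between two lists."""
    old = dict(split_module(m) for m in removed)
    new = dict(split_module(m) for m in added)
    groups = set()
    for name in old.keys() | new.keys():
        if old.get(name) == new.get(name):
            continue  # same name and version on both sides: no drift
        for needle, group in PACKAGE_GROUPS.items():
            if needle in name:
                groups.add(group)
    return sorted(groups)


def candidate_templates(machine: str, template_files: Iterable[str]) -> List[str]:
    """Template file names starting with '<machine>_' (assumed naming)."""
    return sorted(fnmatch.filter(template_files, f"{machine}_*.yaml"))
```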

### 3. Ambiguous path or prefix changes

When upstream changes a path-like variable such as `NETCDF_PATH`, the correct
replacement in `mache` may not be obvious from the XML alone.

In that case, the expected behavior is **not** to guess. The PR should leave a
TODO note for the reviewer and explain what needs confirmation.
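A simple way to flag such cases is to treat variables whose names end in a
path-like suffix as requiring confirmation and to emit a TODO note for each
one instead of a guessed value. The suffix list and the note wording below are
hypothetical choices made for this sketch.

```python
from typing import Iterable, List

# Hypothetical suffixes that suggest a variable holds a prefix or path.
PATH_SUFFIXES = ("_PATH", "_ROOT", "_PREFIX", "_DIR", "_HOME")


def is_path_like(name: str) -> bool:
    """True if an environment-variable name looks like a prefix or path."""
    return name.upper().endswith(PATH_SUFFIXES)


def todo_notes(machine: str, env_changed: Iterable[str]) -> List[str]:
    """Reviewer TODOs for changed path-like variables instead of guesses."""
    return [
        f"TODO({machine}): upstream changed `{name}`; confirm the new "
        "prefix or path before updating mache."
        for name in env_changed
        if is_path_like(name)
    ]
```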

### 4. Validation

At minimum, reviewers or PR authors should run the same local checks used
during development in this repository.

Generate the current report locally:

```bash
pixi run -e py314 python utils/update_cime_machine_config.py \
--json-output /tmp/cime_machine_config_report.json \
--markdown-output /tmp/cime_machine_config_report.md
```

Run the focused tests:

```bash
pixi run -e py314 pytest tests/test_cime_machine_config_report.py
```

Run pre-commit on changed files before merging:

```bash
pixi run -e py314 pre-commit run --files <changed files>
```

## Manual dry run for maintainers

To exercise the detection path without waiting for the cron schedule, either:

- trigger the workflow manually with `workflow_dispatch`, or
- run `utils/update_cime_machine_config.py` locally in the Pixi environment.

If `GH_CLI_TOKEN` is not configured, the workflow still generates and uploads
the report artifacts but skips issue synchronization.

Running the script locally, or running the workflow without `GH_CLI_TOKEN`, is
a safe way to validate the comparison and report-rendering logic without
asking Copilot to act on the result.

## Operational notes

- `GH_CLI_TOKEN` should be a user token with access to create and update
issues in the repository. A classic PAT with `repo` scope is sufficient.
- Copilot assignment additionally depends on the Copilot cloud agent being
  enabled for the repository.
- The workflow uses the repository's current `main` branch as the comparison
baseline and as the branch Copilot is asked to target.
1 change: 1 addition & 0 deletions docs/index.md
@@ -24,6 +24,7 @@ users_guide/sync/diags
developers_guide/quick_start
developers_guide/contributing
developers_guide/deploy
developers_guide/config_machines_updates
developers_guide/adding_new_machine
developers_guide/spack
developers_guide/jigsaw