Skip to content

Implement CMake build system for turbo-stack#233

Open
hctorres wants to merge 68 commits into
mainfrom
192-task-implement-improved-build-system-within-turbo-stack
Open

Implement CMake build system for turbo-stack#233
hctorres wants to merge 68 commits into
mainfrom
192-task-implement-improved-build-system-within-turbo-stack

Conversation

@hctorres
Copy link
Copy Markdown
Collaborator

@hctorres hctorres commented May 12, 2026

Part of issue-192 cross-repo CMake build system effort

This PR is one of four coordinated changes:

Repo PR Base
FMS TURBO-ESM/FMS#6 dev/turbo
TIM TURBO-ESM/TIM#16 main
MOM6 TURBO-ESM/MOM6#25 dev/turbo
turbo-stack this PR main

Recommended merge order: FMS → TIM → MOM6 → turbo-stack. This PR consumes the other three via find_package(FMS|TIM)
and add_subdirectory(${MOM6_SOURCE_DIR} mom6_build).

Summary

  • Introduces a CMake-driven build system replacing the legacy mkmf flow. Entry points are documented in CLAUDE.md and scripts/README.md.
  • Top-level CMakeLists.txt selects FMS2 or TIM behind a single TURBO::infra_r8 interface so downstream targets are backend-agnostic; NetCDF::NetCDF_C is attached explicitly to fix link-time failures on split lib/lib64 NetCDF installs.
  • cmake/TurboCompilerFlags.cmake sets Fortran precision and optimization flags for GNU, Intel/IntelLLVM, NVHPC, and Flang compilers.
  • cmake/add_mom_test.cmake provides add_mom_test() / add_mom_tests() wrappers for pFUnit CTest integration (40 MPI-aware unit tests, configured for MAX_PES=4).
  • marbl_build/CMakeLists.txt wires MARBL via MARBL::marbl defined in MOM6's MARBLTargets.cmake.
  • tests-legacy/ is a temporary island that bridges the old build.sh --unit-tests-only flow to the new top-level CMake test build. Slated for removal once build.sh itself is retired.
  • CI: .github/workflows/amrex-mini-app-unit-tests.yaml now wipes any leftover spack/ dir before cloning (gha-runner-turbo workspaces persist between jobs).
  • Two-step pipeline: a per-machine env script under scripts/setup_environment/ (sourced) sets up the toolchain and any from-source deps, then scripts/build_turbo_stack.sh (exec'd) runs cmake configure / build / ctest.
  • Top-level orchestrators: scripts/build_with_spack.sh (spack flavor) and scripts/build_on_derecho.sh (Derecho module flavor) wrap the two-step pipeline for one-command builds.
  • scripts/build_dep.sh is a sourced library defining build_dep <name> ... -- [cmake args] for out-of-spack deps (FMS / TIM / pFUnit / AMReX). Sentinel-based skip cache avoids redundant rebuilds when sources and cmake args haven't changed. Supports branch, tag, and commit SHA refs.

Environment

Two supported environments:

  • Derecho using modulesscripts/build_on_derecho.sh (or source scripts/setup_environment/derecho_cpu_gcc_openmpi.sh).
    Plus test_new_build_system_on_derecho.sh for batch verification of both flavors and both backends.

  • Spackscripts/build_with_spack.sh (or source scripts/setup_environment/spack_local_environment.sh then run
    scripts/build_turbo_stack.sh). Local laptop emulation of derecho modules using spack is also available (scripts/setup_environment/emulate_derecho_modules_locally_with_spack.sh) for iterating on the module-flavor code path without Derecho access.

Test plan

  • All 5 workflows green on the latest push (after k8s pod-init transient reruns on the self-hosted runner):
    • Run TIM unit tests
    • AMReX Mini-App Unit Tests
    • Build MOM6 with TIM and FMS
    • Build Inside NCAR HPC Development Containers
    • MOM code coverage report
  • On Derecho: qsub test_new_build_system_on_derecho.sh builds turbo-stack with both TIM and FMS2 backends.
  • Local laptop builds via spack environments.

hctorres and others added 28 commits April 13, 2026 17:36
CMake:
- TurboOptions: remove TURBO_INFRA/TURBO_MEMORY_MODE; now sourced from
  MOM6Options.cmake as the canonical definition for both repos.
- CMakeLists.txt: resolve MOM6_SOURCE_DIR early and include MOM6Options
  before TurboCompilerFlags so option declarations are never duplicated.
- TurboCompilerFlags: fix dead Clang branch (CMake never assigns "Clang"
  as Fortran compiler ID) to Flang|LLVMFlang; remove unreachable set()
  after FATAL_ERROR.
- add_mom_test: guard empty SUITE_LINK_LIBRARIES to avoid passing bare
  LINK_LIBRARIES keyword; guard BASE_MOM_PFUNIT_INFRA so an empty
  OTHER_SOURCES is never passed to add_pfunit_ctest.
- marbl_build: use CMAKE_CURRENT_SOURCE_DIR instead of CMAKE_SOURCE_DIR
  so the path stays valid if the directory is used outside this project.
- find_package(TIM REQUIRED COMPONENTS R8) to match updated TIMConfig.

Build scripts:
- set -eo pipefail in build.sh, build_turbo_stack.sh,
  create_spack_environment.sh.
- build_turbo_stack.sh: add TURBO_STACK_ROOT guard; set CMAKE_BUILD_TYPE
  to Release by default (Debug only when --debug); propagate build type
  to TIM pre-build; use rm -rf on TIM build dir for clean/stale-cache
  cases instead of --fresh (which leaves generated .cmake files behind);
  pass TIM_DIR to the config-file directory, not the build root; fix
  comment typo and trailing whitespace.
- create_spack_environment.sh: remove stale #--fresh --force comment.
- Trailing newlines added to build.sh and build_turbo_stack.sh.

CI:
- build-tests.yaml: note that ./build.sh is the legacy mkmf build, not
  scripts/build.sh.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fault generator, stale comment updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 12, 2026 07:19
@hctorres hctorres linked an issue May 12, 2026 that may be closed by this pull request
1 task
hctorres and others added 20 commits May 20, 2026 18:36
Five fixes to test_new_build_system_on_derecho.sh -- two correctness
bugs, one fragility, two stale comments / drift:

1. --clean was silently broken.  The inline comment "# where deps build
   from submodules go" sat after a `\` line-continuation, which is NOT
   valid bash: `\<space>` escapes the space (literal arg), the `#`
   starts a comment terminating the line, and the next line's
   "$log_dir" becomes an orphan statement that bash tries to execute.
   Effects: $log_dir was never removed by --clean, and the script
   aborted on the orphan line before continuing.  Hoist the comment
   above the rm.

2. ${TURBO_BUILD_SYSTEM_TEST_DIR:=$TMPDIR/...} fails under `set -u`
   when $TMPDIR is unset (common on bare ssh shells, containers, some
   non-PBS Linux setups).  Derecho's PBS sets it so the script "worked
   on Derecho"; elsewhere it died at line 79 before doing anything.
   Use ${TMPDIR:-/tmp} as a fallback.

3. Renamed the default path component to turbo_build_system_test so it
   matches the variable name (was turbo_build_pr_tester from before
   the rename to TURBO_BUILD_SYSTEM_TEST_DIR).

4. Updated the --clean safety-guard comment that still referenced
   "$HOME defaults" -- default is under $TMPDIR now.

5. The TIM build block reassigned the global $build_dir while the
   FMS2 block correctly used a local-style $fms_build_dir.  Mirror the
   FMS2 pattern with $tim_build_dir.  Not a current correctness issue
   (nothing downstream reads $build_dir), but a footgun if anyone adds
   another build later.

Verified locally:
  - bash -n: syntax clean
  - $TMPDIR unset: script now reaches TURBO_STACK_ROOT validation
    instead of dying at the := expansion
  - --clean against a sandbox: both $log_dir and $TURBO_STACK_ROOT/deps
    now get cleaned in one rm -rf, and the script proceeds past
    --clean to the clone step (previously aborted)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First of three commits implementing the deps-script refactor (plan in
/home/lalo/.claude/plans/toasty-dancing-teacup.md).

scripts/build_dep.sh defines a `build_dep` function -- one call per dep,
with three source-discovery modes:
  1. --source PATH                       explicit override
  2. --clone --url U --ref R             clone into --clone-dest
     --clone-dest DIR                    (also exports <NAME>_ROOT)
  3. $<NAME>_ROOT env var                set externally (e.g. by
                                         fetch_source.sh)
  4. Submodule fallback                  per-name table inside script
Caller passes per-call cmake args after a `--` separator (matches the
pattern build_turbo_stack.sh already uses).

Sentinel expanded beyond the old SHA-only format -- now a KV file with
sha + source path + install prefix + sha256 of sorted cmake args.
Skip-on-rerun fires only when all four match.  Catches the case where
an AMReX rebuild flips -DAMReX_GPU_BACKEND=CUDA on or off, which the
old sha-only sentinel would have silently ignored.  Atomic write via
temp file + mv so a SIGINT mid-write can't leave a half-baked sentinel.

Side effects preserved from the old script: appends $install_prefix to
CMAKE_PREFIX_PATH (with dedup guard); for name=pfunit also exports
PFUNIT_DIR via the versioned cmake-dir glob.  Log phrasing preserved
("[build_dep] $name @ ${sha:0:8} already installed -- skipping" etc.)
for CI-log grep continuity.

scripts/fetch_source.sh defines a `fetch_source` function -- narrowly
scoped to source-only-consumed deps (MOM6, MARBL) that turbo-stack's
own CMakeLists.txt reads via <NAME>_ROOT but does not build.  Same
clone-or-update logic as test_new_build_system_on_derecho.sh's inline
_clone_or_update helper, lifted into a reusable library.  --url is
required (no default): hardcoded name-to-URL map would duplicate
.gitmodules and silently rot when forks move.

No callers touched in this commit.  build_dependencies_from_source.sh
still in use; bash -n green on both new files; build_with_spack.sh
--parallel 32 still passes 40/40 ctest.  Trivially reversible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 2 of 3 in the deps-script refactor (plan in
/home/lalo/.claude/plans/toasty-dancing-teacup.md).  build_dep.sh and
fetch_source.sh were added in e08c29c with no callers touched.  This
commit moves every caller off the old monolithic deps script and onto
the new per-dep model.

Files migrated:
  - scripts/setup_environment/derecho_cpu_gcc_openmpi.sh: grew from
    7 lines (just module loads) to ~50 lines.  After module loads, sources
    build_dep.sh and calls build_dep four times -- one for FMS, pFUnit,
    AMReX, TIM -- with the cmake flags previously hardcoded inside
    build_dependencies_from_source.sh.  Forwards --parallel via the
    TURBO_DEP_PARALLEL env var.  Future per-machine env scripts (e.g.
    derecho_gpu_nvhpc_openmpi.sh) will differ from this only by adding
    -DAMReX_GPU_BACKEND=CUDA after `--` in the AMReX call.

  - scripts/setup_environment/emulate_derecho_modules_locally_with_spack.sh:
    same shape after the spack-env activation -- four build_dep calls.

  - scripts/build_on_derecho.sh: dropped the source-build_dependencies_from_source.sh
    block.  Sets TURBO_DEP_PARALLEL from --parallel, then sources the env
    script which now handles deps.

  - scripts/build_with_spack.sh: replaced the source-build_dependencies_from_source.sh
    --only tim block with a single inline build_dep tim call (still gated on
    --infra TIM, since that conditional is orchestrator-level).

  - test_new_build_system_on_derecho.sh: replaced the inline _clone_or_update
    helper (12 lines) with `source fetch_source.sh` and three fetch_source
    calls for MOM6/TIM/FMS overrides.  Identical behavior, less duplication.

build_dependencies_from_source.sh is now orphaned but still present;
commit 3 will delete it.

Verified locally:
  - bash -n green on all 7 touched scripts.
  - build_with_spack.sh --parallel 32: 40/40 ctest.
  - build_with_spack.sh --infra TIM --parallel 32: 40/40 ctest.
  - Sentinel-skip rerun: emits
    `[build_dep] tim @ ae308ea8 already installed -- skipping`
    (preserved phrasing from the old `[build_dependencies_from_source]`
    log lines, just renamed bracket tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 3 of 3 in the deps-script refactor (plan in
/home/lalo/.claude/plans/toasty-dancing-teacup.md).  build_dep.sh +
fetch_source.sh added in e08c29c; callers migrated in 6e5f2cd.  Now
delete the orphaned old script and reconcile docs/comments.

Deleted:
  - scripts/build_dependencies_from_source.sh (304 lines, fully
    subsumed by scripts/build_dep.sh).

Doc + comment updates:
  - scripts/README.md: full rewrite of the layout, workflow, and
    dependency-resolution sections.  Pipeline narrowed from 3 steps to 2
    (env scripts now own the build_dep call list).  New sections
    documenting the build_dep + fetch_source function signatures and
    the four-mode source resolution.
  - CLAUDE.md: script-structure table updated to list build_dep.sh and
    fetch_source.sh; component-relationships block expanded with the
    Derecho orchestrator path; source-override example updated to show
    multi-dep overrides via env vars.
  - scripts/build_turbo_stack.sh: header comment pointer updated from
    build_dependencies_from_source.sh to build_dep.sh.
  - CMakeLists.txt: two TURBO_INFRA comments updated.
  - spack/derecho_modules_emulation_with_spack.yaml: header comment
    updated.
  - test_new_build_system_on_derecho.sh: in-script comments referencing
    the old deps script + _clone_or_update helper updated.

Verified locally:
  - `git ls-files | xargs grep` finds no remaining references to
    `build_dependencies_from_source` or `_clone_or_update` in tracked
    files.
  - build_with_spack.sh --infra TIM --parallel 32: 40/40 ctest.
  - build_with_spack.sh --parallel 32 (separately, FMS2 path): also
    40/40 in earlier verification.

Note: setup_env_separation_plan.md is untracked and still contains the
old script references in its historical narrative.  Out of scope for
this commit since it's not in the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The boolean flags in test_new_build_system_on_derecho.sh were named
override_MOM6_submodule, override_TIM_submodule, override_FMS_submodule.
The naming was misleading: it implied "this flag controls whether we
override the submodule", but after the build_dep refactor that's no
longer true.  $<NAME>_ROOT exported in the calling environment also
overrides the submodule (via build_dep's source-resolution precedence
layer), and the script does nothing to enable or disable that path.

What the flags actually control is whether THIS SCRIPT does a fresh
clone via fetch_source.  Rename to fetch_MOM6 / fetch_TIM / fetch_FMS
to match the fetch_source function being called, and expand the
surrounding comment to spell out the two ways a caller can override a
submodule:

  1. Leave fetch_<NAME>=true (default) -- the script clones the
     PR branch from the configured URL and exports <NAME>_ROOT.
  2. Set fetch_<NAME>=false AND export <NAME>_ROOT before launching --
     the script doesn't touch the clone dir; downstream build_dep
     picks up the caller's <NAME>_ROOT.

The deps_that_override_submodules_dir directory name is unchanged --
it still genuinely holds submodule overrides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit renamed override_*_submodule -> fetch_* to better
describe what the flags actually do.  But the flags were direct
assignments (`fetch_MOM6=true`), which meant flipping one off required
editing the script -- the previous commit's documentation acknowledged
this with an awkward "(after editing)" caveat.

Promote each flag to `: "${fetch_<NAME>:=true}"` so callers can flip
them off via the environment without touching the file:

  fetch_MOM6=false MOM6_ROOT=$HOME/dev/MOM6 \
      ./test_new_build_system_on_derecho.sh

Pair an exported <NAME>_ROOT with fetch_<NAME>=false to test against a
local dev tree: the script stays out of the override dir, build_dep
(called from the env script) picks up the env-var path via its own
precedence layer.

Expand the header Configuration: block to list MOM6_ROOT / TIM_ROOT /
FMS_ROOT and the fetch_<NAME> flags with concrete examples.  Trim the
now-redundant inline comment block above the flag definitions; the
workflow lives in the header.

Verified: `fetch_MOM6=false bash -c '...'` correctly leaves the flag
at "false" while the := default fires when the var is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this commit, the per-machine env scripts and build_with_spack.sh
hardcoded the dep cmake build + install root as
$TURBO_STACK_ROOT/deps/default.  That put 10+ GB of generated artifacts
inside the source-controlled checkout, requiring the test driver's
--clean block to reach back into $TURBO_STACK_ROOT/deps to wipe them
between runs.  The pattern contradicted the earlier goal of keeping
all build artifacts under $TURBO_BUILD_SYSTEM_TEST_DIR.

Make the dep root configurable via the TURBO_DEPS_ROOT env var:

  - scripts/setup_environment/derecho_cpu_gcc_openmpi.sh:
    `: "${TURBO_DEPS_ROOT:=$TURBO_STACK_ROOT/deps/default}"` before
    computing _install_prefix / _build_root.
  - scripts/setup_environment/emulate_derecho_modules_locally_with_spack.sh:
    same.
  - scripts/build_with_spack.sh: same `:=` default at the top of the
    inline `build_dep tim` block.

  - test_new_build_system_on_derecho.sh: exports
    TURBO_DEPS_ROOT="$TURBO_BUILD_SYSTEM_TEST_DIR/turbo-stack-deps"
    so the env script's build_dep calls land their builds + installs
    under the test dir alongside everything else this driver creates.
    The --clean block drops the `$TURBO_STACK_ROOT/deps` line and
    nukes $TURBO_DEPS_ROOT instead.  Header docstring updated to spell
    out that all artifacts now live under $TURBO_BUILD_SYSTEM_TEST_DIR.

Behavior preserved for callers who don't set TURBO_DEPS_ROOT: the `:=`
default still resolves to $TURBO_STACK_ROOT/deps/default, matching
what the scripts did before.  Existing build_with_spack.sh users (no
test driver) see no change.

Verified locally:
  - bash -n green on all four touched scripts.
  - build_with_spack.sh --infra TIM --parallel 32 (default TURBO_DEPS_ROOT):
    40/40 ctest passes; artifacts at $TURBO_STACK_ROOT/deps/default/.
  - env TURBO_DEPS_ROOT=/tmp/test_deps_root_override
    bash scripts/build_with_spack.sh --infra TIM --parallel 32:
    40/40 ctest passes; artifacts land at /tmp/test_deps_root_override/
    (no $TURBO_STACK_ROOT/deps/ pollution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit on this file moved TURBO_DEPS_ROOT from a single
global export ($TURBO_BUILD_SYSTEM_TEST_DIR/turbo-stack-deps) to two
per-flavor exports (one under each $build_dir/turbo-stack-with-*/deps).
The original global export and its accompanying mkdir + --clean
references were left in place, creating dead code:

  - The global `export TURBO_DEPS_ROOT="$TURBO_BUILD_SYSTEM_TEST_DIR/turbo-stack-deps"`
    was overwritten by both per-flavor blocks before the env script
    ever read it.  No dep ever landed at the original path.
  - The mkdir created an empty unused directory there.
  - The --clean rm list referenced the stale value.
  - The block comment claimed TURBO_DEPS_ROOT was "exported below so
    the env script picks it up" -- accurate for the prior design but
    misleading after the per-flavor switch.

Cleanup:
  - Drop the global export.
  - Replace the block comment with one that explains the per-flavor
    isolation pattern.
  - Drop $TURBO_DEPS_ROOT from the mkdir line and the --clean rm list.
    Per-flavor deps live at $build_dir/turbo-stack-with-*/deps, so
    wiping $build_dir already wipes the from-source dep installs --
    documented inline in the --clean block for the next reader.

No behavior change for normal runs (the per-flavor exports already
won).  --clean now does fewer redundant rm-rf invocations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old parallel-forwarding mechanism was clunky:
  - Orchestrators parsed --parallel N, then exported a custom
    TURBO_DEP_PARALLEL env var for one consumer (env script's
    build_dep calls) AND passed --parallel N as a CLI arg to another
    (build_turbo_stack.sh).
  - Env scripts had to detect TURBO_DEP_PARALLEL, build a
    _parallel_args=() array, and splice "${_parallel_args[@]}" into
    each of four build_dep calls.
  - Leaf tools (build_dep, build_turbo_stack) always passed --parallel
    to cmake --build with a default of 1.

CMake already supports CMAKE_BUILD_PARALLEL_LEVEL as a native env var:
when --parallel isn't given to `cmake --build`, cmake reads
CMAKE_BUILD_PARALLEL_LEVEL as its own default.  Adopt that mechanism:

Orchestrators (build_on_derecho.sh, build_with_spack.sh, test driver):
  - When --parallel N is given, `export CMAKE_BUILD_PARALLEL_LEVEL=N`
    once.  No more TURBO_DEP_PARALLEL.  No more --parallel flag
    forwarding to child scripts.

Env scripts (derecho_cpu_gcc_openmpi.sh, emulate_*):
  - Drop the _parallel_args=() array and the TURBO_DEP_PARALLEL
    detection.  Each build_dep call shrinks by one line.

Leaf tools (build_dep.sh, build_turbo_stack.sh):
  - Default changes from _parallel="1" to _parallel="".  When unset,
    don't pass --parallel to cmake -- cmake's native env-var fallback
    handles it (CMAKE_BUILD_PARALLEL_LEVEL if set, generator default
    otherwise).  --parallel N on the CLI still works as a per-call
    override.

Behavior change to note:
  - No --parallel anywhere + no CMAKE_BUILD_PARALLEL_LEVEL set +
    Ninja generator (--ninja): now uses nproc (Ninja's native default)
    instead of 1 (the old explicit cap).  User accepted this.
  - Make-with-no-overrides behavior unchanged (still 1).
  - With --parallel N at the orchestrator: identical to before, but
    the value flows through CMAKE_BUILD_PARALLEL_LEVEL rather than
    being plumbed.
  - CMAKE_BUILD_PARALLEL_LEVEL=N exported externally (in shell
    profile, qsub directive, CI config) now works without any wrapper
    --parallel flag.

Verified locally:
  - bash -n green on all 7 touched scripts.
  - bash scripts/build_with_spack.sh --parallel 32: 40/40 ctest.
    bash -x confirms CMAKE_BUILD_PARALLEL_LEVEL=32 exported.
  - bash scripts/build_with_spack.sh --infra TIM --parallel 32:
    40/40 ctest.
  - env CMAKE_BUILD_PARALLEL_LEVEL=16 bash scripts/build_with_spack.sh
    --infra TIM: 40/40 ctest (external env var propagates without
    the orchestrator's --parallel flag).
  - grep across all tracked files confirms no TURBO_DEP_PARALLEL or
    _parallel_args references remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…--build_dir.

Bundles together the deps-build-root simplification + pre-PR doc refresh.

CLI surface simplification:
  The previous design exposed --deps-build-root as a CLI flag on both
  build_on_derecho.sh and build_with_spack.sh, with three-level
  precedence (CLI flag > env var > --build_dir derivation > default).
  In practice the only common use cases are "deps next to my build"
  and "deps in the default location" -- the explicit flag adds CLI
  surface for a niche use case.

  Drop the flag from both orchestrators.  Deps location now derives
  from --build_dir directly: $build_dir/deps when given, else
  $TURBO_STACK_ROOT/deps/default.  Power users who genuinely need
  deps in a path unrelated to the turbo-stack build_dir can source
  the env script directly with its --deps-build-root sourced flag,
  which is still supported.

Strict argument parsers:
  Both orchestrators previously had `*) break ;;` to silently accept
  unknown options.  Combined with bash's source-inherits-$@ behavior,
  this leaked stray args through to sourced env scripts -- e.g.
  build_with_spack.sh --deps-build-root /tmp/x errored inside
  spack_local_environment.sh, not at the orchestrator boundary.

  Switch both to strict error-on-unknown, matching build_turbo_stack.sh
  after f43457f.  Typos now fail loudly with a clear error at the
  orchestrator level.

Env-script parameter passing:
  Replace the env-var (TURBO_DEPS_BUILD_ROOT) plumbing with sourced
  args: orchestrators compute the resolved path and pass it via
  `source env.sh --deps-build-root "$path"`.  Env scripts parse `$@`
  for their own --deps-build-root flag and use a local
  $_deps_build_root variable; no exported state.  Defaults to
  $TURBO_STACK_ROOT/deps/default when neither orchestrator nor caller
  provides a value.

Test driver:
  Comment block updated to describe the new mechanism ($build_dir
  passed to build_on_derecho.sh auto-derives deps location).  No
  behavioral change -- the test driver already passes --build_dir, so
  the auto-derivation triggers the same per-flavor isolation as
  before.

Doc refresh:
  scripts/README.md: fix the "3-step pipeline" header (was 2 steps);
  add "Where do dep builds + installs land?" section explaining the
  --build_dir auto-derive and power-user --deps-build-root path; add
  "Parallel build jobs" section documenting CMAKE_BUILD_PARALLEL_LEVEL
  propagation; drop the reference to setup_env_separation_plan.md
  (that doc was local-only design scratch and is not part of the PR).
  CLAUDE.md: same pattern -- drop the plan-doc reference, add brief
  paragraphs on deps location and parallelism plumbing.

Verified locally:
  - bash -n green on all 7 touched scripts.
  - build_with_spack.sh --parallel 32: 40/40 ctest (default path).
  - build_with_spack.sh --infra TIM --parallel 32 --build_dir /tmp/X:
    40/40 ctest, deps land at /tmp/X/deps/install (auto-derived).
  - build_with_spack.sh --infra TIM --parallel 32: 40/40 ctest
    (default deps path = $TURBO_STACK_ROOT/deps/default).
  - build_with_spack.sh --foo bar: exits 1 with "unknown option" at
    the orchestrator's parser (no more silent swallow).
  - grep across tracked files: no TURBO_DEPS_(ROOT|BUILD_ROOT)
    references; no setup_env_separation_plan references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the test driver invoked build_on_derecho.sh once per flavor.
Each invocation sourced derecho_cpu_gcc_openmpi.sh, which derived its
deps location from the per-flavor --build_dir and rebuilt FMS / pFUnit
/ AMReX / TIM into a flavor-private install tree.  Net result: AMReX
alone got built twice per run, ~5-10 min wasted on Derecho per second
flavor.

Source the env script ONCE up front instead, with an explicit
--deps-build-root pointing at a shared location
($TURBO_BUILD_SYSTEM_TEST_DIR/shared-deps).  Deps build + install once;
the env script appends the install prefix to CMAKE_PREFIX_PATH; both
per-flavor build_turbo_stack.sh calls find the shared install
naturally.  This is safe because the dep cmake args don't depend on
TURBO_INFRA -- FMS / pFUnit / AMReX / TIM build the same regardless of
which infra turbo-stack itself will use.

Replace the per-flavor build_on_derecho.sh invocations with direct
build_turbo_stack.sh invocations.  The test driver now plays the
orchestrator role itself (env source + N turbo-stack builds), which is
the right shape for a multi-build harness -- build_on_derecho.sh
remains the simple "one-call-does-everything" path for individual
dev runs.

The env-script source happens with set -e active, so a module-load
failure or dep-build failure bails the whole test driver cleanly.
The per-flavor turbo-stack builds remain under set +e with
${PIPESTATUS[0]} capture, so a failure in one flavor still lets the
other run and the summary reports both verdicts.

--clean now wipes $shared_deps_dir explicitly (previously it was
covered by wiping $build_dir, but the deps are no longer under
$build_dir).

Header docstring updated:
  - --parallel description now says "exported as CMAKE_BUILD_PARALLEL_LEVEL"
    (no longer accurate to say "forwarded to build_on_derecho.sh" since
    we don't invoke it anymore).

Layout under $TURBO_BUILD_SYSTEM_TEST_DIR after this change:
  deps_that_override_submodules/   # MOM6 / TIM / FMS source clones
  shared-deps/                     # built-once FMS/pFUnit/AMReX/TIM
    build/<name>/
    install/
  turbo-stack-build/               # per-flavor turbo-stack builds
    turbo-stack-with-FMS2/
    turbo-stack-with-TIM/
  logs/turbo-stack-with-{FMS2,TIM}.log

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…2's.

Previously the top-level CMakeLists trusted that FMS2's and TIM's
FindNetCDF.cmake copies were identical and put only FMS2's on
CMAKE_MODULE_PATH. Move the append into each TURBO_INFRA branch so the
chosen backend supplies its own copy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_dep's CMAKE_PREFIX_PATH / PFUNIT_DIR exports lived only on the
rebuild path, so a re-source after the first successful build (when
the sentinel already matches) returned without exposing the install
and downstream find_package() calls fell through to whatever the
enclosing environment supplied. Extract the side-effect block into
_build_dep_expose_install and call it on the "already installed --
skipping" path as well.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
turbo-stack tracks features ahead of the released FMS package (e.g.
the omp_offload-aware mpp_do_group_update on FMS dev/turbo), so
linking against spack's FMS quietly used a stale version. Treat FMS
like TIM: remove it from spack/spack.yaml and have build_with_spack.sh
call build_dep fms unconditionally so the build always reflects
\$FMS_ROOT (or the submodule fallback). CLAUDE.md updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build.sh --unit-tests-only runs `make -C tests build_unit_tests`, but the
CMake migration (a30ac26) removed tests/Makefile and rewrote tests/CMakeLists.txt
to depend on the top-level project (TURBO::infra_r8, MOM6::infra, MOM6::framework),
so tests/ can no longer be configured standalone — breaking CI's
"Run TIM unit tests" workflow.

Adds tests-legacy/, a self-contained island that mirrors the pre-a30ac26
test build: its own Makefile + CMakeLists.txt + cmake/{add_mom_test,
find_mom_dependencies}.cmake, plus an interface/ subtree using the legacy
positional-args convention. Links against MOM::Interface synthesized
directly from the prebuilt lib${INFRA}.a and libinfra-${INFRA}.a artifacts
that build.sh's mkmf flow produces. The .pf sources and config files stay
under tests/ — only the cmake harness is duplicated, and it references
the real sources via ${LEGACY_TESTS_DIR}/... absolute paths.

build.sh changes one line: UNIT_TEST_ROOT=tests -> tests-legacy. Top-level
cmake never sees tests-legacy/, so the modern flow is untouched.

Temporary — slated for removal once build.sh is retired in favor of
scripts/build_turbo_stack.sh (the top-level cmake flow).
5 of 6 matrix entries in the AMReX Mini-App workflow fail with
`fatal: destination path 'spack' already exists and is not an empty directory`.
The gha-runner-turbo self-hosted runner workspaces persist between jobs,
and actions/checkout's clean doesn't always remove untracked dirs like
spack/, so the previous run's clone is still sitting in the workspace.

Prepend `rm -rf spack` to the Install spack step so we always start from
a clean state. Shallow clone is cheap (--depth=2, ~50MB) so the cost is
trivial compared to predictability.

The 1 entry that fails earlier (gcc14/openmpi, "Pod ... unhealthy with
phase status Failed") is a transient k8s pod-init issue — the image was
pulled at the same timestamp, so the image itself is fine. Not addressed
here.
Two HIGH findings from the pre-PR review:

1. scripts/build_dep.sh and scripts/fetch_source.sh document --ref / --branch
   as "branch, tag, or commit", but the implementations called
   `git clone --branch "$ref"` (rejects raw SHAs) and `git checkout -B "$ref"
   "origin/$ref"` (raw SHAs can't be prefixed with origin/). Calling with a
   commit SHA — encouraged by the docs — failed.

   Detect SHA-shaped refs (7–40 hex chars) and take a detached-checkout
   path: plain clone + `git checkout --detach <sha>` on first clone,
   `git fetch origin && git checkout --detach <sha>` on update.

2. test_new_build_system_on_derecho.sh runs under `set -u` and dereferences
   $MOM6_ROOT / $TIM_ROOT / $FMS_ROOT unconditionally in the version
   banner. The script header documents a `fetch_<NAME>=false <NAME>_ROOT=...`
   workflow, but if any one of those *_ROOT vars wasn't exported the script
   died with "unbound variable" before any build started. Use ${VAR:-} and
   have _print_version print "<not set>" when its arg is empty.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 33 out of 36 changed files in this pull request and generated 4 comments.

Comment thread cmake/add_mom_test.cmake
Comment thread scripts/build_dep.sh Outdated
Comment thread scripts/fetch_source.sh Outdated
Comment thread scripts/setup_environment/derecho_cpu_gcc_openmpi.sh Outdated
Comment thread scripts/build_turbo_stack.sh Outdated
Comment thread scripts/build_turbo_stack.sh Outdated
- build_dep.sh / fetch_source.sh: branch/tag/SHA dispatch after
  `fetch --tags` so reruns work for tag refs (not just branches).
- build_turbo_stack.sh: gate ctest on TURBO_BUILD_UNIT_TESTS read
  from CMakeCache.txt; drop the dead #cmake_generate_options lines.
- add_mom_test.cmake: FATAL_ERROR on positional args to add_mom_tests
  so silent zero-test registrations are loud; fix the
  MOM_coms_helpers caller to use TEST_FILES + LINK_LIBRARIES.
- derecho_cpu_gcc_openmpi.sh: safe LIBRARY_PATH expansion under
  the test driver's `set -u`; enable -DAMReX_TINY_PROFILE=ON for
  the from-source AMReX build (same flag added to the laptop
  emulation env).
- create_spack_environment.sh: point the comment at spack.yaml,
  the file that actually exists.
- CLAUDE.md / scripts/README.md: enumerate build_with_spack.sh
  options; describe fetch_source's branch/tag/SHA semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@johnmauff johnmauff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hctorres, thanks for the PR, it looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Task]: Implement improved build system within turbo-stack

3 participants