Refactor and extend NetCDF input support for topography and storm surge (met) forcing #701

Open
mandli wants to merge 18 commits into clawpack:master from mandli:refactor-netcdf-support

Conversation


mandli (Member) commented Apr 14, 2026

This PR replaces the partial, ad-hoc NetCDF support that existed for topo_type=4 with a complete, robust NetCDF input pipeline covering both topography and gridded met forcing for storm surge including at least partially what is described in #699. The new infrastructure handles CF convention discovery, coordinate normalization, unit enforcement, and descriptor-based dispatch in a way that is consistent across both input types and extensible to future fields (precipitation, friction). New documentation can be found at clawpack/doc#240.

New Python Module netcdf_utils.py

Added a new Python module src/python/geoclaw/netcdf_utils.py that provides:

  • NetCDFInterrogator -- base class for CF-aware file inspection: coordinate discovery, lon convention detection ([0,360] vs [-180,180]), lat order detection (N-to-S vs S-to-N), dimension order discovery, fill value resolution (_FillValue vs missing_value with correct CF precedence), and crop bound validation. Operates lazily via xarray -- no data arrays are loaded during interrogation.
  • TopoInterrogator(NetCDFInterrogator) -- adds fill value detection within the crop region (hard error -- silent NaN in bathymetry is a correctness hazard), unit verification and conversion to the GeoClaw contract (meters, positive up).
  • MetInterrogator(NetCDFInterrogator) -- adds multi-variable synchronization checks (wind_u, wind_v, pressure on the same grid and time axis), unit validation and conversion (pressure to Pa, wind to m/s), CF time decoding to seconds from a user-defined offset, and ensemble dimension detection.
  • DescriptorWriter -- serializes interrogator output to the descriptor format consumed by Fortran: inline key=value lines in topo.data for topography, and the body of *.storm files for met forcing.
  • CFNormalizer -- standalone utility for repairing CF metadata (standard_name, axis, units, _FillValue) on existing NetCDF files without resampling or reprojecting. Idempotent.
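The fill value resolution described for NetCDFInterrogator can be sketched as follows. This is an illustrative sketch, not the PR's actual implementation (the name `resolve_fill_value` is hypothetical); it encodes the CF/NUG precedence assumed here, in which `_FillValue` wins when both attributes are present and `missing_value` serves only as a fallback:

```python
def resolve_fill_value(attrs):
    """Resolve the fill value for a variable from its attribute dict.

    Assumed CF/NUG precedence: _FillValue wins when both attributes are
    present; missing_value is only a fallback. Returns None when neither
    attribute is set.
    """
    if "_FillValue" in attrs:
        return attrs["_FillValue"]
    return attrs.get("missing_value")

# The attribute variants exercised by the test fixtures:
assert resolve_fill_value({"_FillValue": -9999.0}) == -9999.0
assert resolve_fill_value({"missing_value": -9999.0}) == -9999.0
assert resolve_fill_value({"_FillValue": -9999.0,
                           "missing_value": 1e20}) == -9999.0  # _FillValue wins
assert resolve_fill_value({}) is None
```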

Modifications

  1. Added a units contract to units.py that has a GEOCLAW_NETCDF_UNITS dictionary defining the units that Fortran expects.
GEOCLAW_NETCDF_UNITS = {
    "topo":     "m",
    "wind_u":   "m/s",
    "wind_v":   "m/s",
    "pressure": "Pa",
    "time":     "s",
    # precipitation, friction: reserved for future implementation
}

Python enforces this contract during interrogation; Fortran assumes it without checking. The contract is documented centrally here and referenced from netcdf_utils.py. In the future we may want to leverage a dedicated units library for this, and we could also extend this to be THE units contract for GeoClaw.
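One way the Python-side enforcement could look is sketched below. This is illustrative only: `to_contract_units` and the `_CONVERSIONS` table are hypothetical names, not the PR's actual API, and the dict is repeated from above to keep the sketch self-contained.

```python
# Contract repeated from above for self-containment.
GEOCLAW_NETCDF_UNITS = {
    "topo": "m", "wind_u": "m/s", "wind_v": "m/s",
    "pressure": "Pa", "time": "s",
}

# Multiplicative factors into the contract unit (illustrative subset).
_CONVERSIONS = {
    ("hPa", "Pa"): 100.0,
    ("mbar", "Pa"): 100.0,
    ("knots", "m/s"): 0.514444,
}

def to_contract_units(field, units, value):
    """Convert value from the file's units to the GeoClaw contract unit,
    raising if no conversion is known (the interrogator's hard-error path)."""
    target = GEOCLAW_NETCDF_UNITS[field]
    if units == target:
        return value
    try:
        return value * _CONVERSIONS[(units, target)]
    except KeyError:
        raise ValueError(f"cannot convert {field} from {units!r} to {target!r}")

assert to_contract_units("pressure", "mbar", 1013.25) == 101325.0
```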

  2. Extension of topo_type = 4. This required two changes:

    a. Extended data.py so that topo_type=4 may carry additional descriptor lines after the usual line of topo type and path. These lines carry information from NetCDFInterrogator, including variable and coordinate names, coordinate conventions (e.g. ordering and wrapping), fill value, and cropping. Fortran can then simply read these values in and assume they are valid, and topography files can remain as-is without pre-processing.

    b. Modification of topo_module.f90 had two sets of changes:

    Bug fixes in existing NetCDF implementation:

    • Dimension and variable inquiries were being mixed. This usually worked, but only because it relied on the ordering of variables and dimensions in the file, which is not ideal.
    • The previous implementation had incomplete coordinate handling and did not account for N-to-S latitude ordering, non-standard dimension ordering, or [0, 360] longitude. For files that did not match the expected layout, this often produced silently wrong results (when it did not simply crash).

    New capabilities: primarily, the Fortran now reads the descriptor lines for topo_type=4 produced by DescriptorWriter / data.py.

    • Apply correct index arithmetic for lon convention and lat order without copying data
    • Compute start/count arguments for nf90_get_var from crop bounds, enabling domain subsetting without loading the full file
    • Abort cleanly on fill values within the crop region for topography
    • Emit a meaningful error message for any nf90_* call that fails, suitable for SLURM batch job logs
      Fortran reader still assumes the unit contract from units.py and no unit conversion is applied in Fortran.
  3. Modified utils.py so that get_netcdf_names is consistent with the new NetCDFInterrogator. This is still a parallel implementation and will be deprecated (see known issues).
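The crop-bound index arithmetic described above can be sketched in Python (the Fortran operates analogously on 1-based indices without copying data). This is an illustrative sketch only; `crop_start_count` is a hypothetical helper, not part of the PR.

```python
import bisect

def crop_start_count(coords, lo, hi, descending=False):
    """Compute 1-based start and count (as passed to nf90_get_var)
    selecting the coordinate range [lo, hi] from a monotone axis.

    coords is the file's coordinate array for one dimension; when the axis
    is stored descending (e.g. N-to-S latitude) we index against the
    reversed axis and map back, so the data itself is never reordered.
    """
    axis = list(coords)
    if descending:
        axis = axis[::-1]
    # First index with coords >= lo, last index with coords <= hi.
    i0 = bisect.bisect_left(axis, lo)
    i1 = bisect.bisect_right(axis, hi) - 1
    if i1 < i0:
        raise ValueError("crop bounds select no points")
    n = len(axis)
    if descending:
        i0, i1 = n - 1 - i1, n - 1 - i0
    return i0 + 1, i1 - i0 + 1  # Fortran start is 1-based

lats = [50.0, 49.0, 48.0, 47.0, 46.0]            # N-to-S storage
start, count = crop_start_count(lats, 47.0, 49.0, descending=True)
assert (start, count) == (2, 3)                  # selects 49, 48, 47
```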

Tests

Added tests:

pytest tests/netcdf/                          # new unit suite, all passing
pytest tests/test_storm.py                    # extended, all passing
pytest examples/tsunami/bowl-slosh-netcdf/    # extended, all passing
pytest examples/tsunami/chile2010/            # new, all passing
pytest examples/storm-surge/isaac/            # extended, all passing, 1 marked as skip

All of these are marked as netcdf tests and are run with the existing NetCDF-based tests. It is an open question whether a couple of the regression tests take too long. The WRF variant is skipped pending MetPreprocessor implementation.

  1. Full pytest suite for netcdf_utils.py:

    • conftest.py -- in-memory NetCDF fixtures covering all coordinate variants (lon convention, lat order, dim order), fill value variants (_FillValue only, missing_value only, both present/agreeing, both present/conflicting, neither), unit variants for met forcing, and bad/edge cases
    • test_base_interrogator.py -- coordinate detection, fill value resolution, crop validation, laziness assertion
    • test_topo_interrogator.py -- fill-in-crop detection, unit paths, descriptor output correctness across coordinate variants
    • test_met_interrogator.py -- multi-variable sync, unit conversion, time decoding, ensemble dimension handling, descriptor output
    • test_descriptor_writer.py -- round-trip correctness for both topo and met descriptor formats
    • test_cf_normalizer.py -- attribute repair, idempotency, unknown variable passthrough

    All fixtures build NetCDF files in memory via netCDF4 -- no binary files are checked in.

  2. Modified: test_storm.py

    Extended to cover ERA5-style and NWS13-style NetCDF storm inputs:

    • test_netcdf_var_mapping_era5 -- verifies dimension/variable discovery for ERA5 layout (valid_time dim, u10/v10/msl variable names, lon [0,360])
    • test_netcdf_var_mapping_nws13 -- same for NWS13 variable schema
    • test_data_storm_roundtrip -- extended to parametrize over netcdf_era5 and netcdf_nws13 variants
    • test_netcdf_wrf_stub -- skipped with explanation; documents that WRF support requires MetPreprocessor for string-time axis and curvilinear grid handling, deferred to a future PR
  3. Modified: test_bowl_slosh_netcdf.py

    Extended _make_bowl_netcdf_topography to support coordinate variants (dim_order, cf_compliant, cropped). Added test_bowl_slosh_netcdf_variants parametrized over all non-standard variants, comparing against the existing reference gauge data. Results must match to 1e-4 relative/absolute since the bathymetry is identical.

  4. New test in examples/tsunami/chile2010/

    Added test_chile2010.py with a full regression test parametrized over topo input variants:

    • standard - existing ASCII topo, baseline behavior
    • netcdf_topotools - writes topo via topotools.Topography.write() to NetCDF, then verifies CF compliance via CFNormalizer; exercises the topotools write path that was previously untested
    • lat_flip - manually constructed NetCDF with N-to-S latitude
    • lon_360 - manually constructed NetCDF with [0,360] longitude

    All variants compare against committed reference gauge data.
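For reference, the lon_360 variant amounts to the transformation sketched below. The helper `normalize_lon_360` is illustrative only and not part of the PR; the Fortran side applies equivalent index arithmetic without materializing a permuted copy.

```python
def normalize_lon_360(lons):
    """Map a monotone [0, 360) longitude axis onto [-180, 180) and return
    (new_lons, permutation) such that new_lons is monotone increasing and
    permutation[i] gives the source column for output column i."""
    wrapped = [lon - 360.0 if lon >= 180.0 else lon for lon in lons]
    perm = sorted(range(len(wrapped)), key=lambda i: wrapped[i])
    return [wrapped[i] for i in perm], perm

lons, perm = normalize_lon_360([0.0, 90.0, 180.0, 270.0])
assert lons == [-180.0, -90.0, 0.0, 90.0]
assert perm == [2, 3, 0, 1]
```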

Numerical behavior

Topography changes: the bug fixes in topo_module.f90 for coordinate handling mean that NetCDF topo files with N-to-S latitude or [0,360] longitude will now produce correct results where they previously produced silently wrong bathymetry. Files with the previously assumed layout (S-to-N, [-180,180]) are unaffected.

All other simulation parameters and numerical methods are unchanged.

Known issues and deferred work

  • util.get_netcdf_names and the discovery logic in NetCDFInterrogator are parallel implementations. Consolidation is deferred to the surge module refactor to avoid scope creep here.
  • Storm.write does not yet call DescriptorWriter directly; the .storm descriptor for NetCDF met forcing is currently written by the test setup code. Full integration is part of the surge refactor.
  • WRF support requires preprocessing for string-time axis (Times character array) and curvilinear grid (XLAT/XLONG 2D coordinate variables). Stub test documents the gap.
  • Precipitation and friction fields are reserved in GEOCLAW_NETCDF_UNITS but not yet implemented.
  • A CFNormalizer-based tool for making arbitrary NetCDF files GeoClaw-ready is available in netcdf_utils.py but not yet exposed as a command-line utility.
  • Adopt a standard unit package in Python.

mandli added 7 commits April 9, 2026 20:40
Includes better support for discovery of variables, coordinate mapping,
cropping of data files, and more robust reading overall.
Adds 3 tests to bowl-slosh-netcdf to test dimension ordering, CF compliance,
and cropping.  Lat-long coordinate utilities will be added to Chile 2010.
Also includes another bug fix for mixing up dimension and variable meta-data
inquiries in topo_module.f90.
Includes a basic test and more NetCDF tests for writing NetCDF files
from topotools, flipping latitude writing, and wrapping longitude.
Adds additional tests to the isaac test for the new NetCDF capabilities.
@mandli mandli marked this pull request as draft April 14, 2026 20:19
This uses dask if it is available, falling back to the generic NetCDF backend (without lazy loading) if it is not. This is probably not an issue for any use cases I have seen.

mandli commented Apr 15, 2026

@kbarnhart This may be useful for some of the work you are doing to read precipitation data into GeoClaw. While not currently listed, precipitation is, I think, the most likely next variable field to be added; it may be a good test.

CI testing revealed that of course dtopo was not being generated for this
test. Written to allow future testing of dtopo variants as well.
rjleveque (Member) commented:

@mandli Thanks for working on this! I'm tied up this week but will try to go through it soon.


mandli commented Apr 16, 2026

> @mandli Thanks for working on this! I'm tied up this week but will try to go through it soon.

I have a bunch more coming in this PR explaining why, what, and how it was implemented, so no rush. I am also trying to get a docs PR ready to accompany this.

UPDATE - Added what changed and is now addressed. Still working on the doc PR and will link it here when issued.

mandli added 3 commits April 17, 2026 19:28
This allows for more ways to specify variables and removes the duplicative aspects of the met forcing that were present. Time handling is also much more robust and no longer assumes epochs.
@mandli mandli marked this pull request as ready for review May 4, 2026 16:21
mandli added 6 commits May 7, 2026 10:39
Replaced with simply checking for minimal dims and vars instead
Added a dict of common bathy variable names to fall back on if not provided, while still allowing override.
This allows wrapping topo files in the longitudinal direction.
Needed to remove the isaac.storm file being
written directly during the run of setrun.py and
instead only do so when called from `__main__`.
