Refactor and extend NetCDF input support for topography and storm surge (met) forcing #701

Open
mandli wants to merge 18 commits into clawpack:master from mandli:refactor-netcdf-support

Conversation


mandli (Member) commented Apr 14, 2026

This PR replaces the partial, ad-hoc NetCDF support that existed for topo_type=4 with a complete, robust NetCDF input pipeline covering both topography and gridded met forcing for storm surge including at least partially what is described in #699. The new infrastructure handles CF convention discovery, coordinate normalization, unit enforcement, and descriptor-based dispatch in a way that is consistent across both input types and extensible to future fields (precipitation, friction). New documentation can be found at clawpack/doc#240.

New Python Module netcdf_utils.py

Added a new Python module src/python/geoclaw/netcdf_utils.py that provides:

  • NetCDFInterrogator -- base class for CF-aware file inspection: coordinate discovery, lon convention detection ([0,360] vs [-180,180]), lat order detection (N-to-S vs S-to-N), dimension order discovery, fill value resolution (_FillValue vs missing_value with correct CF precedence), and crop bound validation. Operates lazily via xarray -- no data arrays are loaded during interrogation.
  • TopoInterrogator(NetCDFInterrogator) -- adds fill value detection within the crop region (hard error -- silent NaN in bathymetry is a correctness hazard), unit verification and conversion to the GeoClaw contract (meters, positive up).
  • MetInterrogator(NetCDFInterrogator) -- adds multi-variable synchronization checks (wind_u, wind_v, pressure on the same grid and time axis), unit validation and conversion (pressure to Pa, wind to m/s), CF time decoding to seconds from a user-defined offset, and ensemble dimension detection.
  • DescriptorWriter -- serializes interrogator output to the descriptor format consumed by Fortran: inline key=value lines in topo.data for topography, and the body of *.storm files for met forcing.
  • CFNormalizer -- standalone utility for repairing CF metadata (standard_name, axis, units, _FillValue) on existing NetCDF files without resampling or reprojecting. Idempotent.
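The fill value resolution described for NetCDFInterrogator can be sketched as follows. This is an illustrative sketch, not the PR's actual implementation (the name `resolve_fill_value` is hypothetical); it encodes the CF/NUG precedence assumed here, in which `_FillValue` wins when both attributes are present and `missing_value` serves only as a fallback:

```python
def resolve_fill_value(attrs):
    """Resolve the fill value for a variable from its attribute dict.

    Assumed CF/NUG precedence: _FillValue wins when both attributes are
    present; missing_value is only a fallback. Returns None when neither
    attribute is set.
    """
    if "_FillValue" in attrs:
        return attrs["_FillValue"]
    return attrs.get("missing_value")

# The attribute variants exercised by the test fixtures:
assert resolve_fill_value({"_FillValue": -9999.0}) == -9999.0
assert resolve_fill_value({"missing_value": -9999.0}) == -9999.0
assert resolve_fill_value({"_FillValue": -9999.0,
                           "missing_value": 1e20}) == -9999.0  # _FillValue wins
assert resolve_fill_value({}) is None
```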

Modifications

  1. Added a units contract to units.py that has a GEOCLAW_NETCDF_UNITS dictionary defining the units that Fortran expects.
GEOCLAW_NETCDF_UNITS = {
    "topo":     "m",
    "wind_u":   "m/s",
    "wind_v":   "m/s",
    "pressure": "Pa",
    "time":     "s",
    # precipitation, friction: reserved for future implementation
}

Python enforces this contract during interrogation; Fortran assumes it without checking. The contract is documented centrally here and referenced from netcdf_utils.py. In the future we may want to leverage a dedicated units library for this, and we could also extend this to be THE units contract for GeoClaw.
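One way the Python-side enforcement could look is sketched below. This is illustrative only: `to_contract_units` and the `_CONVERSIONS` table are hypothetical names, not the PR's actual API, and the dict is repeated from above to keep the sketch self-contained.

```python
# Contract repeated from above for self-containment.
GEOCLAW_NETCDF_UNITS = {
    "topo": "m", "wind_u": "m/s", "wind_v": "m/s",
    "pressure": "Pa", "time": "s",
}

# Multiplicative factors into the contract unit (illustrative subset).
_CONVERSIONS = {
    ("hPa", "Pa"): 100.0,
    ("mbar", "Pa"): 100.0,
    ("knots", "m/s"): 0.514444,
}

def to_contract_units(field, units, value):
    """Convert value from the file's units to the GeoClaw contract unit,
    raising if no conversion is known (the interrogator's hard-error path)."""
    target = GEOCLAW_NETCDF_UNITS[field]
    if units == target:
        return value
    try:
        return value * _CONVERSIONS[(units, target)]
    except KeyError:
        raise ValueError(f"cannot convert {field} from {units!r} to {target!r}")

assert to_contract_units("pressure", "mbar", 1013.25) == 101325.0
```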

  2. Extension of topo_type = 4. This required two changes:

    a. Extended data.py so that topo_type=4 may carry additional descriptor lines after the usual line of topo type and path. These lines carry information from NetCDFInterrogator, including variable and coordinate names, coordinate conventions (e.g. ordering and wrapping), fill value, and cropping. Fortran can then simply read these values in and assume they are valid, and topography files can remain as-is without pre-processing.

    b. Modification of topo_module.f90 had two sets of changes:

    Bug fixes in existing NetCDF implementation:

    • Dimension and variable inquiries were being mixed. This usually worked, but only because it relied on the ordering of variables and dimensions in the file, which is not ideal.
    • The previous implementation had incomplete coordinate handling and did not account for N-to-S latitude ordering, non-standard dimension ordering, or [0, 360] longitude. For files that did not match the expected layout, this often produced silently wrong results (when it did not simply crash).

    New capabilities: primarily, the Fortran now reads the descriptor lines for topo_type=4 produced by DescriptorWriter / data.py.

    • Apply correct index arithmetic for lon convention and lat order without copying data
    • Compute start/count arguments for nf90_get_var from crop bounds, enabling domain subsetting without loading the full file
    • Abort cleanly on fill values within the crop region for topography
    • Emit a meaningful error message for any nf90_* call that fails, suitable for SLURM batch job logs
      Fortran reader still assumes the unit contract from units.py and no unit conversion is applied in Fortran.
  3. Modified utils.py so that get_netcdf_names is consistent with the new NetCDFInterrogator. This is still a parallel implementation and will be deprecated (see known issues).
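The crop-bound index arithmetic described above can be sketched in Python (the Fortran operates analogously on 1-based indices without copying data). This is an illustrative sketch only; `crop_start_count` is a hypothetical helper, not part of the PR.

```python
import bisect

def crop_start_count(coords, lo, hi, descending=False):
    """Compute 1-based start and count (as passed to nf90_get_var)
    selecting the coordinate range [lo, hi] from a monotone axis.

    coords is the file's coordinate array for one dimension; when the axis
    is stored descending (e.g. N-to-S latitude) we index against the
    reversed axis and map back, so the data itself is never reordered.
    """
    axis = list(coords)
    if descending:
        axis = axis[::-1]
    # First index with coords >= lo, last index with coords <= hi.
    i0 = bisect.bisect_left(axis, lo)
    i1 = bisect.bisect_right(axis, hi) - 1
    if i1 < i0:
        raise ValueError("crop bounds select no points")
    n = len(axis)
    if descending:
        i0, i1 = n - 1 - i1, n - 1 - i0
    return i0 + 1, i1 - i0 + 1  # Fortran start is 1-based

lats = [50.0, 49.0, 48.0, 47.0, 46.0]            # N-to-S storage
start, count = crop_start_count(lats, 47.0, 49.0, descending=True)
assert (start, count) == (2, 3)                  # selects 49, 48, 47
```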

Tests

Added tests:

pytest tests/netcdf/                          # new unit suite, all passing
pytest tests/test_storm.py                    # extended, all passing
pytest examples/tsunami/bowl-slosh-netcdf/    # extended, all passing
pytest examples/tsunami/chile2010/            # new, all passing
pytest examples/storm-surge/isaac/            # extended, all passing, 1 marked as skip

All of these are marked as netcdf tests and are run with the existing NetCDF-based tests. It is an open question whether a couple of the regression tests take too long. The WRF variant is skipped pending MetPreprocessor implementation.

  1. Full pytest suite for netcdf_utils.py:

    • conftest.py -- in-memory NetCDF fixtures covering all coordinate variants (lon convention, lat order, dim order), fill value variants (_FillValue only, missing_value only, both present/agreeing, both present/conflicting, neither), unit variants for met forcing, and bad/edge cases
    • test_base_interrogator.py -- coordinate detection, fill value resolution, crop validation, laziness assertion
    • test_topo_interrogator.py -- fill-in-crop detection, unit paths, descriptor output correctness across coordinate variants
    • test_met_interrogator.py -- multi-variable sync, unit conversion, time decoding, ensemble dimension handling, descriptor output
    • test_descriptor_writer.py -- round-trip correctness for both topo and met descriptor formats
    • test_cf_normalizer.py -- attribute repair, idempotency, unknown variable passthrough

    All fixtures build NetCDF files in memory via netCDF4 -- no binary files are checked in.

  2. Modified: test_storm.py

    Extended to cover ERA5-style and NWS13-style NetCDF storm inputs:

    • test_netcdf_var_mapping_era5 -- verifies dimension/variable discovery for ERA5 layout (valid_time dim, u10/v10/msl variable names, lon [0,360])
    • test_netcdf_var_mapping_nws13 -- same for NWS13 variable schema
    • test_data_storm_roundtrip -- extended to parametrize over netcdf_era5 and netcdf_nws13 variants
    • test_netcdf_wrf_stub -- skipped with explanation; documents that WRF support requires MetPreprocessor for string-time axis and curvilinear grid handling, deferred to a future PR
  3. Modified: test_bowl_slosh_netcdf.py

    Extended _make_bowl_netcdf_topography to support coordinate variants (dim_order, cf_compliant, cropped). Added test_bowl_slosh_netcdf_variants parametrized over all non-standard variants, comparing against the existing reference gauge data. Results must match to 1e-4 relative/absolute since the bathymetry is identical.

  4. New test in examples/tsunami/chile2010/

    Added test_chile2010.py with a full regression test parametrized over topo input variants:

    • standard - existing ASCII topo, baseline behavior
    • netcdf_topotools - writes topo via topotools.Topography.write() to NetCDF, then verifies CF compliance via CFNormalizer; exercises the topotools write path that was previously untested
    • lat_flip - manually constructed NetCDF with N-to-S latitude
    • lon_360 - manually constructed NetCDF with [0,360] longitude

    All variants compare against committed reference gauge data.
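For reference, the lon_360 variant amounts to the transformation sketched below. The helper `normalize_lon_360` is illustrative only and not part of the PR; the Fortran side applies equivalent index arithmetic without materializing a permuted copy.

```python
def normalize_lon_360(lons):
    """Map a monotone [0, 360) longitude axis onto [-180, 180) and return
    (new_lons, permutation) such that new_lons is monotone increasing and
    permutation[i] gives the source column for output column i."""
    wrapped = [lon - 360.0 if lon >= 180.0 else lon for lon in lons]
    perm = sorted(range(len(wrapped)), key=lambda i: wrapped[i])
    return [wrapped[i] for i in perm], perm

lons, perm = normalize_lon_360([0.0, 90.0, 180.0, 270.0])
assert lons == [-180.0, -90.0, 0.0, 90.0]
assert perm == [2, 3, 0, 1]
```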

Numerical behavior

Topography changes: the bug fixes in topo_module.f90 for coordinate handling mean that NetCDF topo files with N-to-S latitude or [0,360] longitude will now produce correct results where they previously produced silently wrong bathymetry. Files with the previously assumed layout (S-to-N, [-180,180]) are unaffected.

All other simulation parameters and numerical methods are unchanged.

Known issues and deferred work

  • util.get_netcdf_names and the discovery logic in NetCDFInterrogator are parallel implementations. Consolidation is deferred to the surge module refactor to avoid scope creep here.
  • Storm.write does not yet call DescriptorWriter directly; the .storm descriptor for NetCDF met forcing is currently written by the test setup code. Full integration is part of the surge refactor.
  • WRF support requires preprocessing for string-time axis (Times character array) and curvilinear grid (XLAT/XLONG 2D coordinate variables). Stub test documents the gap.
  • Precipitation and friction fields are reserved in GEOCLAW_NETCDF_UNITS but not yet implemented.
  • A CFNormalizer-based tool for making arbitrary NetCDF files GeoClaw-ready is available in netcdf_utils.py but not yet exposed as a command-line utility.
  • Adopt a standard unit package in Python.

mandli added 7 commits April 9, 2026 20:40
Includes better support for discovery of variables, coordinate mapping,
cropping of data files, and more robust reading overall.
Adds 3 tests to bowl-slosh-netcdf to test dimension ordering, CF compliance,
and cropping.  Lat-long coordinate utilities will be added to Chile 2010.
Also includes another bug fix for mixing up dimension and variable meta-data
inquiries in topo_module.f90.
Includes a basic test and more NetCDF tests for writing NetCDF files
from topotools, flipping latitude writing, and wrapping longitude.
Adds additional tests to the isaac test for the new NetCDF capabilities.
@mandli mandli marked this pull request as draft April 14, 2026 20:19
This uses dask if it is available, falling back to the generic NetCDF backend (without lazy loading) if it is not. This is probably not an issue for any use cases I have seen.

mandli commented Apr 15, 2026

@kbarnhart This may be useful for some of the work you are doing to read precipitation data into GeoClaw. While not currently listed, precipitation is, I think, the most likely next variable field to be added; it may be a good test.

CI testing revealed that of course dtopo was not being generated for this
test. Written to allow future testing of dtopo variants as well.
rjleveque (Member) commented:

@mandli Thanks for working on this! I'm tied up this week but will try to go through it soon.


mandli commented Apr 16, 2026

> @mandli Thanks for working on this! I'm tied up this week but will try to go through it soon.

I have a bunch more coming in this PR explaining why, what, and how it was implemented, so no rush. I am also trying to get a docs PR ready to accompany this.

UPDATE - Added what changed and is now addressed. Still working on the doc PR and will link it here when issued.

mandli added 3 commits April 17, 2026 19:28
This allows for more ways to specify variables and removes the duplicative aspects of the met forcing that were present. Time handling is also much more robust and no longer assumes epochs.
@mandli mandli marked this pull request as ready for review May 4, 2026 16:21
mandli added 6 commits May 7, 2026 10:39
Replaced with simply checking for minimal dims and vars instead
Added a dict of common bathy variable names to fall back on if not provided, while still allowing override.
This allows wrapping topo files in the longitudinal direction.
Needed to remove the isaac.storm file being
written directly during the run of setrun.py and
instead only do so when called from `__main__`.
