Refactor and extend NetCDF input support for topography and storm surge (met) forcing#701
Open
mandli wants to merge 18 commits into
Conversation
Includes better support for discovery of variables, coordinate mapping, cropping of data files, and more robust reading overall.
Adds 3 tests to bowl-slosh-netcdf to test dimension ordering, CF compliance, and cropping. Lat-long coordinate utilities will be added to Chile 2010. Also includes another bug fix for mixing up dimension and variable meta-data inquiries in topo_module.f90.
Includes a basic test and more NetCDF tests for writing NetCDF files from topotools, flipped-latitude writing, and longitude wrapping.
Adds additional tests to the isaac test for the new NetCDF capabilities.
This uses dask if it is available, and otherwise falls back to the generic NetCDF backend without lazy loading. This is probably not an issue for any use cases I have seen.
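A minimal sketch of this kind of fallback, assuming xarray's `open_dataset` and its `chunks` keyword; the helper name is hypothetical and not from the PR:

```python
import importlib.util

def open_kwargs():
    """Choose xarray.open_dataset keyword arguments based on whether
    dask is importable.  With dask, chunks="auto" enables lazy,
    chunked loading; without it, omitting chunks falls back to the
    generic (eager, NumPy-backed) NetCDF read path."""
    if importlib.util.find_spec("dask") is not None:
        return {"chunks": "auto"}
    return {}

# Usage (hypothetical):
#   ds = xarray.open_dataset(path, **open_kwargs())
```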
Member
Author
@kbarnhart This may be useful for some of the work you are doing to read precipitation data into GeoClaw. While not currently listed, precipitation is, I think, the most likely next variable field I would add; it may make a good test.
CI testing revealed that of course dtopo was not being generated for this test. Written to allow future testing of dtopo variants as well.
Member
@mandli Thanks for working on this! I'm tied up this week but will try to go through it soon.
Member
Author
I have a bunch more explanation coming in this PR on the why, what, and how of the implementation, so no rush. I am also trying to get a docs PR to accompany this. UPDATE: added what changed and what is now addressed. Still working on the doc PR and will link it here when issued.
This allows for more ways to specify variables and removes the duplicative aspects of the met forcing that were present. Time handling is also much more robust, without assuming epochs.
Replaced with a simple check for the minimal dims and vars instead
Added a dict of common bathy variable names to fall back on if none is provided, while still allowing an override.
This allows wrapping topo files in the longitudinal direction
Needed to stop the isaac.storm file from being written directly during the run of `setrun.py`, and instead only write it when called from `__main__`.
This PR replaces the partial, ad-hoc NetCDF support that existed for `topo_type=4` with a complete, robust NetCDF input pipeline covering both topography and gridded met forcing for storm surge, including at least part of what is described in #699. The new infrastructure handles CF convention discovery, coordinate normalization, unit enforcement, and descriptor-based dispatch in a way that is consistent across both input types and extensible to future fields (precipitation, friction). New documentation can be found at clawpack/doc#240.

New Python Module
Added a new Python module `src/python/geoclaw/netcdf_utils.py` that provides:

- `NetCDFInterrogator` -- base class for CF-aware file inspection: coordinate discovery, lon convention detection ([0,360] vs [-180,180]), lat order detection (N-to-S vs S-to-N), dimension order discovery, fill value resolution (`_FillValue` vs `missing_value` with correct CF precedence), and crop bound validation. Operates lazily via xarray -- no data arrays are loaded during interrogation.
- `TopoInterrogator(NetCDFInterrogator)` -- adds fill value detection within the crop region (a hard error -- silent NaN in bathymetry is a correctness hazard), plus unit verification and conversion to the GeoClaw contract (meters, positive up).
- `MetInterrogator(NetCDFInterrogator)` -- adds multi-variable synchronization checks (`wind_u`, `wind_v`, `pressure` on the same grid and time axis), unit validation and conversion (pressure to Pa, wind to m/s), CF time decoding to seconds from a user-defined offset, and ensemble dimension detection.
- `DescriptorWriter` -- serializes interrogator output to the descriptor format consumed by Fortran: inline key=value lines in `topo.data` for topography, and the body of `*.storm` files for met forcing.
- `CFNormalizer` -- standalone utility for repairing CF metadata (`standard_name`, `axis`, `units`, `_FillValue`) on existing NetCDF files without resampling or reprojecting. Idempotent.
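To illustrate the descriptor idea, here is a sketch of serializing interrogator output as inline key=value lines; the key names and exact format here are illustrative, not the actual `DescriptorWriter` output:

```python
def write_descriptor_lines(info):
    """Serialize interrogator output (a dict of metadata) as inline
    key=value descriptor lines, one per entry, sorted for stable
    round-tripping.  Sequence values are joined with commas so the
    Fortran side can read them with simple list-directed input."""
    lines = []
    for key, value in sorted(info.items()):
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        lines.append(f"{key}={value}")
    return lines

# Example with hypothetical keys:
#   write_descriptor_lines({"z_variable": "Band1", "lat_order": "N_to_S"})
```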
Modifications

Added `units.py`, which has a `GEOCLAW_NETCDF_UNITS` dictionary defining the units that Fortran expects. Python enforces this contract during interrogation; Fortran assumes it without checking. The contract is documented centrally there and referenced from `netcdf_utils.py`. We may want to leverage a dedicated units library for this in the future, and could also extend this to be THE units contract for GeoClaw.
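A sketch of what such a units contract and its Python-side enforcement might look like; the keys, target units, and conversion table below are illustrative assumptions, not the actual contents of `units.py`:

```python
# Illustrative units contract: the units Fortran assumes on input.
GEOCLAW_NETCDF_UNITS = {
    "topography": "m",    # meters, positive up
    "pressure": "Pa",
    "wind": "m/s",
}

# Conversions Python might apply during interrogation so Fortran can
# assume the contract without checking (hypothetical table).
_CONVERSIONS = {
    ("mbar", "Pa"): lambda x: x * 100.0,
    ("hPa", "Pa"): lambda x: x * 100.0,
    ("knots", "m/s"): lambda x: x * 0.514444,
}

def enforce_units(value, units, field):
    """Convert value from `units` to the contract units for `field`,
    raising if no conversion is known."""
    target = GEOCLAW_NETCDF_UNITS[field]
    if units == target:
        return value
    try:
        return _CONVERSIONS[(units, target)](value)
    except KeyError:
        raise ValueError(f"no known conversion from {units} to {target}")
```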
Extension of `topo_type = 4`. This required two changes:

a. Extended `data.py` to understand that `topo_type=4` may have additional descriptor lines after the usual line of topo type and path. These lines carry information from `NetCDFInterrogator`, including variable and coordinate names, coordinate conventions (e.g. order, wrapping), fill value, and cropping. This allows Fortran to simply read these values in and assume they are cogent, and lets topography files remain as-is without pre-processing.
topo_module.f90had two sets of changes:Bug fixes in existing NetCDF implementation:
New capabilities: Primarily the Fortran now reads the descriptor lines for
topo_type=4that come fromDescriptorWrite/data.py.start/countarguments fornf90_get_varfrom crop bounds, enabling domain subsetting without loading the full filenf90_*call that fails, suitable for SLURM batch job logsFortran reader still assumes the unit contract from
units.pyand no unit conversion is applied in Fortran.Modified
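The crop-to-`start`/`count` index arithmetic can be sketched in Python (the Fortran implements the equivalent; the function name and widening behavior here are illustrative assumptions):

```python
import bisect

def crop_to_start_count(coords, lo, hi):
    """Given a monotonically increasing coordinate array and crop
    bounds [lo, hi], return 1-based (start, count) suitable for an
    nf90_get_var-style subset read, so only the cropped region is
    pulled from the file.  Indices are widened outward so the crop
    region is fully covered by the selected cells."""
    # last index with coords[i] <= lo, clamped to 0
    i0 = max(bisect.bisect_right(coords, lo) - 1, 0)
    # first index with coords[i] >= hi, clamped to len(coords) - 1
    i1 = min(bisect.bisect_left(coords, hi), len(coords) - 1)
    return i0 + 1, i1 - i0 + 1  # the NetCDF Fortran API is 1-based
```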
Modified `utils.py` so that `get_netcdf_names` is consistent with the new `NetCDFInterrogator`. This is still a parallel implementation and will be deprecated (see known issues).
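The convention discovery that both implementations share can be sketched from coordinate values alone (a simplified illustration, not the actual `NetCDFInterrogator` code):

```python
def detect_lon_convention(lons):
    """Detect [0,360] vs [-180,180] longitude convention from the
    coordinate values."""
    if min(lons) < 0.0:
        return "[-180,180]"
    if max(lons) > 180.0:
        return "[0,360]"
    return "ambiguous"  # all values in [0, 180] fit either convention

def detect_lat_order(lats):
    """Detect whether latitudes are stored N-to-S or S-to-N."""
    return "N_to_S" if lats[0] > lats[-1] else "S_to_N"
```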
Tests

Added tests:
All of these are marked as `netcdf` tests and run with the existing NetCDF-based tests. An open question is whether a couple of the regression tests take too long. The WRF variant is skipped pending the `MetPreprocessor` implementation.

Full pytest suite for `netcdf_utils.py`:

- `conftest.py` -- in-memory NetCDF fixtures covering all coordinate variants (lon convention, lat order, dim order), fill value variants (`_FillValue` only, `missing_value` only, both present/agreeing, both present/conflicting, neither), unit variants for met forcing, and bad/edge cases
- `test_base_interrogator.py` -- coordinate detection, fill value resolution, crop validation, laziness assertion
- `test_topo_interrogator.py` -- fill-in-crop detection, unit paths, descriptor output correctness across coordinate variants
- `test_met_interrogator.py` -- multi-variable sync, unit conversion, time decoding, ensemble dimension handling, descriptor output
- `test_descriptor_writer.py` -- round-trip correctness for both topo and met descriptor formats
- `test_cf_normalizer.py` -- attribute repair, idempotency, unknown variable passthrough

All fixtures build NetCDF files in memory via `netCDF4` -- no binary files are checked in.
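The fill value resolution those fixtures exercise can be sketched as (assuming `_FillValue` takes precedence when both attributes are present, matching common CF practice; the real implementation may additionally flag conflicting values):

```python
def resolve_fill_value(attrs):
    """Resolve the effective fill value from a variable's attribute
    dict, giving _FillValue precedence over missing_value when both
    are present.  Returns None when neither attribute is set."""
    if "_FillValue" in attrs:
        return attrs["_FillValue"]
    return attrs.get("missing_value")
```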
Modified: `test_storm.py`

Extended to cover ERA5-style and NWS13-style NetCDF storm inputs:

- `test_netcdf_var_mapping_era5` -- verifies dimension/variable discovery for the ERA5 layout (`valid_time` dim, `u10`/`v10`/`msl` variable names, lon [0,360])
- `test_netcdf_var_mapping_nws13` -- same for the NWS13 variable schema
- `test_data_storm_roundtrip` -- extended to parametrize over `netcdf_era5` and `netcdf_nws13` variants
- `test_netcdf_wrf_stub` -- skipped with an explanation; documents that WRF support requires a `MetPreprocessor` for the string-time axis and curvilinear grid handling, deferred to a future PR
Modified: `test_bowl_slosh_netcdf.py`

Extended `_make_bowl_netcdf_topography` to support coordinate variants (`dim_order`, `cf_compliant`, `cropped`). Added `test_bowl_slosh_netcdf_variants`, parametrized over all non-standard variants and comparing against the existing reference gauge data. Results must match to 1e-4 relative/absolute since the bathymetry is identical.
New test in `examples/tsunami/chile2010/`

Added `test_chile2010.py` with a full regression test parametrized over topo input variants:

- `standard` -- existing ASCII topo, baseline behavior
- `netcdf_topotools` -- writes topo via `topotools.Topography.write()` to NetCDF, then verifies CF compliance via `CFNormalizer`; exercises the `topotools` write path that was previously untested
- `lat_flip` -- manually constructed NetCDF with N-to-S latitude
- `lon_360` -- manually constructed NetCDF with [0,360] longitude

All variants compare against committed reference gauge data.
Numerical behavior
Topography changes: the bug fixes in `topo_module.f90` for coordinate handling mean that NetCDF topo files with N-to-S latitude or [0,360] longitude will now produce correct results where they previously produced silently wrong bathymetry. Files with the previously assumed layout (S-to-N, [-180,180]) are unaffected. All other simulation parameters and numerical methods are unchanged.
Known issues and deferred work
- `util.get_netcdf_names` and the discovery logic in `NetCDFInterrogator` are parallel implementations. Consolidation is deferred to the surge module refactor to avoid scope creep here.
- `Storm.write` does not yet call `DescriptorWriter` directly; the `.storm` descriptor for NetCDF met forcing is currently written by the test setup code. Full integration is part of the surge refactor.
- WRF support is deferred: it requires handling of the string-time axis (`Times` character array) and curvilinear grid (`XLAT`/`XLONG` 2D coordinate variables). A stub test documents the gap.
- Future fields (e.g. precipitation) are anticipated by `GEOCLAW_NETCDF_UNITS` but not yet implemented.
- A `CFNormalizer`-based tool for making arbitrary NetCDF files GeoClaw-ready is available in `netcdf_utils.py` but not yet exposed as a command-line utility.