diff --git a/docs/tools/cmor.rst b/docs/tools/cmor.rst index caf33eb65..86a52718f 100644 --- a/docs/tools/cmor.rst +++ b/docs/tools/cmor.rst @@ -6,12 +6,40 @@ CMOR Subcommands Overview ``fre cmor`` rewrites climate model output files with CMIP-compliant metadata. Both CMIP6 and CMIP7 workflows are supported. Available subcommands: +* ``fre cmor init`` - Initialise CMOR resources: generate experiment-config templates and/or fetch MIP tables * ``fre cmor run`` - Rewrite individual directories of netCDF files * ``fre cmor yaml`` - Process multiple directories/tables using YAML configuration * ``fre cmor find`` - Search MIP tables for variable definitions * ``fre cmor varlist`` - Generate variable lists from netCDF files * ``fre cmor config`` - Generate a CMOR YAML configuration from a post-processing directory tree +``init`` +-------- + +* Initialise CMOR resources for a new experiment: generate an empty experiment-config JSON template and/or fetch official MIP tables +* Tables are fetched via ``git clone --depth 1`` by default; pass ``--fast`` to download a tarball via ``curl`` instead +* Trusted sources: `pcmdi/cmip6-cmor-tables `_, `WCRP-CMIP/cmip7-cmor-tables `_ +* Minimal Syntax: ``fre cmor init -m [mip_era] [options]`` +* Required Options: + - ``-m, --mip_era [cmip6|cmip7]`` - MIP era for the template +* Optional: + - ``-e, --exp_config TEXT`` - Output path for the template experiment-config JSON file (default name used when omitted) + - ``-t, --tables_dir TEXT`` - Directory into which MIP tables will be fetched + - ``--tag TEXT`` - Specific git tag or release for the MIP tables repository + - ``--fast`` - Use ``curl`` to download a tarball instead of ``git clone`` +* Examples: + + .. code-block:: bash + + # Generate an empty CMIP6 experiment config template + fre cmor init -m cmip6 -e my_experiment.json + + # Generate a CMIP7 template and fetch the latest MIP tables via git + fre cmor init -m cmip7 -e my_cmip7_exp.json -t ./cmip7-tables + + # Fetch CMIP6 tables at a specific tag using curl (fast mode) + fre cmor init -m cmip6 -t ./cmip6-tables --tag v6.2.7.18 --fast + ``run`` ------- diff --git a/docs/usage/cmor.rst b/docs/usage/cmor.rst index c2164550e..1d58c6e99 100644 --- a/docs/usage/cmor.rst +++ b/docs/usage/cmor.rst @@ -16,6 +16,7 @@ Getting Started ``fre cmor`` provides several subcommands: +* ``fre cmor init`` - Initialise CMOR resources: generate experiment-config templates and/or fetch MIP tables from trusted sources * ``fre cmor run`` - Core engine for rewriting individual directories of netCDF files according to a MIP table * ``fre cmor yaml`` - Higher-level tool for processing multiple directories / MIP tables using YAML configuration * ``fre cmor find`` - Helper for exploring MIP table configurations for information on a specific variable @@ -34,6 +35,7 @@ Additional Resources -------------------- * `CMIP6 Tables `_ +* `CMIP7 Tables `_ * `CMIP6 Controlled Vocabulary `_ * `PCMDI CMOR User Guide `_ * `fre cmor README `_ diff --git a/docs/usage/cmor_cookbook.rst b/docs/usage/cmor_cookbook.rst index d363d58dd..1bec60a8b 100644 --- a/docs/usage/cmor_cookbook.rst +++ b/docs/usage/cmor_cookbook.rst @@ -10,9 +10,54 @@ Overview The ``fre cmor`` process typically follows this pattern: -1. Setup and Configuration - Identify your experiment parameters, create variable lists, and prepare experiment configuration -2. CMORization - Use ``fre cmor run`` to process individual directories or ``fre cmor yaml`` for bulk processing -3. Troubleshooting - Diagnose issues as needed (note: ``fre yamltools combine-yamls --use cmor`` can help debug YAML configurations) +1. Initialisation - Use ``fre cmor init`` to generate an experiment-config template and fetch MIP tables +2. Setup and Configuration - Fill in experiment parameters, create variable lists, and prepare experiment configuration +3. Auto-generate YAML (optional) - Use ``fre cmor config`` to scan a post-processing directory tree and generate the YAML that ``fre cmor yaml`` expects +4. CMORization - Use ``fre cmor run`` to process individual directories or ``fre cmor yaml`` for bulk processing +5. Troubleshooting - Diagnose issues as needed (note: ``fre yamltools combine-yamls --use cmor`` can help debug YAML configurations) + +Initialisation +-------------- + +Use ``fre cmor init`` to bootstrap your CMORization setup. This command generates an empty experiment-config +JSON template for the target MIP era and can optionally fetch the official MIP tables from their trusted +GitHub repositories. + +Generating an Experiment Config Template +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: bash + + # CMIP6 template + fre cmor init --mip_era cmip6 --exp_config my_cmip6_experiment.json + + # CMIP7 template + fre cmor init --mip_era cmip7 --exp_config my_cmip7_experiment.json + +The generated JSON file contains all fields expected by ``fre cmor run`` and the underlying CMOR library, +with empty placeholder values that you fill in for your specific experiment (e.g. ``experiment_id``, +``source_id``, ``calendar``, ``grid_label``, output path templates, etc.). + +Fetching MIP Tables +~~~~~~~~~~~~~~~~~~~ + +``fre cmor init`` can also fetch the official MIP tables for you: + +.. code-block:: bash + + # Fetch CMIP6 tables via git (default, shallow clone) + fre cmor init --mip_era cmip6 --tables_dir ./cmip6-cmor-tables + + # Fetch CMIP7 tables at a specific release tag using curl (fast mode) + fre cmor init --mip_era cmip7 --tables_dir ./cmip7-cmor-tables --tag v1.0.0 --fast + + # Generate a template AND fetch tables in one call + fre cmor init --mip_era cmip7 --exp_config my_exp.json --tables_dir ./cmip7-tables + +Trusted sources: + +* CMIP6: `pcmdi/cmip6-cmor-tables `_ +* CMIP7: `WCRP-CMIP/cmip7-cmor-tables `_ Setup and Configuration ----------------------- @@ -86,6 +131,39 @@ This file should include: * Calendar type Refer to CMIP6 controlled vocabularies and your project's requirements when filling in these fields. +You can use ``fre cmor init`` to generate a template with all of these fields pre-populated with empty +placeholder values. + +Auto-Generating CMOR YAML Configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you have an existing post-processing directory tree from FRE workflows, ``fre cmor config`` can scan it +to auto-generate the CMOR YAML configuration file that ``fre cmor yaml`` expects. It cross-references found +variables against MIP tables, creates per-component variable list JSON files, and writes the structured YAML. + +.. code-block:: bash + + fre cmor config \ + --pp_dir /archive/user/experiment/pp \ + --mip_tables_dir /path/to/cmip7-cmor-tables/tables \ + --mip_era cmip7 \ + --exp_config /path/to/CMOR_input.json \ + --output_yaml cmor_config.yaml \ + --output_dir /path/to/cmor_output \ + --varlist_dir /path/to/varlists \ + --freq monthly --chunk 5yr --grid gn \ + --calendar noleap + +This command: + +* Scans ``--pp_dir`` for component directories (e.g. ``ocean_monthly_1x1deg``) +* Looks for time-series data under each component at ``ts///`` +* Matches variable names from netCDF files against all MIP tables in ``--mip_tables_dir`` +* Writes per-component variable list JSON files to ``--varlist_dir`` +* Produces a YAML file at ``--output_yaml`` that can be fed directly to ``fre cmor yaml`` + +Pass ``--overwrite`` to regenerate existing variable list files. Adjust ``--freq``, ``--chunk``, and +``--grid`` to match your post-processing directory structure. Running Your CMORization ------------------------ @@ -342,12 +420,13 @@ Full processing: Tips ---- +* Use ``fre cmor init`` to bootstrap your setup — it generates an experiment-config template and can fetch MIP tables in one step +* Use ``fre cmor config`` to auto-generate a CMOR YAML configuration from a post-processing directory tree — it scans components, cross-references against MIP tables, and writes both variable lists and the YAML that ``fre cmor yaml`` expects * Use ``fre yamltools combine-yamls`` before attempting CMORization to help figure out YAML issues * Use ``--dry_run`` with ``fre cmor yaml`` to preview the equivalent ``fre cmor run`` calls before execution * Use ``--no-print_cli_call`` with ``--dry_run`` to see the Python ``cmor_run_subtool(...)`` call instead of the CLI invocation — useful for debugging * Use ``--run_one`` with ``fre cmor run`` for testing to only process a single file and catch issues early * Use ``--run_one`` with ``fre cmor yaml`` to process a single file per ``fre cmor run`` call for quicker debugging -* Use ``fre cmor config`` to auto-generate a CMOR YAML configuration from a post-processing directory tree — it scans components, cross-references against MIP tables, and writes both variable lists and the YAML that ``fre cmor yaml`` expects * Increase verbosity when debugging - Use ``-v`` to see ``INFO`` logging, and ``-vv`` (or ``-v -v``) for ``DEBUG`` logging * Version control your YAML files - Track changes to your CMORization configuration and commit them to git! * Check controlled vocabulary - Verify grid labels and nominal resolutions are CV-compliant diff --git a/environment.yml b/environment.yml index 05801b268..5aad0a576 100644 --- a/environment.yml +++ b/environment.yml @@ -6,23 +6,18 @@ dependencies: - python>=3.11 - noaa-gfdl::analysis_scripts==0.0.1 - noaa-gfdl::catalogbuilder==2025.01.01 -# - noaa-gfdl::fre-nctools==2022.02.01 - - conda-forge::cdo>=2 - conda-forge::cftime - conda-forge::click>=8.2 - - conda-forge::cmor>=3.14 + - conda-forge::cmor>=3.14.1 - conda-forge::cylc-flow>=8.2 - conda-forge::cylc-rose - conda-forge::jinja2>=3 - conda-forge::jsonschema - conda-forge::metomi-rose - - conda-forge::nccmp -# - conda-forge::numpy==1.26.4 - conda-forge::numpy>=2 - conda-forge::pytest - conda-forge::pytest-cov - conda-forge::pylint - - conda-forge::python-cdo - conda-forge::pyyaml - conda-forge::xarray>=2024.* - conda-forge::netcdf4>=1.7.* diff --git a/fre/app/freapp.py b/fre/app/freapp.py index 4744fe5e8..ac794c424 100644 --- a/fre/app/freapp.py +++ b/fre/app/freapp.py @@ -138,8 +138,8 @@ def mask_atmos_plevel(infile, psfile, outfile, warn_no_ps): required = True, help = "Output file name") @click.option("-p", "--pkg", - type = click.Choice(["cdo","fre-nctools","fre-python-tools"]), - default = "cdo", + type = click.Choice(["cdo","fre-nctools","fre-python-tools","xarray"]), + default = "fre-python-tools", help = "Time average approach") @click.option("-v", "--var", type = str, @@ -192,8 +192,8 @@ def gen_time_averages(inf, outf, pkg, var, unwgt, avg_type): required = True, help = "Frequency of desired climatology: 'mon' or 'yr'") @click.option("-p", "--pkg", - type = click.Choice(["cdo","fre-nctools","fre-python-tools"]), - default = "cdo", + type = click.Choice(["cdo","fre-nctools","fre-python-tools","xarray"]), + default = "fre-python-tools", help = "Time average approach") def gen_time_averages_wrapper(cycle_point, dir_, sources, output_interval, input_interval, grid, frequency, pkg): """ diff --git a/fre/app/generate_time_averages/__init__.py b/fre/app/generate_time_averages/__init__.py index 699064f4b..1546030d8 100644 --- a/fre/app/generate_time_averages/__init__.py +++ b/fre/app/generate_time_averages/__init__.py @@ -1,3 +1,4 @@ '''required for generate_time_averages module import functionality''' __all__ = ['generate_time_averages', 'timeAverager', 'wrapper', 'combine', - 'frenctoolsTimeAverager', 'cdoTimeAverager', 'frepytoolsTimeAverager'] + 'frenctoolsTimeAverager', 'cdoTimeAverager', 'frepytoolsTimeAverager', + 'xarrayTimeAverager'] diff --git a/fre/app/generate_time_averages/cdoTimeAverager.py b/fre/app/generate_time_averages/cdoTimeAverager.py index c8f585887..ddc3f399a 100644 --- a/fre/app/generate_time_averages/cdoTimeAverager.py +++ b/fre/app/generate_time_averages/cdoTimeAverager.py @@ -1,89 +1,35 @@ -''' class using (mostly) cdo functions for time-averages ''' +''' stub that redirects pkg='cdo' requests to the xarray time averager ''' import logging +import warnings -from netCDF4 import Dataset -import numpy as np - -import cdo -from cdo import Cdo - -from .timeAverager import timeAverager +from .xarrayTimeAverager import xarrayTimeAverager fre_logger = logging.getLogger(__name__) -class cdoTimeAverager(timeAverager): + +class cdoTimeAverager(xarrayTimeAverager): # pylint: disable=invalid-name ''' - class inheriting from abstract base class timeAverager - generates time-averages using cdo (mostly, see weighted approach) + Legacy entry-point kept for backward compatibility. + CDO/python-cdo has been removed. All work is now done by xarrayTimeAverager. ''' - def generate_timavg(self, infile = None, outfile = None): + def generate_timavg(self, infile=None, outfile=None): """ - use cdo package routines via python bindings + Emit a loud warning then delegate to the xarray implementation. - :param self: This is an instance of the class cdoTimeAverager - :param infile: path to history file, or list of paths, default is None - :type infile: str, list - :param outfile: path to where output file should be stored, default is None + :param infile: path to input NetCDF file + :type infile: str + :param outfile: path to output file :type outfile: str - :return: 1 if the instance variable self.avg_typ is unsupported, 0 if function has a clean exit + :return: 0 on success :rtype: int """ - - if self.avg_type not in ['all', 'seas', 'month']: - fre_logger.error('requested unknown avg_type %s.', self.avg_type) - raise ValueError(f'requested unknown avg_type {self.avg_type}') - - if self.var is not None: - fre_logger.warning('WARNING: variable specification not twr supported for cdo time averaging. ignoring!') - - fre_logger.info('python-cdo version is %s', cdo.__version__) - - _cdo = Cdo() - - wgts_sum = 0 - if not self.unwgt: #weighted case, cdo ops alone don't support a weighted time-average. - - nc_fin = Dataset(infile, 'r') - - time_bnds = nc_fin['time_bnds'][:].copy() - # Ensure float64 precision for consistent results across numpy versions - # NumPy 2.0 changed type promotion rules (NEP 50), so explicit casting - # is needed to avoid precision differences - time_bnds = np.asarray(time_bnds, dtype=np.float64) - # Transpose once to avoid redundant operations - time_bnds_transposed = np.moveaxis(time_bnds, 0, -1) - wgts = time_bnds_transposed[1] - time_bnds_transposed[0] - # Use numpy.sum for consistent dtype handling across numpy versions - wgts_sum = np.sum(wgts, dtype=np.float64) - - fre_logger.debug('wgts_sum = %s', wgts_sum) - - if self.avg_type == 'all': - fre_logger.info('time average over all time requested.') - if self.unwgt: - _cdo.timmean(input = infile, output = outfile, returnCdf = True) - else: - _cdo.divc( str(wgts_sum), input = "-timsum -muldpm "+infile, output = outfile) - fre_logger.info('done averaging over all time.') - - elif self.avg_type == 'seas': - fre_logger.info('seasonal time-averages requested.') - _cdo.yseasmean(input = infile, output = outfile, returnCdf = True) - fre_logger.info('done averaging over seasons.') - - elif self.avg_type == 'month': - fre_logger.info('monthly time-averages requested.') - outfile_str = str(outfile) - _cdo.ymonmean(input = infile, output = outfile_str, returnCdf = True) - fre_logger.info('done averaging over months.') - - fre_logger.warning(" splitting by month") - outfile_root = outfile_str.removesuffix(".nc") + '.' - _cdo.splitmon(input = outfile_str, output = outfile_root) - fre_logger.debug('Done with splitting by month, outfile_root = %s', outfile_root) - - fre_logger.info('done averaging') - fre_logger.info('output file created: %s', outfile) - return 0 + msg = ( + "WARNING *** CDO/python-cdo has been REMOVED from fre-cli. " + "pkg='cdo' now uses the xarray time-averager under the hood. " + "Please switch to pkg='xarray' or pkg='fre-python-tools'. ***" + ) + warnings.warn(msg, FutureWarning, stacklevel=2) + fre_logger.warning(msg) + return super().generate_timavg(infile=infile, outfile=outfile) diff --git a/fre/app/generate_time_averages/frenctoolsTimeAverager.py b/fre/app/generate_time_averages/frenctoolsTimeAverager.py index 59ebf95ce..4049ad4d9 100644 --- a/fre/app/generate_time_averages/frenctoolsTimeAverager.py +++ b/fre/app/generate_time_averages/frenctoolsTimeAverager.py @@ -5,7 +5,7 @@ from subprocess import Popen, PIPE from pathlib import Path -from cdo import Cdo +import xarray as xr from .timeAverager import timeAverager fre_logger = logging.getLogger(__name__) @@ -80,38 +80,40 @@ def generate_timavg(self, infile=None, outfile=None): month_output_file_paths[month_index] = os.path.join( output_dir, f"{Path(outfile).stem}.{month_index:02d}.nc") - cdo = Cdo() - #Loop through each month and select the corresponding data - for month_index in month_indices: - - #month_name = month_names[month_index - 1] - nc_monthly_file = nc_month_file_paths[month_index] - - #Select data for the given month - cdo.select(f"month={month_index}", input=infile, output=nc_monthly_file) - - #Run timavg command for newly created file - month_output_file = month_output_file_paths[month_index] - #timavgcsh_command=['timavg.csh', '-mb','-o', month_output_file, nc_monthly_file] - timavgcsh_command=[shutil.which('timavg.csh'), '-dmb','-o', month_output_file, nc_monthly_file] - fre_logger.info( 'timavgcsh_command is %s', ' '.join(timavgcsh_command) ) - exitstatus=1 - with Popen(timavgcsh_command, - stdout=PIPE, stderr=PIPE, shell=False) as subp: - stdout, stderr = subp.communicate() - stdoutput=stdout.decode() - fre_logger.info('output= %s', stdoutput) - stderror=stderr.decode() - fre_logger.info('error = %s', stderror ) - - if subp.returncode != 0: - fre_logger.error('stderror = %s', stderror) - raise ValueError(f'error: timavg.csh had a problem, subp.returncode = {subp.returncode}') - - fre_logger.info('%s climatology successfully ran',nc_monthly_file) - exitstatus=0 - - #Delete files after being used to generate output files + with xr.open_dataset(infile) as ds_in: + #Loop through each month and select the corresponding data + for month_index in month_indices: + + #month_name = month_names[month_index - 1] + nc_monthly_file = nc_month_file_paths[month_index] + + #Select data for the given month + month_ds = ds_in.sel(time=ds_in['time'].dt.month == month_index) + month_ds.to_netcdf(nc_monthly_file) + month_ds.close() + + #Run timavg command for newly created file + month_output_file = month_output_file_paths[month_index] + #timavgcsh_command=['timavg.csh', '-mb','-o', month_output_file, nc_monthly_file] + timavgcsh_command=[shutil.which('timavg.csh'), '-dmb','-o', month_output_file, nc_monthly_file] + fre_logger.info( 'timavgcsh_command is %s', ' '.join(timavgcsh_command) ) + exitstatus=1 + with Popen(timavgcsh_command, + stdout=PIPE, stderr=PIPE, shell=False) as subp: + stdout, stderr = subp.communicate() + stdoutput=stdout.decode() + fre_logger.info('output= %s', stdoutput) + stderror=stderr.decode() + fre_logger.info('error = %s', stderror ) + + if subp.returncode != 0: + fre_logger.error('stderror = %s', stderror) + raise ValueError(f'error: timavg.csh had a problem, subp.returncode = {subp.returncode}') + + fre_logger.info('%s climatology successfully ran',nc_monthly_file) + exitstatus=0 + + #Delete files after being used to generate output files shutil.rmtree('monthly_nc_files') if self.avg_type == 'month': #End here if month variable used diff --git a/fre/app/generate_time_averages/frepytoolsTimeAverager.py b/fre/app/generate_time_averages/frepytoolsTimeAverager.py index c91220b3b..6459938af 100644 --- a/fre/app/generate_time_averages/frepytoolsTimeAverager.py +++ b/fre/app/generate_time_averages/frepytoolsTimeAverager.py @@ -9,7 +9,7 @@ fre_logger = logging.getLogger(__name__) -class frepytoolsTimeAverager(timeAverager): +class NumpyTimeAverager(timeAverager): # pylint: disable=invalid-name ''' class inheriting from abstract base class timeAverager generates time-averages using a python-native approach @@ -256,3 +256,6 @@ def generate_timavg(self, infile = None, outfile = None): fre_logger.debug('input file closed') return 0 + +# backward-compatible alias +frepytoolsTimeAverager = NumpyTimeAverager # pylint: disable=invalid-name diff --git a/fre/app/generate_time_averages/generate_time_averages.py b/fre/app/generate_time_averages/generate_time_averages.py index 2d0f0a874..a69020cc8 100755 --- a/fre/app/generate_time_averages/generate_time_averages.py +++ b/fre/app/generate_time_averages/generate_time_averages.py @@ -3,16 +3,19 @@ import os import logging import time +import warnings from typing import Optional, List, Union -from cdo import Cdo +import xarray as xr -from .cdoTimeAverager import cdoTimeAverager from .frenctoolsTimeAverager import frenctoolsTimeAverager -from .frepytoolsTimeAverager import frepytoolsTimeAverager +from .frepytoolsTimeAverager import NumpyTimeAverager +from .xarrayTimeAverager import xarrayTimeAverager fre_logger = logging.getLogger(__name__) +VALID_PKGS = ['cdo', 'fre-nctools', 'fre-python-tools', 'xarray'] + def generate_time_average(infile: Union[str, List[str]] = None, outfile: str = None, pkg: str = None, @@ -26,7 +29,9 @@ def generate_time_average(infile: Union[str, List[str]] = None, :type infile: str, list :param outfile: path to where output file should be stored :type outfile: str - :param pkg: which package to use to calculate climatology (cdo, fre-nctools, fre-python-tools) + :param pkg: which package to use to calculate climatology + ('xarray', 'fre-python-tools', 'fre-nctools', or 'cdo') + 'cdo' is kept for backward compat but silently uses xarray. :type pkg: str :param var: optional, not currently supported and defaults to None :type var: str @@ -41,12 +46,12 @@ def generate_time_average(infile: Union[str, List[str]] = None, fre_logger.debug('called generate_time_average') if None in [infile, outfile, pkg]: raise ValueError('infile, outfile, and pkg are required inputs') - if pkg not in ['cdo', 'fre-nctools', 'fre-python-tools']: - raise ValueError(f'argument pkg = {pkg} not known, must be one of: cdo, fre-nctools, fre-python-tools') + if pkg not in VALID_PKGS: + raise ValueError(f'argument pkg = {pkg} not known, must be one of: {", ".join(VALID_PKGS)}') exitstatus = 1 myavger = None - # multiple files case Use cdo to merge multiple files if present + # multiple files case - merge multiple files if present merged = False orig_infile_list = None if all ( [ type(infile).__name__ == 'list', @@ -54,13 +59,13 @@ def generate_time_average(infile: Union[str, List[str]] = None, fre_logger.info('list input argument detected') infile_str = [str(item) for item in infile] - _cdo = Cdo() merged_file = "merged_output.nc" - fre_logger.info('calling cdo mergetime') + fre_logger.info('merging input files with xarray') fre_logger.debug('output: %s', merged_file) fre_logger.debug('inputs: \n %s', ' '.join(infile_str) ) - _cdo.mergetime(input = ' '.join(infile_str), output = merged_file) + with xr.open_mfdataset(infile_str, combine='by_coords') as ds: + ds.to_netcdf(merged_file) # preserve the original file names for later orig_infile_list = infile @@ -69,11 +74,27 @@ def generate_time_average(infile: Union[str, List[str]] = None, fre_logger.info('file merging success') if pkg == 'cdo': - fre_logger.info('creating a cdoTimeAverager') - myavger = cdoTimeAverager( pkg = pkg, - var = var, - unwgt = unwgt, - avg_type = avg_type ) + # CDO has been removed — warn loudly, use xarray instead + msg = ( + "WARNING *** CDO/python-cdo has been REMOVED from fre-cli. " + "pkg='cdo' now uses the xarray time-averager under the hood. " + "Please switch to pkg='xarray' or pkg='fre-python-tools'. ***" + ) + warnings.warn(msg, FutureWarning, stacklevel=2) + fre_logger.warning(msg) + fre_logger.info('creating an xarrayTimeAverager (via pkg=cdo redirect)') + myavger = xarrayTimeAverager( pkg = pkg, + var = var, + unwgt = unwgt, + avg_type = avg_type ) + + elif pkg == 'xarray': + fre_logger.info('creating an xarrayTimeAverager') + myavger = xarrayTimeAverager( pkg = pkg, + var = var, + unwgt = unwgt, + avg_type = avg_type ) + elif pkg == 'fre-nctools': fre_logger.info('creating a frenctoolsTimeAverager') myavger = frenctoolsTimeAverager( pkg = pkg, @@ -87,11 +108,11 @@ def generate_time_average(infile: Union[str, List[str]] = None, var = orig_infile_list[0].split('/').pop().split('.')[-2] fre_logger.warning('extracted var = %s from orig_infile_list[0] = %s', var, orig_infile_list[0] ) - fre_logger.info('creating a frepytoolsTimeAverager') - myavger = frepytoolsTimeAverager( pkg = pkg, - var = var, - unwgt = unwgt, - avg_type = avg_type ) + fre_logger.info('creating a NumpyTimeAverager') + myavger = NumpyTimeAverager( pkg = pkg, + var = var, + unwgt = unwgt, + avg_type = avg_type ) # workload if myavger is not None: diff --git a/fre/app/generate_time_averages/tests/test_cdoTimeAverager.py b/fre/app/generate_time_averages/tests/test_cdoTimeAverager.py index 9ebbc4937..2ed4c46db 100644 --- a/fre/app/generate_time_averages/tests/test_cdoTimeAverager.py +++ b/fre/app/generate_time_averages/tests/test_cdoTimeAverager.py @@ -10,3 +10,13 @@ def test_cdotimavg_init_error(): test_avgr = cdo_timavg.cdoTimeAverager(pkg = 'cdo', var = None, unwgt = False, avg_type = 'FOO') test_avgr.generate_timavg(infile = None, outfile = None) + +def test_cdotimavg_warns_future(): + ''' test that FutureWarning is emitted when generate_timavg is called ''' + with pytest.warns(FutureWarning, match='CDO/python-cdo has been REMOVED'): + test_avgr = cdo_timavg.cdoTimeAverager(pkg = 'cdo', var = None, unwgt = False, avg_type = 'all') + # this will fail because infile is None, but the warning should fire first + try: + test_avgr.generate_timavg(infile = 'nonexistent.nc', outfile = 'out.nc') + except (FileNotFoundError, ValueError, OSError): + pass diff --git a/fre/app/generate_time_averages/tests/test_generate_time_averages.py b/fre/app/generate_time_averages/tests/test_generate_time_averages.py index 2f7209b94..5ffde0598 100644 --- a/fre/app/generate_time_averages/tests/test_generate_time_averages.py +++ b/fre/app/generate_time_averages/tests/test_generate_time_averages.py @@ -94,21 +94,22 @@ def test_time_avg_file_dir_exists(): FULL_TEST_FILE_PATH = TIME_AVG_FILE_DIR + TEST_FILE_NAME cases=[ - #cdo cases, monthly, one/multiple files, weighted + # cdo cases — CDO removed, but entry point still works (redirects to xarray) + # monthly, one/multiple files, weighted pytest.param( 'cdo', 'month', True , FULL_TEST_FILE_PATH, TIME_AVG_FILE_DIR + 'ymonmean_unwgt_' + TEST_FILE_NAME), pytest.param( 'cdo', 'month', True , TWO_TEST_FILE_NAMES, TIME_AVG_FILE_DIR + 'ymonmean_unwgt_' + TWO_OUT_FILE_NAME), - #cdo cases, seasonal, one/multiple files, unweighted + # seasonal, one/multiple files, unweighted pytest.param( 'cdo', 'seas', True , FULL_TEST_FILE_PATH, TIME_AVG_FILE_DIR + 'yseasmean_unwgt_' + TEST_FILE_NAME), pytest.param( 'cdo', 'seas', True , TWO_TEST_FILE_NAMES, TIME_AVG_FILE_DIR + 'yseasmean_unwgt_' + TWO_OUT_FILE_NAME), - #cdo cases, all, one/multiple files, weighted/unweighted + # all, one/multiple files, weighted/unweighted pytest.param( 'cdo', 'all', True , FULL_TEST_FILE_PATH, STR_UNWGT_CDO_INF), @@ -121,6 +122,43 @@ def test_time_avg_file_dir_exists(): pytest.param( 'cdo', 'all', False , TWO_TEST_FILE_NAMES, TIME_AVG_FILE_DIR + 'timmean_' + TWO_OUT_FILE_NAME), + # xarray cases — all avg_types, single and multi-file + pytest.param( 'xarray', 'all', True , + FULL_TEST_FILE_PATH, + TIME_AVG_FILE_DIR + 'xarray_unwgt_timavg_' + TEST_FILE_NAME), + pytest.param( 'xarray', 'all', False , + FULL_TEST_FILE_PATH, + TIME_AVG_FILE_DIR + 'xarray_wgt_timavg_' + TEST_FILE_NAME), + pytest.param( 'xarray', 'all', True , + TWO_TEST_FILE_NAMES, + TIME_AVG_FILE_DIR + 'xarray_unwgt_timavg_' + TWO_OUT_FILE_NAME), + pytest.param( 'xarray', 'all', False , + TWO_TEST_FILE_NAMES, + TIME_AVG_FILE_DIR + 'xarray_wgt_timavg_' + TWO_OUT_FILE_NAME), + pytest.param( 'xarray', 'seas', True , + FULL_TEST_FILE_PATH, + TIME_AVG_FILE_DIR + 'xarray_seas_unwgt_' + TEST_FILE_NAME), + pytest.param( 'xarray', 'seas', False , + FULL_TEST_FILE_PATH, + TIME_AVG_FILE_DIR + 'xarray_seas_wgt_' + TEST_FILE_NAME), + pytest.param( 'xarray', 'month', True , + FULL_TEST_FILE_PATH, + TIME_AVG_FILE_DIR + 'xarray_month_unwgt_' + TEST_FILE_NAME), + pytest.param( 'xarray', 'month', False , + FULL_TEST_FILE_PATH, + TIME_AVG_FILE_DIR + 'xarray_month_wgt_' + TEST_FILE_NAME), + pytest.param( 'xarray', 'seas', True , + TWO_TEST_FILE_NAMES, + TIME_AVG_FILE_DIR + 'xarray_seas_unwgt_' + TWO_OUT_FILE_NAME), + pytest.param( 'xarray', 'seas', False , + TWO_TEST_FILE_NAMES, + TIME_AVG_FILE_DIR + 'xarray_seas_wgt_' + TWO_OUT_FILE_NAME), + pytest.param( 'xarray', 'month', True , + TWO_TEST_FILE_NAMES, + TIME_AVG_FILE_DIR + 'xarray_month_unwgt_' + TWO_OUT_FILE_NAME), + pytest.param( 'xarray', 'month', False , + TWO_TEST_FILE_NAMES, + TIME_AVG_FILE_DIR + 'xarray_month_wgt_' + TWO_OUT_FILE_NAME), #fre-python-tools cases, all, one/multiple files, weighted/unweighted flag pytest.param( 'fre-python-tools', 'all', False , FULL_TEST_FILE_PATH, @@ -243,10 +281,10 @@ def test_compare_fre_cli_to_fre_nctools(): assert not( (non_zero_count > 0.) or (non_zero_count < 0.) ), "non-zero diffs between frepy / frenctools were found" -@pytest.mark.xfail(reason = 'test fails b.c. cdo cannot bitwise-reproduce fre-nctools answer') +@pytest.mark.xfail(reason = 'cdo entry-point now uses xarray — result format differs from old CDO output') def test_compare_fre_cli_to_cdo(): ''' - compares fre_cli pkg answer to cdo pkg answer + compares fre_cli pkg answer to cdo pkg answer (cdo now redirects to xarray) ''' assert Path(STR_FRE_PYTOOLS_INF).exists(), f'DNE: STR_FRE_PYTOOLS_INF = {STR_FRE_PYTOOLS_INF}' fre_pytools_inf = Dataset(STR_FRE_PYTOOLS_INF, 'r') @@ -273,6 +311,7 @@ def test_compare_fre_cli_to_cdo(): assert not( (non_zero_count > 0.) or (non_zero_count < 0.) ), "non-zero diffs between cdo / frepytools were found" +@pytest.mark.xfail(reason = 'cdo entry-point now uses xarray — result format differs from old CDO output') def test_compare_unwgt_fre_cli_to_unwgt_cdo(): ''' compares fre_cli pkg answer to cdo pkg answer @@ -301,10 +340,10 @@ def test_compare_unwgt_fre_cli_to_unwgt_cdo(): non_zero_count = np.count_nonzero(diff_pytools_cdo_timavg[:]) assert not( (non_zero_count > 0.) or (non_zero_count < 0.) ), "non-zero diffs between cdo / frepytools were found" -@pytest.mark.xfail(reason = 'test fails b.c. cdo cannot bitwise-reproduce fre-nctools answer') +@pytest.mark.xfail(reason = 'cdo entry-point now uses xarray — result format differs from old CDO output') def test_compare_cdo_to_fre_nctools(): ''' - compares cdo pkg answer to fre_nctools pkg answer + compares cdo pkg answer to fre_nctools pkg answer (cdo now redirects to xarray) ''' assert Path(STR_FRENCTOOLS_INF).exists(), f'DNE: STR_FRENCTOOLS_INF = {STR_FRENCTOOLS_INF}' diff --git a/fre/app/generate_time_averages/tests/test_wrapper.py b/fre/app/generate_time_averages/tests/test_wrapper.py index bec909464..e815998a1 100644 --- a/fre/app/generate_time_averages/tests/test_wrapper.py +++ b/fre/app/generate_time_averages/tests/test_wrapper.py @@ -170,7 +170,7 @@ def test_monthly_av_from_monthly_ts(create_monthly_timeseries): assert file_.exists() -# CDO-based tests +# CDO-based tests — CDO has been removed, entry point redirects to xarray def test_cdo_annual_av_from_monthly_ts(create_monthly_timeseries): """ Generate annual average from monthly timeseries using CDO @@ -206,7 +206,7 @@ def test_cdo_annual_av_from_monthly_ts(create_monthly_timeseries): def test_cdo_annual_av_from_annual_ts(create_annual_timeseries): """ - Generate annual average from annual timeseries using CDO + Generate annual average from annual timeseries using CDO (redirects to xarray) """ cycle_point = '0002-01-01' output_interval = 'P2Y' @@ -239,7 +239,7 @@ def test_cdo_annual_av_from_annual_ts(create_annual_timeseries): def test_cdo_monthly_av_from_monthly_ts(create_monthly_timeseries): """ - Generate monthly climatology from monthly timeseries using CDO + Generate monthly climatology from monthly timeseries using CDO (redirects to xarray) """ cycle_point = '1980-01-01' output_interval = 'P2Y' @@ -275,8 +275,7 @@ def test_cdo_monthly_av_from_monthly_ts(create_monthly_timeseries): @pytest.mark.xfail(reason="no timavg.csh") def test_cdo_fre_nctools_equivalence(create_monthly_timeseries): """ - Test that CDO produces equivalent results to fre-nctools when timavg.csh is available. - If timavg.csh is not available, the test will be skipped. + Test that CDO (now xarray) produces equivalent results to fre-nctools when timavg.csh is available. """ cycle_point = '1980-01-01' @@ -357,3 +356,80 @@ def test_freq_not_valid_valueerror(): output_interval = 'P999Y', frequency = 'FOO', grid = 'BAR') + + +# xarray-based wrapper tests +def test_xarray_annual_av_from_monthly_ts(create_monthly_timeseries): + """ + Generate annual average from monthly timeseries using xarray + """ + cycle_point = '1980-01-01' + output_interval = 'P2Y' + input_interval = 'P1Y' + grid = '180_288.conserve_order2' + sources = ['atmos_month'] + frequency = 'yr' + pkg = 'xarray' + + wrapper.generate_wrapper(cycle_point, str(create_monthly_timeseries), sources, + output_interval, input_interval, grid, frequency, pkg) + + output_dir = Path(create_monthly_timeseries, 'av', grid, 'atmos_month', 'P1Y', output_interval) + output_files = [ + output_dir / 'atmos_month.1980-1981.alb_sfc.nc', + output_dir / 'atmos_month.1980-1981.aliq.nc' + ] + + for file_ in output_files: + assert file_.exists() + + +def test_xarray_annual_av_from_annual_ts(create_annual_timeseries): + """ + Generate annual average from annual timeseries using xarray + """ + cycle_point = '0002-01-01' + output_interval = 'P2Y' + input_interval = 'P1Y' + grid = '180_288.conserve_order1' + sources = ['tracer_level'] + frequency = 'yr' + pkg = 'xarray' + + wrapper.generate_wrapper(cycle_point, str(create_annual_timeseries), sources, + output_interval, input_interval, grid, frequency, pkg) + + output_dir = Path(create_annual_timeseries, 'av', grid, 'tracer_level', 'P1Y', output_interval) + output_files = [ + output_dir / 'tracer_level.0002-0003.radon.nc', + output_dir / 'tracer_level.0002-0003.scale_salt_emis.nc' + ] + + for file_ in output_files: + assert file_.exists() + + +def test_xarray_monthly_av_from_monthly_ts(create_monthly_timeseries): + """ + Generate monthly climatology from monthly timeseries using xarray + """ + cycle_point = '1980-01-01' + output_interval = 'P2Y' + input_interval = 'P1Y' + grid = '180_288.conserve_order2' + sources = ['atmos_month'] + frequency = 'mon' + pkg = 'xarray' + + wrapper.generate_wrapper(cycle_point, str(create_monthly_timeseries), sources, + output_interval, input_interval, grid, frequency, pkg) + + output_dir = Path(create_monthly_timeseries, 'av', grid, 'atmos_month', 'P1M', output_interval) + output_files = [ + output_dir / 'atmos_month.1980-1981.alb_sfc', + output_dir / 'atmos_month.1980-1981.aliq', + ] + for f in output_files: + for i in range(1,13): + file_ = Path(str(f) + f".{i:02d}.nc") + assert file_.exists() diff --git a/fre/app/generate_time_averages/tests/test_xarrayTimeAverager.py b/fre/app/generate_time_averages/tests/test_xarrayTimeAverager.py new file mode 100644 index 000000000..d8e0efbc7 --- /dev/null +++ b/fre/app/generate_time_averages/tests/test_xarrayTimeAverager.py @@ -0,0 +1,582 @@ +''' +Comprehensive unit tests for xarrayTimeAverager and its helper functions. + +Tests cover: + - _is_numeric() — dtype classification helper + - _compute_time_weights() — time-weight extraction from time_bnds + - _weighted_time_mean() — correctness of weighted global mean + - _weighted_seasonal_mean() — correctness of weighted seasonal groupby + - _weighted_monthly_mean() — correctness of weighted monthly groupby + - xarrayTimeAverager.generate_timavg() — full round-trip via NetCDF files +''' + +from pathlib import Path + +import numpy as np +import pandas as pd +import pytest +import xarray as xr + +from fre.app.generate_time_averages.xarrayTimeAverager import ( + _compute_time_weights, + _is_numeric, + _weighted_monthly_mean, + _weighted_seasonal_mean, + _weighted_time_mean, + xarrayTimeAverager, +) + +# --------------------------------------------------------------------------- +# Dataset factory helpers +# --------------------------------------------------------------------------- + +# Days per month for a standard (non-leap) year, January → December +_DAYS_IN_MONTH = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], + dtype='float64') + + +def _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float'): + ''' + Build a minimal monthly xr.Dataset with one numeric variable ('temp'). + + The ``time`` coordinate uses actual ``datetime64`` values so that the + ``.dt`` accessor (needed for ``.dt.season``, ``.dt.month``) works + on the in-memory dataset without requiring a NetCDF round-trip. + + Parameters + ---------- + n_years : int + How many full calendar years to include (12*n_years timesteps). + with_bnds : bool + Whether to include a ``time_bnds`` variable. + time_bnds_encoding : str + How to encode ``time_bnds``: + - ``'float'`` → plain float64 day-counts (numeric path) + - ``'timedelta'`` → numpy datetime64 edges (timedelta64 diff path) + ''' + n_months = 12 * n_years + + # Month-start dates (Jan 1, Feb 1, …) anchored at 2001 (non-leap year) + month_starts = pd.date_range('2001-01-01', periods=n_months, freq='MS') + # Month midpoints (15th of each month) used as the time coordinate + times = month_starts + pd.Timedelta(days=15) + # Month lengths in days for the non-leap year 2001 + # Use pd.DateOffset to compute exact month lengths from the actual dates + month_ends = month_starts + pd.offsets.MonthEnd(1) + pd.Timedelta(days=1) + days = np.array([(e - s).days for s, e in zip(month_starts, month_ends)], + dtype='float64') + + # values: month-number (1 … 12) cycling over years, stored as float32 + values = np.tile(np.arange(1, 13, dtype='float32'), n_years).reshape(n_months, 1, 1) + + data_vars = { + 'temp': xr.DataArray(values, dims=['time', 'lat', 'lon'], + attrs={'units': 'K', 'long_name': 'temperature'}), + } + + if with_bnds: + if time_bnds_encoding == 'float': + # plain float64 day-counts (numeric path in _compute_time_weights) + t0 = np.concatenate([[0.0], np.cumsum(days[:-1])]) + t1 = np.cumsum(days) + bnds_vals = np.stack([t0, t1], axis=1) + data_vars['time_bnds'] = xr.DataArray(bnds_vals, dims=['time', 'bnds']) + elif time_bnds_encoding == 'timedelta': + # datetime64 edges → difference gives timedelta64 (timedelta path) + starts = month_starts.values.astype('datetime64[D]') + ends = month_ends.values.astype('datetime64[D]') + bnds_vals = np.stack([starts, ends], axis=1) + data_vars['time_bnds'] = xr.DataArray(bnds_vals, dims=['time', 'bnds']) + + return xr.Dataset(data_vars, coords={'time': times}), days + + +def _make_ds_with_nonnumeric_var(): + ''' + Dataset that contains a datetime64 variable alongside a numeric one. + Simulates the ``average_T1``/``average_T2`` pattern from GFDL atmos files. + ''' + n = 4 + month_starts = pd.date_range('2001-01-01', periods=n, freq='MS') + times = month_starts + pd.Timedelta(days=15) + dt_vals = month_starts.values + + return xr.Dataset( + { + 'temp': xr.DataArray([1.0, 2.0, 3.0, 4.0], dims=['time'], + attrs={'units': 'K'}), + 'start_time': xr.DataArray(dt_vals, dims=['time']), # datetime64 — non-numeric + 'time_bnds': xr.DataArray( + np.stack([np.arange(n, dtype='float64'), + np.arange(1, n+1, dtype='float64')], axis=1), + dims=['time', 'bnds']), + }, + coords={'time': times}, + ) + + +# --------------------------------------------------------------------------- +# Tests for _is_numeric() +# --------------------------------------------------------------------------- + +class TestIsNumeric: + '''Unit tests for the _is_numeric() helper.''' + + def test_float32(self): + da = xr.DataArray(np.array([1.0], dtype='float32')) + assert _is_numeric(da) + + def test_float64(self): + da = xr.DataArray(np.array([1.0], dtype='float64')) + assert _is_numeric(da) + + def test_int32(self): + da = xr.DataArray(np.array([1], dtype='int32')) + assert _is_numeric(da) + + def test_int64(self): + da = xr.DataArray(np.array([1], dtype='int64')) + assert _is_numeric(da) + + def test_uint8(self): + da = xr.DataArray(np.array([1], dtype='uint8')) + assert _is_numeric(da) + + def test_datetime64_is_not_numeric(self): + da = xr.DataArray(np.array(['2000-01-01'], dtype='datetime64[D]')) + assert not _is_numeric(da) + + def test_timedelta64_is_not_numeric(self): + da = xr.DataArray(np.array([1], dtype='timedelta64[D]')) + assert not _is_numeric(da) + + def test_object_is_not_numeric(self): + da = xr.DataArray(np.array(['hello'], dtype=object)) + assert not _is_numeric(da) + + +# --------------------------------------------------------------------------- +# Tests for _compute_time_weights() +# --------------------------------------------------------------------------- + +class TestComputeTimeWeights: + '''Unit tests for the _compute_time_weights() helper.''' + + def test_float_bnds_returns_correct_days(self): + ds, days = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + weights = _compute_time_weights(ds) + assert weights.dtype == np.float64 + np.testing.assert_allclose(weights.values, days) + + def test_timedelta_bnds_returns_correct_days(self): + '''time_bnds stored as datetime64 edges → difference is timedelta64.''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='timedelta') + weights = _compute_time_weights(ds) + assert weights.dtype == np.float64 + # compute expected days from the actual bounds stored in the dataset + bnds = ds['time_bnds'].values + expected = (bnds[:, 1] - bnds[:, 0]).astype('timedelta64[D]').astype('float64') + np.testing.assert_allclose(weights.values, expected, atol=1e-9) + + def test_no_bnds_fallback_uniform_weights(self): + '''Without time_bnds, weights should all equal 1.0.''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=False) + weights = _compute_time_weights(ds) + assert weights.dtype == np.float64 + assert len(weights) == 12 + np.testing.assert_array_equal(weights.values, np.ones(12)) + + def test_weights_dim_is_time(self): + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + weights = _compute_time_weights(ds) + assert 'time' in weights.dims + + def test_two_timestep_known_values(self): + '''Minimal two-timestep dataset with explicit bounds: Jan(31d) + Feb(28d).''' + ds = xr.Dataset({ + 'temp': xr.DataArray([1.0, 2.0], dims=['time']), + 'time_bnds': xr.DataArray([[0.0, 31.0], [31.0, 59.0]], dims=['time', 'bnds']), + }) + weights = _compute_time_weights(ds) + np.testing.assert_allclose(weights.values, [31.0, 28.0]) + + +# --------------------------------------------------------------------------- +# Tests for _weighted_time_mean() +# --------------------------------------------------------------------------- + +class TestWeightedTimeMean: + '''Unit tests for _weighted_time_mean() correctness.''' + + def test_two_timestep_known_weighted_mean(self): + ''' + Jan(31d)=1.0, Feb(28d)=2.0 → weighted mean = (1*31 + 2*28) / (31+28). + ''' + ds = xr.Dataset({ + 'temp': xr.DataArray([1.0, 2.0], dims=['time']), + 'time_bnds': xr.DataArray([[0.0, 31.0], [31.0, 59.0]], dims=['time', 'bnds']), + }) + result = _weighted_time_mean(ds) + expected = (1.0 * 31 + 2.0 * 28) / (31 + 28) + np.testing.assert_allclose(float(result['temp']), expected, rtol=1e-6) + + def test_uniform_weights_equal_arithmetic_mean(self): + '''When all timesteps have equal weight, weighted = unweighted mean.''' + vals = np.arange(1.0, 5.0) + ds = xr.Dataset({ + 'temp': xr.DataArray(vals, dims=['time']), + 'time_bnds': xr.DataArray([[float(i), float(i+1)] for i in range(4)], + dims=['time', 'bnds']), + }) + result = _weighted_time_mean(ds) + np.testing.assert_allclose(float(result['temp']), vals.mean(), rtol=1e-6) + + def test_time_dim_eliminated(self): + '''Output should have no time dimension.''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_time_mean(ds) + assert 'time' not in result['temp'].dims + + def test_non_time_vars_preserved(self): + '''Variables without time dimension are passed through unchanged.''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + # lat dim in _make_monthly_ds has size 1; match that size here + ds = ds.assign({'static': xr.DataArray([42.0], dims=['lat'])}) + result = _weighted_time_mean(ds) + assert 'static' in result + np.testing.assert_array_equal(result['static'].values, [42.0]) + + def test_time_bnds_excluded_from_output(self): + '''time_bnds should not appear in the weighted mean output.''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_time_mean(ds) + assert 'time_bnds' not in result + + def test_nonnumeric_time_var_gets_first_value(self): + '''datetime64 time-dependent variables get the value from timestep 0.''' + ds = _make_ds_with_nonnumeric_var() + result = _weighted_time_mean(ds) + # 'start_time' is datetime64 → should be scalar == ds['start_time'].isel(time=0) + assert 'start_time' in result + assert result['start_time'].values == ds['start_time'].values[0] + + def test_attrs_preserved(self): + '''Dataset and variable attributes should be preserved.''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_time_mean(ds) + assert result['temp'].attrs.get('units') == 'K' + + +# --------------------------------------------------------------------------- +# Tests for _weighted_seasonal_mean() +# --------------------------------------------------------------------------- + +class TestWeightedSeasonalMean: + '''Unit tests for _weighted_seasonal_mean() correctness.''' + + def test_season_dim_present(self): + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_seasonal_mean(ds) + assert 'season' in result['temp'].dims + + def test_four_seasons_present(self): + '''A full year should produce all four seasons in the output.''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_seasonal_mean(ds) + seasons = set(result['season'].values) + assert seasons == {'DJF', 'MAM', 'JJA', 'SON'} + + def test_time_bnds_excluded(self): + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_seasonal_mean(ds) + assert 'time_bnds' not in result + + def test_mam_weighted_value(self): + ''' + MAM (Mar=31, Apr=30, May=31) with values (3,4,5): + weighted mean = (3*31 + 4*30 + 5*31) / (31+30+31) = 368/92 ≈ 4.0 + ''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_seasonal_mean(ds) + mam_val = float(result['temp'].sel(season='MAM').values.flat[0]) + np.testing.assert_allclose(mam_val, 368.0 / 92.0, rtol=1e-4) + + def test_jja_weighted_value(self): + ''' + JJA (Jun=30, Jul=31, Aug=31) with values (6,7,8): + weighted mean = (6*30 + 7*31 + 8*31) / (30+31+31) = 645/92 + ''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_seasonal_mean(ds) + jja_val = float(result['temp'].sel(season='JJA').values.flat[0]) + np.testing.assert_allclose(jja_val, 645.0 / 92.0, rtol=1e-4) + + def test_son_weighted_value(self): + ''' + SON (Sep=30, Oct=31, Nov=30) with values (9,10,11): + weighted mean = (9*30 + 10*31 + 11*30) / (30+31+30) = 910/91 = 10.0 + ''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_seasonal_mean(ds) + son_val = float(result['temp'].sel(season='SON').values.flat[0]) + np.testing.assert_allclose(son_val, 910.0 / 91.0, rtol=1e-4) + + def test_nonnumeric_vars_excluded_from_time_groupby(self): + '''Non-numeric time-dependent variables should not appear in the output.''' + ds = _make_ds_with_nonnumeric_var() + result = _weighted_seasonal_mean(ds) + # 'temp' (float) should be present; 'start_time' (datetime64) should be excluded + assert 'temp' in result + assert 'start_time' not in result + + +# --------------------------------------------------------------------------- +# Tests for _weighted_monthly_mean() +# --------------------------------------------------------------------------- + +class TestWeightedMonthlyMean: + '''Unit tests for _weighted_monthly_mean() correctness.''' + + def test_month_dim_present(self): + '''groupby(time.month) produces a "month" coordinate dimension.''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_monthly_mean(ds) + assert 'month' in result['temp'].dims + + def test_twelve_months_present(self): + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_monthly_mean(ds) + # groupby('time.month') produces a 'time' coordinate with 12 values + assert result['temp'].shape[0] == 12 + + def test_single_year_weighted_equals_unweighted(self): + ''' + With only one year of data, each month appears exactly once so + weighted and unweighted averages are identical. + ''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + weighted = _weighted_monthly_mean(ds) + unweighted = ds.groupby('time.month').mean(dim='time', keep_attrs=True) + np.testing.assert_allclose( + weighted['temp'].values, + unweighted['temp'].values, + rtol=1e-5, + ) + + def test_two_year_weighted_jan_mean(self): + ''' + With two years, January appears twice with identical weights (31 days + both years) and values 1.0 (year 1) and 1.0 (year 2) → mean = 1.0. + ''' + ds, _ = _make_monthly_ds(n_years=2, with_bnds=True, time_bnds_encoding='float') + result = _weighted_monthly_mean(ds) + # groupby(time.month) produces a 'month' coordinate; month=1 is January + jan_val = float(result['temp'].sel(month=1).values.flat[0]) + np.testing.assert_allclose(jan_val, 1.0, rtol=1e-5) + + def test_time_bnds_excluded(self): + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + result = _weighted_monthly_mean(ds) + assert 'time_bnds' not in result + + def test_nonnumeric_vars_excluded_from_time_groupby(self): + ds = _make_ds_with_nonnumeric_var() + result = _weighted_monthly_mean(ds) + assert 'temp' in result + assert 'start_time' not in result + + +# --------------------------------------------------------------------------- +# Integration tests for xarrayTimeAverager.generate_timavg() +# --------------------------------------------------------------------------- + +class TestXarrayTimeAveragerGenerateTimavg: + ''' + Integration tests that write a synthetic NetCDF to a temp dir, + run generate_timavg(), and verify the outputs. + ''' + + @pytest.fixture + def monthly_nc(self, tmp_path): + '''Write a 1-year monthly dataset to a temp NetCDF file.''' + ds, _ = _make_monthly_ds(n_years=1, with_bnds=True, time_bnds_encoding='float') + nc_path = tmp_path / 'monthly.nc' + ds.to_netcdf(nc_path) + return nc_path + + @pytest.fixture + def two_year_nc(self, tmp_path): + '''Write a 2-year monthly dataset to a temp NetCDF file.''' + ds, _ = _make_monthly_ds(n_years=2, with_bnds=True, time_bnds_encoding='float') + nc_path = tmp_path / 'monthly_2yr.nc' + ds.to_netcdf(nc_path) + return nc_path + + # ---- error path ---- + + def test_invalid_avg_type_raises_valueerror(self, monthly_nc, tmp_path): + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='bogus') + with pytest.raises(ValueError, match='unknown avg_type'): + avgr.generate_timavg(infile=str(monthly_nc), + outfile=str(tmp_path / 'out.nc')) + + # ---- avg_type='all' ---- + + def test_all_unwgt_output_exists(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_all_unwgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='all') + ret = avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + assert ret == 0 + assert outfile.exists() + + def test_all_unwgt_no_time_dim(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_all_unwgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='all') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + result = xr.open_dataset(outfile) + assert 'time' not in result['temp'].dims + + def test_all_unwgt_returns_zero(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='all') + assert avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) == 0 + + def test_all_wgt_output_exists(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_all_wgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=False, avg_type='all') + ret = avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + assert ret == 0 + assert outfile.exists() + + def test_all_wgt_no_time_dim(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_all_wgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=False, avg_type='all') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + result = xr.open_dataset(outfile) + assert 'time' not in result['temp'].dims + + def test_all_wgt_differs_from_unwgt_for_unequal_months(self, monthly_nc, tmp_path): + '''Weighted and unweighted global means should differ for unequal month lengths.''' + out_wgt = tmp_path / 'wgt.nc' + out_unwgt = tmp_path / 'unwgt.nc' + xarrayTimeAverager(pkg='xarray', var=None, unwgt=False, avg_type='all').generate_timavg( + infile=str(monthly_nc), outfile=str(out_wgt)) + xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='all').generate_timavg( + infile=str(monthly_nc), outfile=str(out_unwgt)) + wgt_val = float(xr.open_dataset(out_wgt)['temp'].values.flat[0]) + unwgt_val = float(xr.open_dataset(out_unwgt)['temp'].values.flat[0]) + assert wgt_val != pytest.approx(unwgt_val, rel=1e-4) + + def test_all_unwgt_correct_arithmetic_mean(self, monthly_nc, tmp_path): + '''Unweighted mean of values 1..12 should equal 6.5.''' + outfile = tmp_path / 'out.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='all') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + result_val = float(xr.open_dataset(outfile)['temp'].values.flat[0]) + np.testing.assert_allclose(result_val, 6.5, rtol=1e-5) + + # ---- avg_type='seas' ---- + + def test_seas_unwgt_output_exists(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_seas_unwgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='seas') + ret = avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + assert ret == 0 + assert outfile.exists() + + def test_seas_unwgt_has_season_dim(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_seas_unwgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='seas') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + result = xr.open_dataset(outfile) + assert 'season' in result['temp'].dims + + def test_seas_unwgt_four_seasons(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_seas_unwgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='seas') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + result = xr.open_dataset(outfile) + seasons = set(result['season'].values.tolist()) + assert seasons == {'DJF', 'MAM', 'JJA', 'SON'} + + def test_seas_wgt_output_exists(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_seas_wgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=False, avg_type='seas') + ret = avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + assert ret == 0 + assert outfile.exists() + + def test_seas_wgt_has_season_dim(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_seas_wgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=False, avg_type='seas') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + result = xr.open_dataset(outfile) + assert 'season' in result['temp'].dims + + def test_seas_wgt_mam_value(self, monthly_nc, tmp_path): + ''' + MAM = (3*31 + 4*30 + 5*31) / (31+30+31) = 368/92 ≈ 4.0. + Read back from file and verify. + ''' + outfile = tmp_path / 'out_seas_wgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=False, avg_type='seas') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + result = xr.open_dataset(outfile) + mam_val = float(result['temp'].sel(season='MAM').values.flat[0]) + np.testing.assert_allclose(mam_val, 368.0 / 92.0, rtol=1e-4) + + # ---- avg_type='month' ---- + + def test_month_unwgt_output_exists(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_month_unwgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='month') + ret = avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + assert ret == 0 + assert outfile.exists() + + def test_month_unwgt_per_month_files_created(self, monthly_nc, tmp_path): + '''generate_timavg should write 12 per-month files named *.01.nc … *.12.nc.''' + outfile = tmp_path / 'out_month_unwgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='month') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + for m in range(1, 13): + month_file = tmp_path / f'out_month_unwgt.{m:02d}.nc' + assert month_file.exists(), f'missing per-month file: {month_file}' + + def test_month_wgt_output_exists(self, monthly_nc, tmp_path): + outfile = tmp_path / 'out_month_wgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=False, avg_type='month') + ret = avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + assert ret == 0 + assert outfile.exists() + + def test_month_wgt_per_month_files_created(self, monthly_nc, tmp_path): + '''generate_timavg with unwgt=False should still write 12 per-month files.''' + outfile = tmp_path / 'out_month_wgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=False, avg_type='month') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + for m in range(1, 13): + month_file = tmp_path / f'out_month_wgt.{m:02d}.nc' + assert month_file.exists(), f'missing per-month file: {month_file}' + + def test_month_wgt_jan_file_correct_value(self, two_year_nc, tmp_path): + ''' + With 2 years of identical January data (value=1.0, weight=31d both years), + the weighted January average should be 1.0. + ''' + outfile = tmp_path / 'out_month_wgt.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=False, avg_type='month') + avgr.generate_timavg(infile=str(two_year_nc), outfile=str(outfile)) + jan_file = tmp_path / 'out_month_wgt.01.nc' + result = xr.open_dataset(jan_file) + jan_val = float(result['temp'].values.flat[0]) + np.testing.assert_allclose(jan_val, 1.0, rtol=1e-5) + + def test_infile_not_modified(self, monthly_nc, tmp_path): + '''The input file must not be deleted or overwritten by the averager.''' + size_before = monthly_nc.stat().st_size + outfile = tmp_path / 'out.nc' + avgr = xarrayTimeAverager(pkg='xarray', var=None, unwgt=True, avg_type='all') + avgr.generate_timavg(infile=str(monthly_nc), outfile=str(outfile)) + assert monthly_nc.exists() + assert monthly_nc.stat().st_size == size_before diff --git a/fre/app/generate_time_averages/xarrayTimeAverager.py b/fre/app/generate_time_averages/xarrayTimeAverager.py new file mode 100644 index 000000000..4ba6a9ecf --- /dev/null +++ b/fre/app/generate_time_averages/xarrayTimeAverager.py @@ -0,0 +1,213 @@ +''' class using xarray for time-averages and climatology generation ''' + +import logging + +import numpy as np +import xarray as xr + +from .timeAverager import timeAverager + +fre_logger = logging.getLogger(__name__) + +# dtypes eligible for weighted averaging +_NUMERIC_KINDS = frozenset('fiuc') # float, integer, unsigned int, complex + + +def _is_numeric(data_array): + """return True if DataArray has a numeric dtype safe for arithmetic.""" + return data_array.dtype.kind in _NUMERIC_KINDS + + +class xarrayTimeAverager(timeAverager): + ''' + class inheriting from abstract base class timeAverager + generates time-averages using xarray. + supports avg_type 'all', 'seas', and 'month'. + ''' + + def generate_timavg(self, infile = None, outfile = None): + """ + use xarray to compute time-averages. + + :param self: instance of xarrayTimeAverager + :param infile: path to input NetCDF file, default is None + :type infile: str + :param outfile: path to output file, default is None + :type outfile: str + :return: 0 on success + :rtype: int + :raises ValueError: if avg_type is not recognized + """ + + if self.avg_type not in ['all', 'seas', 'month']: + fre_logger.error('requested unknown avg_type %s.', self.avg_type) + raise ValueError(f'requested unknown avg_type {self.avg_type}') + + fre_logger.info('xarrayTimeAverager: avg_type=%s, unwgt=%s', self.avg_type, self.unwgt) + + with xr.open_dataset(infile) as ds: + if self.avg_type == 'all': + fre_logger.info('time average over all time requested.') + if self.unwgt: + ds_avg = ds.mean(dim='time', keep_attrs=True) + else: + ds_avg = _weighted_time_mean(ds) + ds_avg.to_netcdf(outfile) + fre_logger.info('done averaging over all time.') + + elif self.avg_type == 'seas': + fre_logger.info('seasonal time-averages requested.') + if self.unwgt: + ds_avg = ds.groupby('time.season').mean(dim='time', keep_attrs=True) + else: + ds_avg = _weighted_seasonal_mean(ds) + ds_avg.to_netcdf(outfile) + fre_logger.info('done averaging over seasons.') + + elif self.avg_type == 'month': + fre_logger.info('monthly time-averages requested.') + if self.unwgt: + ds_avg = ds.groupby('time.month').mean(dim='time', keep_attrs=True) + else: + ds_avg = _weighted_monthly_mean(ds) + + # write full monthly file, then split into per-month files + outfile_str = str(outfile) + ds_avg.to_netcdf(outfile_str) + fre_logger.info('done averaging over months.') + + fre_logger.info('splitting by month') + outfile_root = outfile_str.removesuffix('.nc') + for month_val in ds_avg['month'].values: + month_ds = ds_avg.sel(month=month_val) + month_file = f'{outfile_root}.{int(month_val):02d}.nc' + month_ds.to_netcdf(month_file) + fre_logger.debug('wrote month file: %s', month_file) + + fre_logger.info('done averaging') + fre_logger.info('output file created: %s', outfile) + return 0 + + +def _weighted_time_mean(ds): + """ + compute weighted time-mean using time_bnds for weights. + non-numeric variables (e.g. datetime64 metadata like average_T1/T2) + retain their first value rather than being averaged. + + :param ds: xarray Dataset with 'time_bnds' variable + :type ds: xr.Dataset + :return: time-mean Dataset + :rtype: xr.Dataset + """ + weights = _compute_time_weights(ds) + weighted_vars = {} + for var_name in ds.data_vars: + if var_name == 'time_bnds': + continue + if 'time' in ds[var_name].dims: + if _is_numeric(ds[var_name]): + weighted_vars[var_name] = ( + (ds[var_name] * weights).sum(dim='time', keep_attrs=True) + / weights.sum() + ) + else: + # non-numeric time-dependent var (e.g. decoded datetime64) + weighted_vars[var_name] = ds[var_name].isel(time=0) + else: + weighted_vars[var_name] = ds[var_name] + return xr.Dataset(weighted_vars, attrs=ds.attrs) + + +def _weighted_seasonal_mean(ds): + """ + compute weighted seasonal mean using time_bnds for weights. + non-numeric time-dependent variables are dropped from the output. + + :param ds: xarray Dataset with 'time_bnds' variable + :type ds: xr.Dataset + :return: seasonal-mean Dataset grouped by season + :rtype: xr.Dataset + """ + weights = _compute_time_weights(ds) + season = ds['time'].dt.season + weighted_vars = {} + for var_name in ds.data_vars: + if var_name == 'time_bnds': + continue + if 'time' in ds[var_name].dims: + if _is_numeric(ds[var_name]): + weighted = ds[var_name] * weights + weighted_vars[var_name] = ( + weighted.groupby(season).sum(dim='time', keep_attrs=True) + / weights.groupby(season).sum(dim='time') + ) + else: + weighted_vars[var_name] = ds[var_name] + return xr.Dataset(weighted_vars, attrs=ds.attrs) + + +def _weighted_monthly_mean(ds): + """ + compute weighted monthly mean using time_bnds for weights. + non-numeric time-dependent variables are dropped from the output. + + :param ds: xarray Dataset with 'time_bnds' variable + :type ds: xr.Dataset + :return: monthly-mean Dataset grouped by month + :rtype: xr.Dataset + """ + weights = _compute_time_weights(ds) + month = ds['time'].dt.month + weighted_vars = {} + for var_name in ds.data_vars: + if var_name == 'time_bnds': + continue + if 'time' in ds[var_name].dims: + if _is_numeric(ds[var_name]): + weighted = ds[var_name] * weights + weighted_vars[var_name] = ( + weighted.groupby(month).sum(dim='time', keep_attrs=True) + / weights.groupby(month).sum(dim='time') + ) + else: + weighted_vars[var_name] = ds[var_name] + return xr.Dataset(weighted_vars, attrs=ds.attrs) + + +def _compute_time_weights(ds): + """ + compute per-timestep weights from time_bnds as float days. + + handles the three cases xarray produces when reading time_bnds: + * timedelta64 (difference of decoded datetime64 bounds) + * cftime timedelta objects (when use_cftime=True) + * numeric float / int (bounds stored as plain numbers) + + :param ds: xarray Dataset with 'time_bnds' variable + :type ds: xr.Dataset + :return: DataArray of float weights along the time dimension + :rtype: xr.DataArray + """ + if 'time_bnds' in ds: + time_bnds = ds['time_bnds'] + raw_diff = (time_bnds[:, 1] - time_bnds[:, 0]).values # numpy array + + if raw_diff.dtype.kind == 'm': + # timedelta64 — convert to float days via seconds + float_days = raw_diff.astype('timedelta64[s]').astype('float64') / 86400.0 + elif raw_diff.dtype == object: + # cftime timedelta objects: .days + .seconds/86400 + float_days = np.array( + [td.days + td.seconds / 86400.0 for td in raw_diff], + dtype='float64' + ) + else: + # already numeric (float or int days) + float_days = raw_diff.astype('float64') + + weights = xr.DataArray(float_days, dims=['time']) + else: + fre_logger.warning('time_bnds not found, falling back to uniform weights') + weights = xr.ones_like(ds['time'], dtype='float64') + return weights diff --git a/fre/app/remap_pp_components/tests/test_remap_pp_components.py b/fre/app/remap_pp_components/tests/test_remap_pp_components.py index bd83a11ad..d673f8a98 100644 --- a/fre/app/remap_pp_components/tests/test_remap_pp_components.py +++ b/fre/app/remap_pp_components/tests/test_remap_pp_components.py @@ -4,6 +4,7 @@ import logging from pathlib import Path import pytest +import xarray as xr import fre.app.remap_pp_components.remap_pp_components as rmp # Test paths @@ -332,7 +333,7 @@ def test_remap_offline_diagnostics(monkeypatch): assert Path(f"{os.getenv('outputDir')}/atmos_scalar/{STATIC_FREQ}/{STATIC_CHUNK}/empty.nc").exists() ## COMPARE INPUT AND OUTPUT FILES ## -def test_nccmp_ncgen_remap(): +def test_compare_ncgen_remap(): """ This test compares the results of ncgen and rewrite_remap, making sure that the remapped files are identical. @@ -340,13 +341,15 @@ def test_nccmp_ncgen_remap(): comp_name = "atmos_scalar_CNAME" output_nc_file = f"{comp_name}.198001-198412.co2mass.nc" - nccmp = [ "nccmp", "-d", - Path(f"{REMAP_IN}/{NATIVE_GRID}/atmos_scalar/{FREQ}/{CHUNK}/{DATA_NC_FILES[0]}"), - Path(f"{REMAP_OUT}/{comp_name}/{PRODUCT}/monthly/5yr/{output_nc_file}") ] - sp = subprocess.run( nccmp, check = False) - assert sp.returncode == 0 + with xr.open_dataset( + Path(f"{REMAP_IN}/{NATIVE_GRID}/atmos_scalar/{FREQ}/{CHUNK}/{DATA_NC_FILES[0]}"), + decode_cf=False, decode_times=False) as ds_in, \ + xr.open_dataset( + Path(f"{REMAP_OUT}/{comp_name}/{PRODUCT}/monthly/5yr/{output_nc_file}"), + decode_cf=False, decode_times=False) as ds_out: + xr.testing.assert_equal(ds_in, ds_out) -def test_nccmp_ncgen_remap_ens_mem(): +def test_compare_ncgen_remap_ens_mem(): """ This test compares the results of ncgen and rewrite_remap, making sure that the remapped files are identical. @@ -358,14 +361,15 @@ def test_nccmp_ncgen_remap_ens_mem(): remap_ens_in = f"{TEST_OUTDIR}/ncgen-ens-output" remap_ens_out = f"{TEST_OUTDIR}/remap-ens-output" - nccmp = [ "nccmp", "-d", - Path(f"{remap_ens_in}/{NATIVE_GRID}/ens_01/atmos_scalar/{FREQ}/{CHUNK}/{DATA_NC_FILES[0]}"), - Path(f"{remap_ens_out}/{comp_name}/{PRODUCT}/ens_01/monthly/5yr/{output_nc_file}") ] + with xr.open_dataset( + Path(f"{remap_ens_in}/{NATIVE_GRID}/ens_01/atmos_scalar/{FREQ}/{CHUNK}/{DATA_NC_FILES[0]}"), + decode_cf=False, decode_times=False) as ds_in, \ + xr.open_dataset( + Path(f"{remap_ens_out}/{comp_name}/{PRODUCT}/ens_01/monthly/5yr/{output_nc_file}"), + decode_cf=False, decode_times=False) as ds_out: + xr.testing.assert_equal(ds_in, ds_out) - sp = subprocess.run( nccmp, check = False) - assert sp.returncode == 0 - -def test_nccmp_ncgen_remap_statics(): +def test_compare_ncgen_remap_statics(): """ This test compares the results of ncgen and remap, making sure that the remapped static files are identical. @@ -373,14 +377,15 @@ def test_nccmp_ncgen_remap_statics(): comp_name = "atmos_scalar_CNAME" output_nc_file = f"{comp_name}.bk.nc" - nccmp = [ "nccmp", "-d", - Path(f"{REMAP_IN}/{NATIVE_GRID}/atmos_static_scalar/" - f"{STATIC_FREQ}/{STATIC_CHUNK}/{STATIC_DATA_NC_FILES[0]}"), - Path(f"{REMAP_OUT}/static/{comp_name}/" - f"{STATIC_FREQ}/{STATIC_CHUNK}/{output_nc_file}")] - - sp = subprocess.run( nccmp, check = False) - assert sp.returncode == 0 + with xr.open_dataset( + Path(f"{REMAP_IN}/{NATIVE_GRID}/atmos_static_scalar/" + f"{STATIC_FREQ}/{STATIC_CHUNK}/{STATIC_DATA_NC_FILES[0]}"), + decode_cf=False, decode_times=False) as ds_in, \ + xr.open_dataset( + Path(f"{REMAP_OUT}/static/{comp_name}/" + f"{STATIC_FREQ}/{STATIC_CHUNK}/{output_nc_file}"), + decode_cf=False, decode_times=False) as ds_out: + xr.testing.assert_equal(ds_in, ds_out) ## VARIABLE FILTERING TESTS ## def test_remap_variable_filtering(): diff --git a/fre/cmor/README.md b/fre/cmor/README.md index ab5da72ec..7ddc256c7 100644 --- a/fre/cmor/README.md +++ b/fre/cmor/README.md @@ -53,6 +53,10 @@ repository, and the user directly. # entry-point to all subcommands fre cmor --help +# initialise CMOR resources: generate an empty experiment-config template +# and/or fetch MIP tables from trusted sources +fre cmor init --help + # main engine of the sub module, does the rewriting fre cmor run --help @@ -79,6 +83,36 @@ respective tool's `--help` output at the command line, and consult the [official fre-cli docs](https://noaa-gfdl.readthedocs.io/projects/fre-cli/en/latest/usage.html#cmorize-postprocessed-output). +### `fre cmor init` + +Initialise CMOR resources for a new experiment. This command can generate an empty experiment-configuration JSON template for +either CMIP6 or CMIP7, and optionally fetch the official MIP tables from their trusted GitHub repositories. + +Tables are fetched via `git clone --depth 1` by default; pass `--fast` to download a tarball via `curl` instead. A specific +release tag can be requested with `--tag`. + +Trusted sources: +- CMIP6: [pcmdi/cmip6-cmor-tables](https://github.com/pcmdi/cmip6-cmor-tables) +- CMIP7: [WCRP-CMIP/cmip7-cmor-tables](https://github.com/WCRP-CMIP/cmip7-cmor-tables) + + +#### Example and Description +``` +# Generate an empty CMIP6 experiment config template +fre cmor init --mip_era cmip6 --exp_config my_experiment.json + +# Generate a CMIP7 template and fetch the latest MIP tables via git +fre cmor init --mip_era cmip7 --exp_config my_cmip7_exp.json --tables_dir ./cmip7-tables + +# Fetch CMIP6 tables at a specific tag using curl (fast mode) +fre cmor init --mip_era cmip6 --tables_dir ./cmip6-tables --tag v6.2.7.18 --fast +``` + +Here, `fre cmor init` writes a JSON file containing all the fields expected by `fre cmor run` and the underlying CMOR library, +with placeholder values that the user fills in for their specific experiment. When `--tables_dir` is provided, the official MIP +tables are fetched into the specified directory. + + ### `fre cmor run` Rewrite climate model output files in a target input directory to be CMIP-compliant. This rewriting process is referred to as diff --git a/fre/cmor/__init__.py b/fre/cmor/__init__.py index a6d774df6..706495a9c 100644 --- a/fre/cmor/__init__.py +++ b/fre/cmor/__init__.py @@ -3,3 +3,4 @@ from .cmor_finder import cmor_find_subtool from .cmor_yamler import cmor_yaml_subtool from .cmor_config import cmor_config_subtool +from .cmor_init import cmor_init_subtool diff --git a/fre/cmor/cmor_constants.py b/fre/cmor/cmor_constants.py index 887eb9574..cb60a8fcd 100644 --- a/fre/cmor/cmor_constants.py +++ b/fre/cmor/cmor_constants.py @@ -89,6 +89,21 @@ ] +# --------------------------------------------------------------------------- +# CF calendar aliases (used by cmor_mixer / cmor_helpers for calendar checks) +# --------------------------------------------------------------------------- +# Maps CF calendar *alias* names to their canonical equivalents. +# Only entries where key != value are needed: calendars_are_equivalent falls +# back to the bare name when a key is absent (via dict.get(key, key)). +# Reference: CF Conventions §4.4.1 – The Calendar Attribute +# https://cfconventions.org/cf-conventions/cf-conventions.html#calendar +CF_CALENDAR_ALIASES: dict = { + '365_day': 'noleap', # 365-day no-leap alias + '366_day': 'all_leap', # 366-day all-leap alias + 'gregorian': 'standard', # mixed Gregorian/Julian alias +} + + # --------------------------------------------------------------------------- # Output / display flags # --------------------------------------------------------------------------- diff --git a/fre/cmor/cmor_helpers.py b/fre/cmor/cmor_helpers.py index a7e6c9033..f8e6edbab 100644 --- a/fre/cmor/cmor_helpers.py +++ b/fre/cmor/cmor_helpers.py @@ -29,6 +29,7 @@ - ``get_json_file_data(json_file_path)`` - ``update_grid_and_label(json_file_path, new_grid_label, new_grid, new_nom_res, output_file_path)`` - ``update_calendar_type(json_file_path, new_calendar_type, output_file_path)`` +- ``calendars_are_equivalent(cal1, cal2)`` - ``check_path_existence(some_path)`` - ``iso_to_bronx_chunk(cmor_chunk_in)`` - ``conv_mip_to_bronx_freq(cmor_table_freq)`` @@ -54,7 +55,7 @@ from netCDF4 import Dataset, Variable from .cmor_constants import ( ARCHIVE_GOLD_DATA_DIR, CMIP7_GOLD_OCEAN_FILE_STUB, CMIP6_GOLD_OCEAN_FILE_STUB, - INPUT_TO_MIP_VERT_DIM ) + INPUT_TO_MIP_VERT_DIM, CF_CALENDAR_ALIASES ) fre_logger = logging.getLogger(__name__) @@ -575,6 +576,27 @@ def check_path_existence(some_path: str): if not Path(some_path).exists(): raise FileNotFoundError(f'does not exist: {some_path}') + +def calendars_are_equivalent(cal1: str, cal2: str) -> bool: + """ + Return True if two CF calendar names refer to the same calendar. + + The CF Conventions define several calendar aliases (e.g. ``noleap`` and + ``365_day`` are the same calendar; ``standard`` and ``gregorian`` are the + same calendar). This function resolves both names to a canonical form via + :data:`.cmor_constants.CF_CALENDAR_ALIASES` before comparing, so that + alias pairs compare as equal. + + :param cal1: First CF calendar name (case-insensitive). + :type cal1: str + :param cal2: Second CF calendar name (case-insensitive). + :type cal2: str + :return: ``True`` if both names refer to the same calendar, ``False`` otherwise. + :rtype: bool + """ + return CF_CALENDAR_ALIASES.get(cal1.lower(), cal1.lower()) == CF_CALENDAR_ALIASES.get(cal2.lower(), cal2.lower()) + + def iso_to_bronx_chunk(cmor_chunk_in: str) -> str: """ Convert an ISO8601 duration string (e.g., 'P5Y') to FRE-bronx-style chunk string (e.g., '5yr'). diff --git a/fre/cmor/cmor_init.py b/fre/cmor/cmor_init.py new file mode 100644 index 000000000..f784df4d0 --- /dev/null +++ b/fre/cmor/cmor_init.py @@ -0,0 +1,320 @@ +""" +CMOR Init Subtool +================= + +This module powers the ``fre cmor init`` command, providing two key capabilities: + +1. **Experiment config generation** – writes an empty (template) JSON experiment + configuration file for either CMIP6 or CMIP7. The user fills in the + placeholder values before running ``fre cmor run`` or ``fre cmor yaml``. + +2. **MIP table retrieval** – fetches the official MIP tables from trusted + GitHub repositories. By default tables are fetched via ``git clone`` + (shallow, depth 1); with ``--fast`` they are fetched as a tarball via + ``curl`` and extracted in-place. + +Trusted sources +--------------- +- CMIP6: https://github.com/pcmdi/cmip6-cmor-tables +- CMIP7: https://github.com/WCRP-CMIP/cmip7-cmor-tables + +Functions +--------- +- ``cmor_init_subtool(...)`` +""" + +import json +import logging +import subprocess +import tarfile +import tempfile +from pathlib import Path + +fre_logger = logging.getLogger(__name__) + + +# --------------------------------------------------------------------------- +# Trusted sources for MIP tables +# --------------------------------------------------------------------------- +MIP_TABLE_REPOS = { + 'cmip6': 'https://github.com/pcmdi/cmip6-cmor-tables', + 'cmip7': 'https://github.com/WCRP-CMIP/cmip7-cmor-tables', +} + + +# --------------------------------------------------------------------------- +# Empty / template experiment configuration dictionaries +# --------------------------------------------------------------------------- + +def _cmip6_exp_config_template(): + """Return an ordered dict-like structure for an empty CMIP6 experiment config.""" + return { + "#note": " **** CMIP6 experiment configuration template – fill in values below ****", + "source_type": "", + "experiment_id": "", + "activity_id": "", + "sub_experiment_id": "none", + "realization_index": "1", + "initialization_index": "1", + "physics_index": "1", + "forcing_index": "1", + "run_variant": "", + "parent_experiment_id": "no parent", + "parent_activity_id": "no parent", + "parent_source_id": "no parent", + "parent_variant_label": "no parent", + "parent_time_units": "no parent", + "branch_method": "no parent", + "branch_time_in_child": 0.0, + "branch_time_in_parent": 0.0, + "institution_id": "", + "source_id": "", + "calendar": "", + "grid": "", + "grid_label": "", + "nominal_resolution": "", + "license": "", + "outpath": "", + "contact": "", + "history": "", + "comment": "", + "references": "", + "sub_experiment": "none", + "institution": "", + "source": "", + "_controlled_vocabulary_file": "CMIP6_CV.json", + "_AXIS_ENTRY_FILE": "CMIP6_coordinate.json", + "_FORMULA_VAR_FILE": "CMIP6_formula_terms.json", + "_cmip6_option": "CMIP6", + "mip_era": "CMIP6", + "parent_mip_era": "no parent", + "tracking_prefix": "hdl:21.14100", + "_history_template": ( + "%s ;rewrote data to be consistent with " + " for variable found in table ." + ), + "output_path_template": ( + "" + "<_member_id>" + ), + "output_file_template": ( + "
<_member_id>" + ), + } + + +def _cmip7_exp_config_template(): + """Return an ordered dict-like structure for an empty CMIP7 experiment config.""" + return { + "#note": " **** CMIP7 experiment configuration template – fill in values below ****", + "contact": "MIP participant mipmember@foobar.c.om", + "comment": "additional important information not fitting into other fields can be placed here", + "license": "", + "references": "", + "drs_specs": "MIP-DRS7", + "archive_id": "WCRP", + "license_id": "CC-BY-4-0", + "tracking_prefix": "hdl:21.14107", + "_cmip7_option": 1, + "mip_era": "CMIP7", + "parent_mip_era": "CMIP7", + "activity_id": "CMIP", + "parent_activity_id": "CMIP", + "institution": "", + "institution_id": "", + "source": "", + "source_id": "", + "source_type": "", + "experiment_id": "", + "sub_experiment": "none", + "sub_experiment_id": "none", + "parent_source_id": "", + "parent_experiment_id": "", + "realization_index": "r1", + "initialization_index": "i1", + "physics_index": "p1", + "forcing_index": "f1", + "run_variant": "", + "parent_variant_label": "", + "parent_time_units": "", + "branch_method": "no parent", + "branch_time_in_child": 0.0, + "branch_time_in_parent": 0.0, + "calendar": "", + "grid": "PLACEHOLD", + "grid_label": "g99", + "frequency": "", + "region": "", + "nominal_resolution": "", + "history": "", + "_history_template": ( + "%s ;rewrote data to be consistent with " + " for variable found in table ." + ), + "outpath": ".", + "output_path_template": ( + "" + "" + ), + "output_file_template": ( + "" + "" + ), + "_controlled_vocabulary_file": "../tables-cvs/cmor-cvs.json", + "_AXIS_ENTRY_FILE": "CMIP7_coordinate.json", + "_FORMULA_VAR_FILE": "CMIP7_formula_terms.json", + } + + +# --------------------------------------------------------------------------- +# Table-fetching helpers +# --------------------------------------------------------------------------- + +def _fetch_tables_git(repo_url, tables_dir, tag=None): + """ + Clone MIP tables via ``git clone --depth 1``. + + Parameters + ---------- + repo_url : str + HTTPS URL of the MIP table repository. + tables_dir : str + Local directory to clone into. + tag : str or None + Optional git tag / branch to check out. + """ + cmd = ['git', 'clone', '--depth', '1'] + if tag: + cmd += ['--branch', tag] + cmd += [repo_url, tables_dir] + + fre_logger.info('fetching MIP tables via git: %s', ' '.join(cmd)) + subprocess.run(cmd, check=True) + fre_logger.info('MIP tables cloned to %s', tables_dir) + + +def _fetch_tables_curl(repo_url, tables_dir, tag=None): + """ + Fetch MIP tables as a tarball via ``curl`` and extract. + + Parameters + ---------- + repo_url : str + HTTPS URL of the MIP table repository. + tables_dir : str + Local directory to extract into. + tag : str or None + Optional git tag / branch. Defaults to ``main`` if *None*. + """ + ref = tag if tag else 'main' + if tag: + tarball_url = f'{repo_url}/archive/refs/tags/{ref}.tar.gz' + else: + tarball_url = f'{repo_url}/archive/refs/heads/{ref}.tar.gz' + + tables_path = Path(tables_dir) + tables_path.mkdir(parents=True, exist_ok=True) + + with tempfile.NamedTemporaryFile(suffix='.tar.gz', delete=False) as tmp: + tmp_path = tmp.name + + try: + curl_cmd = ['curl', '-L', '-o', tmp_path, tarball_url] + fre_logger.info('fetching MIP tables via curl: %s', ' '.join(curl_cmd)) + subprocess.run(curl_cmd, check=True) + + fre_logger.info('extracting tarball to %s', tables_dir) + with tarfile.open(tmp_path, 'r:gz') as tar: + tar.extractall(path=tables_dir) + finally: + Path(tmp_path).unlink(missing_ok=True) + + fre_logger.info('MIP tables extracted to %s', tables_dir) + + +# --------------------------------------------------------------------------- +# Main subtool entry-point +# --------------------------------------------------------------------------- + +def cmor_init_subtool( + mip_era, + exp_config=None, + tables_dir=None, + tag=None, + fast=False +): + """ + Initialise CMOR resources for the user. + + Depending on the arguments supplied this function will: + + * Write an empty experiment-configuration JSON file for the requested MIP + era (``cmip6`` or ``cmip7``) when *exp_config* is given (or when neither + *exp_config* nor *tables_dir* is provided — in which case a default + filename is used). + * Clone / download the official MIP tables into *tables_dir* when that + argument is provided. + + Parameters + ---------- + mip_era : str + ``'cmip6'`` or ``'cmip7'``. + exp_config : str or None + Output path for the template experiment-config JSON file. + When *None* and *tables_dir* is also *None*, a default path + ``CMOR__template.json`` in the current directory is used. + tables_dir : str or None + Directory into which MIP tables will be fetched. + tag : str or None + Optional git tag / release for the MIP tables repository. + fast : bool + When *True*, use ``curl`` to download a tarball instead of ``git clone``. + + Returns + ------- + dict + A dictionary with keys ``'exp_config'`` (path written or *None*) + and ``'tables_dir'`` (path written or *None*). + """ + mip_era_lower = mip_era.lower() + if mip_era_lower not in ('cmip6', 'cmip7'): + raise ValueError(f"mip_era must be 'cmip6' or 'cmip7', got '{mip_era}'") + + result = {'exp_config': None, 'tables_dir': None} + + # -- experiment config -- + # Write config when explicitly requested OR when tables_dir is not given + # (i.e. the user invoked `fre cmor init` without --tables_dir). + if exp_config is not None or tables_dir is None: + if exp_config is None: + exp_config = f'CMOR_{mip_era_lower}_template.json' + + template_func = { + 'cmip6': _cmip6_exp_config_template, + 'cmip7': _cmip7_exp_config_template, + }[mip_era_lower] + + config_data = template_func() + + out_path = Path(exp_config) + out_path.parent.mkdir(parents=True, exist_ok=True) + with open(out_path, 'w', encoding='utf-8') as fh: + json.dump(config_data, fh, indent=4) + fh.write('\n') + + fre_logger.info('wrote %s experiment config template to %s', + mip_era_lower.upper(), out_path) + click_echo = f'Wrote {mip_era_lower.upper()} experiment config template to {out_path}' + print(click_echo) + result['exp_config'] = str(out_path) + + # -- MIP tables -- + if tables_dir is not None: + repo_url = MIP_TABLE_REPOS[mip_era_lower] + if fast: + _fetch_tables_curl(repo_url, tables_dir, tag=tag) + else: + _fetch_tables_git(repo_url, tables_dir, tag=tag) + result['tables_dir'] = tables_dir + + return result diff --git a/fre/cmor/cmor_mixer.py b/fre/cmor/cmor_mixer.py index 43aa55d66..87e51f1fc 100755 --- a/fre/cmor/cmor_mixer.py +++ b/fre/cmor/cmor_mixer.py @@ -45,7 +45,8 @@ from .cmor_helpers import ( print_data_minmax, from_dis_gimme_dis, find_statics_file, create_lev_bnds, get_iso_datetime_ranges, check_dataset_for_ocean_grid, get_vertical_dimension, create_tmp_dir, get_json_file_data, update_grid_and_label, #update_outpath, - update_calendar_type, find_gold_ocean_statics_file, filter_brands ) + update_calendar_type, find_gold_ocean_statics_file, filter_brands, + calendars_are_equivalent ) from .cmor_constants import ( ACCEPTED_VERT_DIMS, NON_HYBRID_SIGMA_COORDS, ALT_HYBRID_SIGMA_COORDS, DEPTH_COORDS, CMOR_NC_FILE_ACTION, CMOR_VERBOSITY, CMOR_EXIT_CTL, CMOR_MK_SUBDIRS, CMOR_LOG ) @@ -206,7 +207,7 @@ def rewrite_netcdf_file_var( mip_var_cfgs: dict = None, else: with open(json_exp_config, "r", encoding="utf-8") as file: exp_cfg_calendar = json.load(file)['calendar'] - if exp_cfg_calendar != time_coords_calendar: + if not calendars_are_equivalent(time_coords_calendar, exp_cfg_calendar): raise ValueError(f"data calendar type {time_coords_calendar} " f"does not match input config calendar type: {exp_cfg_calendar}") @@ -1004,6 +1005,31 @@ def cmor_run_subtool(indir: str = None, mip_fullvar_list = mip_var_cfgs["variable_entry"].keys() fre_logger.debug('the following variables were read from the table: %s', mip_fullvar_list) + # CHECK for mip_era mismatch between exp config and the MIP table format. + # CMIP7 tables use branded variable names where the brand suffix (after the first '_') contains + # hyphens, e.g. 'sos_tavg-u-hxy-sea'. CMIP6 tables use plain variable names, e.g. 'sos'. + # Detecting a mismatch early provides a clear error instead of a confusing 'empty variable list' failure. + _table_vars_are_branded = ( + bool(mip_fullvar_list) and + all('_' in v and '-' in v.split('_', 1)[1] for v in mip_fullvar_list) + ) + if exp_cfg_mip_era == 'CMIP6' and _table_vars_are_branded: + raise ValueError( + 'mip_era in experiment config is "CMIP6", but the MIP table appears to be a CMIP7 table. ' + 'CMIP7 tables have branded variable names (e.g., "sos_tavg-u-hxy-sea"), while CMIP6 tables ' + 'use plain variable names (e.g., "sos"). ' + f'Please set mip_era to "CMIP7" in your experiment config ({json_exp_config}), ' + f'or use a CMIP6 table instead of: {json_table_config}.' + ) + if exp_cfg_mip_era == 'CMIP7' and not _table_vars_are_branded: + raise ValueError( + 'mip_era in experiment config is "CMIP7", but the MIP table appears to be a CMIP6 table. ' + 'CMIP6 tables use plain variable names (e.g., "sos"), while CMIP7 tables have branded ' + 'variable names (e.g., "sos_tavg-u-hxy-sea"). ' + f'Please set mip_era to "CMIP6" in your experiment config ({json_exp_config}), ' + f'or use a CMIP7 table instead of: {json_table_config}.' + ) + # make the TABLE's variable list, and brand list (if CMIP7) mip_var_list, mip_var_brand_list = None, None if exp_cfg_mip_era == 'CMIP7': diff --git a/fre/cmor/frecmor.py b/fre/cmor/frecmor.py index e7d774d78..d3396ee48 100755 --- a/fre/cmor/frecmor.py +++ b/fre/cmor/frecmor.py @@ -6,6 +6,7 @@ from . import cmor_run_subtool from . import cmor_yaml_subtool from . import cmor_config_subtool +from . import cmor_init_subtool from .cmor_finder import make_simple_varlist OPT_VAR_NAME_HELP="optional, specify a variable name to specifically process only filenames " + \ @@ -237,3 +238,32 @@ def config(pp_dir, mip_tables_dir, mip_era, exp_config, output_yaml, overwrite=overwrite, calendar_type=calendar ) + + +@cmor_cli.command() +@click.option("-m", "--mip_era", type=click.Choice(['cmip6', 'cmip7'], case_sensitive=False), + required=True, + help="MIP era for the template: 'cmip6' or 'cmip7'.") +@click.option("-e", "--exp_config", type=str, default=None, + help="Output path for the template experiment-config JSON file. " + "When omitted and --tables_dir is also omitted, a default " + "filename is used.") +@click.option("-t", "--tables_dir", type=str, default=None, + help="Directory into which MIP tables will be fetched from " + "trusted sources. Omit to skip table retrieval.") +@click.option("--tag", type=str, default=None, + help="Specific git tag or release for the MIP tables repository.") +@click.option("--fast", is_flag=True, default=False, + help="Use curl to download a tarball instead of git clone.") +def init(mip_era, exp_config, tables_dir, tag, fast): + """ + Initialise CMOR resources: write an empty experiment-config JSON template + and/or fetch MIP tables from trusted sources. + """ + cmor_init_subtool( + mip_era=mip_era, + exp_config=exp_config, + tables_dir=tables_dir, + tag=tag, + fast=fast + ) diff --git a/fre/cmor/tests/test_cmor_helpers.py b/fre/cmor/tests/test_cmor_helpers.py index 8ca3ab060..f9bfa32c7 100644 --- a/fre/cmor/tests/test_cmor_helpers.py +++ b/fre/cmor/tests/test_cmor_helpers.py @@ -13,7 +13,7 @@ create_lev_bnds, get_iso_datetime_ranges, iso_to_bronx_chunk, create_tmp_dir, get_json_file_data, update_grid_and_label, get_bronx_freq_from_mip_table, #update_outpath, - filter_brands ) + filter_brands, calendars_are_equivalent ) def test_iso_to_bronx_chunk(): ''' tests value error raising by iso_to_bronx_chunk ''' @@ -436,3 +436,45 @@ def test_filter_brands_multiple_remain(): has_time_bnds=True, input_vert_dim=0, ) + + +# ---- calendars_are_equivalent tests ---- + +def test_calendars_are_equivalent_noleap_and_365_day(): + ''' noleap and 365_day are CF aliases for the same calendar ''' + assert calendars_are_equivalent('noleap', '365_day') + + +def test_calendars_are_equivalent_365_day_and_noleap(): + ''' 365_day and noleap are CF aliases for the same calendar (reversed order) ''' + assert calendars_are_equivalent('365_day', 'noleap') + + +def test_calendars_are_equivalent_all_leap_and_366_day(): + ''' all_leap and 366_day are CF aliases for the same calendar ''' + assert calendars_are_equivalent('all_leap', '366_day') + + +def test_calendars_are_equivalent_standard_and_gregorian(): + ''' standard and gregorian are CF aliases for the same calendar ''' + assert calendars_are_equivalent('standard', 'gregorian') + + +def test_calendars_are_equivalent_same_name(): + ''' identical calendar names should be equivalent ''' + assert calendars_are_equivalent('360_day', '360_day') + assert calendars_are_equivalent('julian', 'julian') + assert calendars_are_equivalent('proleptic_gregorian', 'proleptic_gregorian') + + +def test_calendars_are_equivalent_case_insensitive(): + ''' comparison is case-insensitive ''' + assert calendars_are_equivalent('NoLeap', '365_DAY') + assert calendars_are_equivalent('STANDARD', 'gregorian') + + +def test_calendars_are_equivalent_different_calendars(): + ''' distinct calendars should NOT be equivalent ''' + assert not calendars_are_equivalent('noleap', '360_day') + assert not calendars_are_equivalent('gregorian', '360_day') + assert not calendars_are_equivalent('julian', 'noleap') diff --git a/fre/cmor/tests/test_cmor_run_subtool.py b/fre/cmor/tests/test_cmor_run_subtool.py index 135c645aa..71aea0da7 100644 --- a/fre/cmor/tests/test_cmor_run_subtool.py +++ b/fre/cmor/tests/test_cmor_run_subtool.py @@ -9,6 +9,8 @@ import subprocess import shutil +import netCDF4 +import numpy as np import pytest from fre.cmor import cmor_run_subtool @@ -57,6 +59,57 @@ def test_setup_cmor_cmip_table_repo(): FULL_OUTPUTFILE = \ f"{FULL_OUTPUTDIR}/sos_Omon_PCMDI-test-1-0_piControl-withism_r3i1p1f1_{GRID_LABEL}_{DATETIMES_INPUTFILE}.nc" +# CMIP6-required global attributes that must be present in CMOR output +CMIP6_REQUIRED_GLOBAL_ATTRS = [ + 'variable_id', 'mip_era', 'table_id', + 'experiment_id', 'institution_id', 'source_id' +] + + +def _assert_data_matches(ds_in, ds_out): + ''' + helper: assert that science variable data, coordinate data, and shapes + are preserved between input and CMOR output datasets. + ''' + # the science variable data must be preserved exactly + assert np.array_equal(ds_in.variables['sos'][:], ds_out.variables['sos'][:]), \ + "sos data values differ between input and CMOR output" + + # coordinate data must be preserved + assert np.allclose(ds_in.variables['lat'][:], ds_out.variables['lat'][:]), \ + "latitude data differs between input and CMOR output" + assert np.allclose(ds_in.variables['lon'][:], ds_out.variables['lon'][:]), \ + "longitude data differs between input and CMOR output" + assert np.allclose(ds_in.variables['time'][:], ds_out.variables['time'][:]), \ + "time data differs between input and CMOR output" + + # variable shapes must be preserved + assert ds_in.variables['sos'][:].shape == ds_out.variables['sos'][:].shape, \ + "sos data shape differs between input and CMOR output" + + +def _assert_metadata_matches(ds_in, ds_out): + ''' + helper: assert that CMIP6-required global attributes are present and that + key variable-level metadata is preserved between input and CMOR output datasets. + ''' + # CMOR output must contain CMIP6-required global attributes + for required_attr in CMIP6_REQUIRED_GLOBAL_ATTRS: + assert required_attr in ds_out.ncattrs(), \ + f"CMOR output missing required global attribute '{required_attr}'" + + # science variable standard_name and long_name must be preserved + assert ds_in.variables['sos'].standard_name == ds_out.variables['sos'].standard_name, \ + "sos standard_name differs between input and CMOR output" + assert ds_in.variables['sos'].long_name == ds_out.variables['sos'].long_name, \ + "sos long_name differs between input and CMOR output" + + # _FillValue and missing_value must be preserved + assert ds_in.variables['sos']._FillValue == ds_out.variables['sos']._FillValue, \ + "sos _FillValue differs between input and CMOR output" + assert ds_in.variables['sos'].missing_value == ds_out.variables['sos'].missing_value, \ + "sos missing_value differs between input and CMOR output" + def test_setup_fre_cmor_run_subtool(capfd): ''' @@ -124,24 +177,13 @@ def test_fre_cmor_run_subtool_case1_output_compare_data(capfd): print(f'FULL_OUTPUTFILE={FULL_OUTPUTFILE}') print(f'FULL_INPUTFILE={FULL_INPUTFILE}') - nccmp_cmd= [ "nccmp", "-f", "-d", - f"{FULL_INPUTFILE}", - f"{FULL_OUTPUTFILE}" ] - print(f"via subprocess, running {' '.join(nccmp_cmd)}") - result = subprocess.run( ' '.join(nccmp_cmd), - shell=True, - check=False, - capture_output=True - ) + with netCDF4.Dataset(FULL_INPUTFILE) as ds_in, \ + netCDF4.Dataset(FULL_OUTPUTFILE) as ds_out: + # file formats should differ: CMOR converts input to NETCDF4_CLASSIC + assert ds_in.file_format != ds_out.file_format, \ + f"expected file formats to differ, got input={ds_in.file_format}, output={ds_out.file_format}" - # err_list has length two if end in newline - err_list = result.stderr.decode().split('\n') - expected_err = \ - "DIFFER : FILE FORMATS : NC_FORMAT_NETCDF4 <> NC_FORMAT_NETCDF4_CLASSIC" - assert all( [result.returncode == 1, - len(err_list)==2, - '' in err_list, - expected_err in err_list ] ) + _assert_data_matches(ds_in, ds_out) _out, _err = capfd.readouterr() def test_fre_cmor_run_subtool_case1_output_compare_metadata(capfd): @@ -149,16 +191,13 @@ def test_fre_cmor_run_subtool_case1_output_compare_metadata(capfd): print(f'FULL_OUTPUTFILE={FULL_OUTPUTFILE}') print(f'FULL_INPUTFILE={FULL_INPUTFILE}') - nccmp_cmd= [ "nccmp", "-f", "-m", "-g", - f"{FULL_INPUTFILE}", - f"{FULL_OUTPUTFILE}" ] - print(f"via subprocess, running {' '.join(nccmp_cmd)}") - result = subprocess.run( ' '.join(nccmp_cmd), - shell=True, - check=False - ) + with netCDF4.Dataset(FULL_INPUTFILE) as ds_in, \ + netCDF4.Dataset(FULL_OUTPUTFILE) as ds_out: + # CMOR processing should add/change global attributes + assert set(ds_in.ncattrs()) != set(ds_out.ncattrs()), \ + "expected global attributes to differ between input and CMOR output" - assert result.returncode == 1 + _assert_metadata_matches(ds_in, ds_out) _out, _err = capfd.readouterr() @@ -253,22 +292,13 @@ def test_fre_cmor_run_subtool_case2_output_compare_data(capfd): print(f'FULL_OUTPUTFILE={FULL_OUTPUTFILE}') print(f'FULL_INPUTFILE_DIFF={FULL_INPUTFILE_DIFF}') - nccmp_cmd= [ "nccmp", "-f", "-d", - f"{FULL_INPUTFILE_DIFF}", - f"{FULL_OUTPUTFILE}" ] - print(f"via subprocess, running {' '.join(nccmp_cmd)}") - result = subprocess.run( ' '.join(nccmp_cmd), - shell=True, - check=False, - capture_output=True - ) - - err_list = result.stderr.decode().split('\n')#length two if end in newline - expected_err="DIFFER : FILE FORMATS : NC_FORMAT_NETCDF4 <> NC_FORMAT_NETCDF4_CLASSIC" - assert all( [result.returncode == 1, - len(err_list)==2, - '' in err_list, - expected_err in err_list ] ) + with netCDF4.Dataset(FULL_INPUTFILE_DIFF) as ds_in, \ + netCDF4.Dataset(FULL_OUTPUTFILE) as ds_out: + # file formats should differ: CMOR converts input to NETCDF4_CLASSIC + assert ds_in.file_format != ds_out.file_format, \ + f"expected file formats to differ, got input={ds_in.file_format}, output={ds_out.file_format}" + + _assert_data_matches(ds_in, ds_out) _out, _err = capfd.readouterr() def test_fre_cmor_run_subtool_case2_output_compare_metadata(capfd): @@ -276,16 +306,13 @@ def test_fre_cmor_run_subtool_case2_output_compare_metadata(capfd): print(f'FULL_OUTPUTFILE={FULL_OUTPUTFILE}') print(f'FULL_INPUTFILE_DIFF={FULL_INPUTFILE_DIFF}') - nccmp_cmd= [ "nccmp", "-f", "-m", "-g", - f"{FULL_INPUTFILE_DIFF}", - f"{FULL_OUTPUTFILE}" ] - print(f"via subprocess, running {' '.join(nccmp_cmd)}") - result = subprocess.run( ' '.join(nccmp_cmd), - shell=True, - check=False - ) + with netCDF4.Dataset(FULL_INPUTFILE_DIFF) as ds_in, \ + netCDF4.Dataset(FULL_OUTPUTFILE) as ds_out: + # CMOR processing should add/change global attributes + assert set(ds_in.ncattrs()) != set(ds_out.ncattrs()), \ + "expected global attributes to differ between input and CMOR output" - assert result.returncode == 1 + _assert_metadata_matches(ds_in, ds_out) _out, _err = capfd.readouterr() def test_git_cleanup(): @@ -404,3 +431,60 @@ def test_fre_cmor_run_subtool_unsupported_mip_era(tmp_path): json_exp_config = str(bad_exp), outdir = OUTDIR, ) + + +def test_fre_cmor_run_subtool_cmip6_config_cmip7_table_mismatch(tmp_path): + ''' + Test that a ValueError with an informative message is raised when exp config says CMIP6 + but the table is a CMIP7-style table (all variable names have brand suffixes, e.g. + "sos_tavg-u-hxy-sea"). + ''' + # create a minimal CMIP7-style table JSON (variables have underscore brand suffixes) + fake_cmip7_table = tmp_path / 'CMIP7_fake.json' + fake_cmip7_table.write_text(json.dumps({ + "variable_entry": { + "sos_tavg-u-hxy-sea": {"dimensions": "longitude latitude time"}, + "tas_tavg-u-hxy-u": {"dimensions": "longitude latitude time"}, + } + })) + + # exp config says CMIP6 + with pytest.raises(ValueError, match='mip_era in experiment config is "CMIP6"'): + cmor_run_subtool( + indir = INDIR, + json_var_list = VARLIST, + json_table_config = str(fake_cmip7_table), + json_exp_config = EXP_CONFIG, + outdir = OUTDIR, + ) + + +def test_fre_cmor_run_subtool_cmip7_config_cmip6_table_mismatch(tmp_path): + ''' + Test that a ValueError with an informative message is raised when exp config says CMIP7 + but the table is a CMIP6-style table (variable names do not have brand suffixes, e.g. "sos"). + ''' + # create a minimal CMIP6-style table JSON (plain variable names, no underscore brands) + fake_cmip6_table = tmp_path / 'CMIP6_fake.json' + fake_cmip6_table.write_text(json.dumps({ + "variable_entry": { + "sos": {"dimensions": "longitude latitude time"}, + "tas": {"dimensions": "longitude latitude time"}, + } + })) + + # exp config says CMIP7 + cmip7_exp = tmp_path / 'cmip7_exp.json' + with open(EXP_CONFIG, 'r', encoding='utf-8') as f: + exp_data = json.load(f) + exp_data['mip_era'] = 'CMIP7' + cmip7_exp.write_text(json.dumps(exp_data)) + + with pytest.raises(ValueError, match='mip_era in experiment config is "CMIP7"'): + cmor_run_subtool( + indir = INDIR, + json_var_list = VARLIST, + json_table_config = str(fake_cmip6_table), + json_exp_config = str(cmip7_exp), + outdir = OUTDIR, + ) diff --git a/fre/cmor/tests/test_cmor_yamler.py b/fre/cmor/tests/test_cmor_yamler.py deleted file mode 100644 index 370a5dc0f..000000000 --- a/fre/cmor/tests/test_cmor_yamler.py +++ /dev/null @@ -1,10 +0,0 @@ -import pytest -from fre.cmor.cmor_yamler import cmor_yaml_subtool - -def test_modelyaml_dne_raise_filenotfound(): - with pytest.raises(FileNotFoundError): - cmor_yaml_subtool( yamlfile = "MODEL YAML DOESN'T EXIST", - exp_name = "FOO", - platform = "BAR", - target = "BAZ", - dry_run_mode = True ) diff --git a/fre/cmor/tests/test_cmor_yamler_subtool.py b/fre/cmor/tests/test_cmor_yamler_subtool.py index a9f9e971c..bad62d5ce 100644 --- a/fre/cmor/tests/test_cmor_yamler_subtool.py +++ b/fre/cmor/tests/test_cmor_yamler_subtool.py @@ -633,3 +633,14 @@ def test_dry_run_prints_python_call(mock_consolidate, tmp_path): output_nc = list(Path(str(outdir)).rglob('*.nc')) assert len(output_nc) == 0 + +def test_modelyaml_dne_raise_filenotfound(): + ''' + tests that a FileNotFoundError is raised when the yamlfile input does not exist + ''' + with pytest.raises(FileNotFoundError): + cmor_yaml_subtool( yamlfile = "MODEL YAML DOESN'T EXIST", + exp_name = "FOO", + platform = "BAR", + target = "BAZ", + dry_run_mode = True ) diff --git a/fre/pp/tests/test_split_netcdf.py b/fre/pp/tests/test_split_netcdf.py index 6272757c7..99ebd5336 100644 --- a/fre/pp/tests/test_split_netcdf.py +++ b/fre/pp/tests/test_split_netcdf.py @@ -10,7 +10,9 @@ from pathlib import Path import click +import numpy as np import pytest +import xarray as xr from click.testing import CliRunner from fre import fre @@ -162,14 +164,25 @@ def test_split_file_data(workdir,newdir, origdir): print(f"orig count: {orig_count} new count: {new_count}") all_files_equal=True for sf in split_files: - nccmp_cmd = [ 'nccmp', '-d', '--force', - osp.join(origdir, sf), osp.join(newdir, sf) ] - sp = subprocess.run( nccmp_cmd) - if sp.returncode != 0: - all_files_equal=False - print(" ".join(nccmp_cmd)) - print("comparison of " + nccmp_cmd[-1] + " and " + nccmp_cmd[-2] + " did not match") - print(sp.stdout, sp.stderr) + these_files_equal=False + orig_file = osp.join(origdir, sf) + new_file = osp.join(newdir, sf) + ds_orig = None + ds_new = None + try: + ds_orig = xr.open_dataset(orig_file) + ds_new = xr.open_dataset(new_file) + xr.testing.assert_equal(ds_orig, ds_new) + these_files_equal=True + except AssertionError: + these_files_equal=False + print(f"data comparison of {new_file} and {orig_file} did not match") + finally: + all_files_equal = all_files_equal and these_files_equal + if ds_orig is not None: + ds_orig.close() + if ds_new is not None: + ds_new.close() assert all_files_equal and same_count_files #test_split_file_metadata is currently commented out because the set of commands: @@ -209,14 +222,52 @@ def test_split_file_metadata(workdir,newdir, origdir): same_count_files = new_count == orig_count all_files_equal=True for sf in split_files: - nccmp_cmd = [ 'nccmp', '-mg', '--force', - osp.join(origdir, sf), osp.join(newdir, sf) ] - sp = subprocess.run( nccmp_cmd) - if sp.returncode != 0: - print(" ".join(nccmp_cmd)) + orig_file = osp.join(origdir, sf) + new_file = osp.join(newdir, sf) + ds_orig = None + ds_new = None + try: + ds_orig = xr.open_dataset(orig_file) + ds_new = xr.open_dataset(new_file) + # Compare global attributes (-g flag equivalent) + assert set(ds_orig.attrs.keys()) == set(ds_new.attrs.keys()), \ + f"global attribute keys differ for {sf}" + for attr_key in ds_orig.attrs: + orig_val = ds_orig.attrs[attr_key] + new_val = ds_new.attrs[attr_key] + if isinstance(orig_val, np.ndarray): + assert np.array_equal(orig_val, new_val), \ + f"global attribute {attr_key} differs for {sf}" + else: + assert orig_val == new_val, \ + f"global attribute {attr_key} differs for {sf}" + # Compare variable metadata/attributes (-m flag equivalent) + assert set(ds_orig.variables) == set(ds_new.variables), \ + f"variable sets differ for {sf}" + for var in ds_orig.variables: + assert ds_orig[var].dtype == ds_new[var].dtype, \ + f"dtype for variable {var} differs in {sf}: " \ + f"{ds_orig[var].dtype} vs {ds_new[var].dtype}" + assert set(ds_orig[var].attrs.keys()) == set(ds_new[var].attrs.keys()), \ + f"attribute keys for variable {var} differ in {sf}" + for attr_key in ds_orig[var].attrs: + orig_val = ds_orig[var].attrs[attr_key] + new_val = ds_new[var].attrs[attr_key] + if isinstance(orig_val, np.ndarray): + assert np.array_equal(orig_val, new_val), \ + f"attribute {attr_key} for variable {var} differs in {sf}" + else: + assert orig_val == new_val, \ + f"attribute {attr_key} for variable {var} differs in {sf}" + except AssertionError as exc: all_files_equal=False - print("comparison of " + nccmp_cmd[-1] + " and " + nccmp_cmd[-2] + " did not match") - print(sp.stdout, sp.stderr) + print(f"metadata comparison of {new_file} and {orig_file} did not match") + print(exc) + finally: + if ds_orig is not None: + ds_orig.close() + if ds_new is not None: + ds_new.close() assert all_files_equal and same_count_files #clean up splitting files diff --git a/fre/tests/test_files/CMOR_CMIP7_input_example.json b/fre/tests/test_files/CMOR_CMIP7_input_example.json index 244bd6713..eeaceb25f 100644 --- a/fre/tests/test_files/CMOR_CMIP7_input_example.json +++ b/fre/tests/test_files/CMOR_CMIP7_input_example.json @@ -53,9 +53,9 @@ "#_output": "***** Root directory for output (can be either a relative or full path) *****", "outpath": ".", "#_output_path_template": "***** Template for output path directory using tables keys or global attributes, these should follow the relevant data reference syntax *****", - "output_path_template": "", + "output_path_template": "", "#_output_file_template": "***** Template for output filename using tables keys or global attributes, these should follow the relevant data reference syntax *****", - "output_file_template": "[].nc", + "output_file_template": "", "#_INPUT_CONFIG_PATHS": "***** pathing/templates for input configuation files holding controlled vocabularies *****", "_controlled_vocabulary_file": "../tables-cvs/cmor-cvs.json", "_AXIS_ENTRY_FILE": "CMIP7_coordinate.json", diff --git a/fre/tests/test_files/CMOR_input_example.json b/fre/tests/test_files/CMOR_input_example.json index b7726c674..dc40b810d 100644 --- a/fre/tests/test_files/CMOR_input_example.json +++ b/fre/tests/test_files/CMOR_input_example.json @@ -20,7 +20,7 @@ "institution_id": "PCMDI", "source_id": "PCMDI-test-1-0", "calendar": "julian", - "grid": "regridded to FOO grid from native", + "grid": "FOO_BAR_PLACEHOLD", "grid_label": "gr", "nominal_resolution": "10000 km", "license": "CMIP6 model data produced by Lawrence Livermore PCMDI is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.", @@ -44,4 +44,4 @@ "#output_path_template": "Template for output path directory using tables keys or global attributes, these should follow the relevant data reference syntax", "output_path_template": "<_member_id>
", "output_file_template": "
<_member_id>" -} +} \ No newline at end of file diff --git a/fre/tests/test_files/cmip7-cmor-tables b/fre/tests/test_files/cmip7-cmor-tables index 1a31c4d48..da10ef064 160000 --- a/fre/tests/test_files/cmip7-cmor-tables +++ b/fre/tests/test_files/cmip7-cmor-tables @@ -1 +1 @@ -Subproject commit 1a31c4d48d4706e4f37dd39265736587b4ef9c89 +Subproject commit da10ef06409061a3405e6bfb5208fff741719738 diff --git a/fre/tests/test_fre_cmor_cli.py b/fre/tests/test_fre_cmor_cli.py index bf12c33ff..dfbfce0a7 100644 --- a/fre/tests/test_fre_cmor_cli.py +++ b/fre/tests/test_fre_cmor_cli.py @@ -317,9 +317,9 @@ def test_cli_fre_cmor_run_cmip7_case1(): cmor_creates_dir = \ f'CMIP/CanESM6-MR/esm-piControl/r3i1p1f3/sos/tavg-u-hxy-sea/{grid_label}' full_outputdir = \ - f"{outdir}/{cmor_creates_dir}/v{YYYYMMDD}" + f"{outdir}/{cmor_creates_dir}" full_outputfile = f"{full_outputdir}/" + \ - f"sos_tavg-u-hxy-sea_mon_glb_{grid_label}_CanESM6-MR_esm-piControl_variant_idtime_range_199301-199302.nc" + f"sos_tavg-u-hxy-sea_mon_glb_{grid_label}_CanESM6-MR_esm-piControl_r3i1p1f3_199301-199302.nc" # FYI/unneeded, this is mostly for reference filename = 'reduced_ocean_monthly_1x1deg.199301-199302.sos.nc' @@ -363,9 +363,9 @@ def test_cli_fre_cmor_run_cmip7_case2(): cmor_creates_dir = \ f'CMIP/CanESM6-MR/esm-piControl/r3i1p1f3/sos/tavg-u-hxy-sea/{grid_label}' full_outputdir = \ - f"{outdir}/{cmor_creates_dir}/v{YYYYMMDD}" + f"{outdir}/{cmor_creates_dir}" full_outputfile = f"{full_outputdir}/" + \ - f"sos_tavg-u-hxy-sea_mon_glb_{grid_label}_CanESM6-MR_esm-piControl_variant_idtime_range_199301-199302.nc" + f"sos_tavg-u-hxy-sea_mon_glb_{grid_label}_CanESM6-MR_esm-piControl_r3i1p1f3_199301-199302.nc" # FYI/unneeded, this is mostly for reference filename = 'reduced_ocean_monthly_1x1deg.199301-199302.sosV2.nc' @@ -604,3 +604,97 @@ def test_cli_fre_cmor_varlist_cmip7_table_filter(): assert 'sosV2' not in var_list, 'sosV2 should NOT be in the CMIP7-filtered list' output_varlist.unlink() + + +# fre cmor init +def test_cli_fre_cmor_init(): + ''' fre cmor init ''' + result = runner.invoke(fre.fre, args=["cmor", "init"]) + assert result.exit_code == 2 + +def test_cli_fre_cmor_init_help(): + ''' fre cmor init --help ''' + result = runner.invoke(fre.fre, args=["cmor", "init", "--help"]) + assert result.exit_code == 0 + +def test_cli_fre_cmor_init_opt_dne(): + ''' fre cmor init optionDNE ''' + result = runner.invoke(fre.fre, args=["cmor", "init", "optionDNE"]) + assert result.exit_code == 2 + + +def test_cli_fre_cmor_init_cmip6_exp_config(): + ''' + fre cmor init -- generate a CMIP6 experiment config template. + ''' + output_path = Path(f'{ROOTDIR}/test_cmip6_init_template.json') + if output_path.exists(): + output_path.unlink() + + result = runner.invoke(fre.fre, args=[ + "cmor", "init", + "--mip_era", "cmip6", + "--exp_config", str(output_path) + ]) + assert result.exit_code == 0, f'init failed: {result.output}' + assert output_path.exists(), 'output config was not created' + + with open(output_path, 'r', encoding='utf-8') as f: + config = json.load(f) + + assert config['mip_era'] == 'CMIP6' + assert config['_cmip6_option'] == 'CMIP6' + assert 'experiment_id' in config + assert 'output_path_template' in config + + output_path.unlink() + + +def test_cli_fre_cmor_init_cmip7_exp_config(): + ''' + fre cmor init -- generate a CMIP7 experiment config template. + ''' + output_path = Path(f'{ROOTDIR}/test_cmip7_init_template.json') + if output_path.exists(): + output_path.unlink() + + result = runner.invoke(fre.fre, args=[ + "cmor", "init", + "--mip_era", "cmip7", + "--exp_config", str(output_path) + ]) + assert result.exit_code == 0, f'init failed: {result.output}' + assert output_path.exists(), 'output config was not created' + + with open(output_path, 'r', encoding='utf-8') as f: + config = json.load(f) + + assert config['mip_era'] == 'CMIP7' + assert config['_cmip7_option'] == 1 + assert 'experiment_id' in config + assert 'output_path_template' in config + + output_path.unlink() + + +def test_cli_fre_cmor_init_default_name(): + ''' + fre cmor init -- when no --exp_config is given and no --tables_dir, + a default-named file should be created. + ''' + default_path = Path('CMOR_cmip6_template.json') + if default_path.exists(): + default_path.unlink() + + result = runner.invoke(fre.fre, args=[ + "cmor", "init", + "--mip_era", "cmip6" + ]) + assert result.exit_code == 0, f'init failed: {result.output}' + assert default_path.exists(), 'default output config was not created' + + with open(default_path, 'r', encoding='utf-8') as f: + config = json.load(f) + assert config['mip_era'] == 'CMIP6' + + default_path.unlink() diff --git a/meta.yaml b/meta.yaml index 56b556b42..0a837f087 100644 --- a/meta.yaml +++ b/meta.yaml @@ -30,20 +30,15 @@ requirements: - python>=3.11 - noaa-gfdl::analysis_scripts==0.0.1 - noaa-gfdl::catalogbuilder==2025.01.01 -# - noaa-gfdl::fre-nctools==2022.02.01 - - conda-forge::cdo>=2 - conda-forge::cftime - conda-forge::click>=8.2 - - conda-forge::cmor>=3.14 + - conda-forge::cmor>=3.14.1 - conda-forge::cylc-flow>=8.2 - conda-forge::cylc-rose - conda-forge::jinja2>=3 - conda-forge::jsonschema - conda-forge::metomi-rose - - conda-forge::nccmp -# - conda-forge::numpy==1.26.4 - conda-forge::numpy>=2 - - conda-forge::python-cdo - conda-forge::pyyaml - conda-forge::xarray>=2024.* - conda-forge::netcdf4>=1.7.* diff --git a/pylintrc b/pylintrc index ac870b9e3..f262c5469 100644 --- a/pylintrc +++ b/pylintrc @@ -163,7 +163,9 @@ class-const-naming-style=UPPER_CASE #class-const-rgx= # Naming style matching correct class names. -class-naming-style=PascalCase +# Allow PascalCase and mixedCase (e.g. xarrayTimeAverager) for class names +#class-naming-style=PascalCase +class-rgx=[a-zA-Z_][a-zA-Z0-9_]* # Regular expression matching correct class names. Overrides class-naming- # style. If left empty, class names will be checked with the set naming style. diff --git a/pyproject.toml b/pyproject.toml index 2664168dd..93190fd09 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -22,7 +22,6 @@ keywords = [ dependencies = [ 'analysis_scripts', 'catalogbuilder', - 'cdo', 'cftime', 'click', 'cmor', diff --git a/to_merge.sh b/to_merge.sh new file mode 100644 index 000000000..1dd8b14c8 --- /dev/null +++ b/to_merge.sh @@ -0,0 +1,10 @@ +#!/bin/bash + +#git merge origin/copilot/remove-replace-nccmp-from-tests +#git merge origin/copilot/remove-replace-nccmp-test-remap-pp-components +#git merge origin/copilot/remove-replace-nccmp-in-tests +#git merge origin/copilot/deprecate-cdo-python-cdo-use +#git merge origin/copilot/add-empty-user-experiment-config +#git merge origin/copilot/fix-cmor-error-cmip6-cmip7 +#git merge origin/copilot/fix-flexibility-calendar-input +git merge fork/frecmor-updates-for-cmip7