Skip to content

Fix ValueError in WACCM interpSigma when using NCAR/UCAR subset and concatenated datasets #11

@LeoLumos

Description

@LeoLumos

Description of the Problem
The default waccm.py download script handles global daily files perfectly, but global files are often too large, and older forecast dates are unavailable via that method. To get around this, users (like myself) often request spatial subsets and monthly concatenated files through the NCAR/UCAR subsetting tools (e.g., CESM Subset or WACCM Download).
However, using these custom subset files in aqmbc causes the following crash during boundary condition generation:
ValueError: could not broadcast input array from shape (124,35,686) into shape ()
Root Cause
Tools like NCO used by the NCAR servers to crop and merge the files alter the NetCDF dimensions.
Proposed Solution
I have modified the interpSigma function in aqmbc/models/waccm.py to make it robust against these altered dimensions.
Here is the updated function. I have tested it on a full month of subset data and it works perfectly:

def interpSigma(self, vglvls, vgtop=None, interptype='linear',
                    extrapolate=False, fill_value='extrapolate',
                    verbose=0):
        from PseudoNetCDF.coordutil import sigma2coeff
        import numpy as np
        import PseudoNetCDF as pnc

        psfc = self.variables['PS'][...][:, None, ...]
        slicer = [None] * psfc.ndim
        slicer[1] = slice(None)
        slicer = tuple(slicer)

        # 1. Safely handle P0 (compatible with 0-D scalars)
        if 'P0' in self.variables:
            p0 = float(np.atleast_1d(self.variables['P0'][...])[0])
        else:
            p0 = 100000.0

        # 2. Safely handle hybrid coefficients (strip erroneous dimensions)
        def get_1d(varname):
            val = self.variables[varname][...]
            return val[0] if getattr(val, 'ndim', 0) > 1 else val

        hyam = get_1d('hyam')[slicer]
        hybm = get_1d('hybm')[slicer]
        hyai = get_1d('hyai')[slicer]
        hybi = get_1d('hybi')[slicer]

        pmid = (psfc * hybm + p0 * hyam)
        pedges = (psfc * hybi + p0 * hyai)

        newshape = list(pmid.shape)
        newshape.insert(2, len(vglvls) - 1)
        itershape = [newshape[0]] + newshape[3:]

        with np.errstate(divide='ignore', invalid='ignore'):
            sigma = (pedges - vgtop) / (psfc - vgtop)
        sigma = np.nan_to_num(sigma, nan=0.0, posinf=0.0, neginf=0.0)

        tmpv = np.zeros(newshape, dtype='f')
        for idx in np.ndindex(*itershape):
            srcidx = (idx[0], slice(None)) + tuple(idx[1:])
            destidx = (idx[0], slice(None), slice(None)) + tuple(idx[1:])
            fromvglvls = sigma[srcidx]
            tmpv[destidx] = sigma2coeff(
                fromvglvls[::-1], vglvls
            )[::-1]

        pweight = tmpv[...] * pmid[:, :, None, ...]
        pnorm = pweight.sum(1)
        pnorm = np.where(pnorm == 0, 1e-10, pnorm)

        # 3. Filter variables compatible with both ICON and BCON
        exprkeys = [
            key for key, var in self.variables.items()
            if len(var.dimensions) >= 2 and var.dimensions[:2] == ('time', 'lev')
        ]

        # 4. Manually construct the output file to avoid broadcast errors from from_ncvs
        outf = pnc.PseudoNetCDFFile()

        for pk in self.ncattrs():
            outf.setncattr(pk, self.getncattr(pk))

        for key, dim in self.dimensions.items():
            if key == 'lev':
                outf.createDimension('lev', len(vglvls) - 1)
            elif key == 'ilev':
                outf.createDimension('ilev', len(vglvls))
            else:
                outf.copyDimension(dim, key=key)

        for key in exprkeys:
            var_data = (self.variables[key][...][:, :, None, ...] * pweight).sum(1) / pnorm
            orig_dims = self.variables[key].dimensions
            
            # Explicitly declare variable type and dimensions
            outvar = outf.createVariable(key, 'f', orig_dims)
            outvar[...] = var_data
            
            ov = self.variables[key]
            outvar.setncatts({k: ov.getncattr(k) for k in ov.ncattrs()})

        for key, var in self.variables.items():
            if (
                key not in exprkeys
                and key not in self.dimensions
                and key not in ('hyam', 'hybm', 'hyai', 'hybi')
            ):
                # Filter out deprecated vertical layer variables
                if 'lev' in var.dimensions or 'ilev' in var.dimensions:
                    continue
                outf.copyVariable(var, key=key)

        outf.VGLVLS = vglvls
        outf.VGTOP = vgtop
        return outf

Hopefully, this helps anyone else trying to process regional NCAR subset data!
While this modification successfully allows the script to output the ICON and BCON files without crashing, I am not yet 100% certain if the resulting files are scientifically valid for the model. It likely requires further validation.
I also noticed an interesting discrepancy: the single-day BCON files generated by the original script are significantly larger in file size compared to the BCON files generated by my modified script. However, when inspecting them with ncdump and ncview, the variables and dimensions appear to be the same without any obvious differences.
Could you please take a look at my modified interpSigma function when you have a moment to see if my approach is completely correct and safe to use? I would really appreciate your expertise on why the file size might have shrunk!
Thank you for your time and for maintaining this great tool!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions