diff --git a/.github/workflows/pace_tests.yaml b/.github/workflows/pace_tests.yaml
index 2a3fd66f..881bfabf 100644
--- a/.github/workflows/pace_tests.yaml
+++ b/.github/workflows/pace_tests.yaml
@@ -10,7 +10,7 @@ on:
 
 jobs:
   pace_main_tests:
-    uses: NOAA-GFDL/pace/.github/workflows/main_unit_tests.yaml@develop
+    uses: FlorianDeconinck/pace/.github/workflows/main_unit_tests.yaml@feature/data_dimensions_field
     with:
       component_trigger: true
       component_name: pyFV3
diff --git a/.github/workflows/pyshield_tests.yaml b/.github/workflows/pyshield_tests.yaml
index 2ea3e44a..67d410bc 100644
--- a/.github/workflows/pyshield_tests.yaml
+++ b/.github/workflows/pyshield_tests.yaml
@@ -10,7 +10,7 @@ on:
 
 jobs:
   pyshield_translate_tests:
-    uses: NOAA-GFDL/PySHiELD/.github/workflows/translate.yaml@develop
+    uses: FlorianDeconinck/PySHiELD/.github/workflows/translate.yaml@update/ndsl_without_tracers_list
     with:
       component_trigger: true
       component_name: pyFV3
diff --git a/.github/workflows/translate.yaml b/.github/workflows/translate.yaml
index be1a9f67..58d1fa9d 100644
--- a/.github/workflows/translate.yaml
+++ b/.github/workflows/translate.yaml
@@ -49,8 +49,9 @@ jobs:
         uses: actions/checkout@v6
         with:
           submodules: 'recursive'
-          repository: NOAA-GFDL/pyFV3
+          repository: FlorianDeconinck/pyFV3
           path: pyFV3
+          ref: feature/tracers_to_ddims_field
 
       - name: Checkout hash that triggered CI
         uses: actions/checkout@v6
diff --git a/ORIG.README.md b/ORIG.README.md
new file mode 100644
index 00000000..a6192aac
--- /dev/null
+++ b/ORIG.README.md
@@ -0,0 +1,459 @@
+> DISCLAIMER: Work in progress
+
+# FV3core
+
+FV3core is a Python version, using GridTools GT4Py with CPU and GPU backend options, of the FV3 dynamical core (fv3gfs-fortran repo).
+The code here includes regression test data of computation units coming from serialized output from the Fortran model generated using the `GridTools/serialbox` framework.
+
+As of January 10, 2021 this documentation is outdated in that it was written when we had fv3core as its own single repository. Some functionality, such as linting, has been moved to the top level but may still be described in this document as occurring inside the fv3core folder.
+
+**WARNING** This repo is under active development and relies on code and data that is not publicly available at this point.
+
+## QuickStart
+
+1. Ensure you have Docker installed and available for building and running, and that it has access to the VCM cloud
+
+Be sure to complete any required post-installation instructions (e.g. [for linux](https://docs.docker.com/engine/install/linux-postinstall/)). Also [authorize Docker to pull from gcr](https://cloud.google.com/container-registry/docs/advanced-authentication). Your user will need to have read access to the `us.gcr.io/vcm-ml` repository.
+
+2.  You can build the image, download the data, and run the tests using:
+
+```shell
+$ make tests savepoint_tests savepoint_tests_mpi
+```
+
+If you want to develop code, you should also install the linting requirements and git hooks locally
+
+```shell
+$ pip install -c constraints.txt -r requirements/requirements_lint.txt
+$ pre-commit install
+```
+## Getting started, in more detail
+If you want to build the main fv3core docker image, run
+
+```shell
+$ make build
+```
+
+If you want to download test data run
+
+```shell
+$ make get_test_data
+```
+
+And the c12_6ranks_standard data will download into the `test_data` directory.
+
+If you do not have a GCP account, you can download basic test data from a public FTP server and skip the GCP authentication step above. To do so, use `make USE_FTP=yes get_test_data` instead, which avoids fetching from a GCP storage bucket. You will need a valid installation of the `lftp` command.
+
+MPI parallel tests (that run that way to exercise halo updates in the model) can also be run with:
+
+```shell
+$ make savepoint_tests_mpi
+```
+
+The environment image that the fv3core container uses is prebuilt and lives in the GCR. The above commands will by default pull this image before building the fv3core image and running the tests.
+To build the environment from scratch (including GT4py) before running tests, either run
+
+```shell
+$ make build_environment
+```
+
+or
+
+```shell
+$ PULL=False make savepoint_tests
+```
+
+which will execute the target `build_environment` for you before running the tests.
+
+There are `push_environment` and `rebuild_environment` targets, but these should normally not be done manually. Updating the install image should only be done by Jenkins after the tests pass using a new environment.
+
+### Test data options
+
+If you want to run different test data, discover the possible options with
+```shell
+$ make list_test_data_options
+```
+This will list the storage buckets in the cloud. Then to run one of them, set EXPERIMENT to the folder name of the data you'd like to use:
+
+e.g.
+```shell
+$ EXPERIMENT=c48_6ranks_standard make tests
+```
+
+If you choose an experiment with a different number of ranks than 6, also set `NUM_RANKS=<num ranks>`
+
+## Testing interactively outside the container
+
+After `make savepoint_tests` has been run at least once (or you have data in test_data and the docker image fv3core exists because `make build` has been run), you can iterate on code changes using
+
+```shell
+$ DEV=y make savepoint_tests
+```
+or for the parallel or non-savepoint tests:
+
+```shell
+$ DEV=y make tests savepoint_tests_mpi
+```
+These will mount your current code into the fv3core container and run it rather than the code that was built when `make build` ran.
+
+## Running tests inside a container
+
+If you prefer to work interactively inside the fv3core container, get the test data and build the docker image (see above if you do not have a GCP account and want to get test data):
+```shell
+$ make get_test_data
+```
+
+```shell
+$ make build
+```
+Testing can be run with this data from `/port_dev` inside the container:
+
+```shell
+$ make dev
+```
+
+Then in the container:
+
+```shell
+$ pytest -v -s --data_path=/test_data/ /port_dev/tests --which_modules=<stencil name>
+```
+The 'stencil name' can be determined from the associated Translate class. e.g. TranslateXPPM is a test class that translates data serialized from a run of the Fortran model, and 'XPPM' is the name you can use with --which_modules.
+
+
+
+
+### Test options
+
+All of the make endpoints involved in running tests can be prefixed with the `TEST_ARGS` environment variable to set test options or pytest CLI args (see below) when running inside the container.
+
+* `--which_modules <modules to run tests for>` - comma separated list of which modules to test (defaults to running all of them).
+
+* `--print_failures` - if your test fails, it will only report the first datapoint. If you want all the nonmatching regression data to print out (so you can see if there are patterns, e.g. just incorrect for the first 'i' or whatever), this will print out for every failing test all the non-matching data.
+
+* `--failure_stride` - when printing failures, print every n failures only.
+
+* `--data_path` - path to where you have the `Generator*.dat` and `*.json` serialization regression data. Defaults to current directory.
+
+* `--backend` - which backend to use for the computation. Options: `[numpy, gt:cpu_ifirst, gt:cpu_first, gt:gpu, cuda]`. Defaults to `numpy`.
+* `--python_regression` - Run the tests that have Python based regression data. Only applies to running parallel tests (savepoint_tests_mpi)
+Pytest provides a lot of options, which you can see by `pytest --help`. Here are some
+common options for our tests, which you can add to `TEST_ARGS`:
+
+* `-r` - is used to report test types other than failure. It can be provided `s` for skipped (e.g. tests which were not run because earlier tests of the same stencil failed), `x` for xfail or "expected to fail" tests (like tests with no translate class), or `p` for pass. For example, to report skipped and xfail tests you would use `-rsx`.
+
+* `--disable-warnings` - will stop all warnings from being printed at the end of the tests, for example warnings that translate classes are not yet implemented.
+
+* `-v` - will increase test verbosity, while `-q` will decrease it.
+
+* `-s` - will let stdout print directly to console instead of capturing the output and printing it when a test fails only. Note that logger lines will always be printed both during (by setting log_cli in our pytest.ini file) and after tests.
+
+* `-m` - will let you run only certain groups of tests. For example, `-m=parallel` will run only parallel stencils, while `-m=sequential` will run only stencils that operate on one rank at a time.
+
+* `--threshold_overrides_file` - will read a yaml file with error thresholds specified for specific backend and platform (docker or metal) configurations, overriding the max_error thresholds defined in the Translate classes. Format of the yaml file is described [here](tests/savepoint/translate/overrides/README.md).
+
+* `--dperiodic` - run tests on a doubly-periodic domain. Will look for only one tile's worth of test data and parallel tests will be run with a TileCommunicator instead of a CubedSphereCommunicator.
+
+**NOTE:** FV3 is currently assumed to be by default in a "development mode", where stencils are checked each time they execute for code changes (which can trigger regeneration). This process is somewhat expensive, so there is an option to put FV3 in a performance mode by telling it that stencils should not automatically be rebuilt:
+
+```shell
+$ export FV3_STENCIL_REBUILD_FLAG=False
+```
+
+## Porting a new stencil
+
+1. Find the location in the fv3gfs-fortran repo code where the save-point is to be added, e.g. using
+
+```shell
+$ git grep <stencil_name> <checkout of fv3gfs-fortran>
+```
+
+2. Create a `translate` class from the serialized save-point data to a call to the stencil or function that calls the relevant stencil(s).
+
+These are usually named `tests/savepoint/translate/translate_<lowercase name>`
+
+Import this class in the `tests/savepoint/translate/__init__.py` file
+
+3. Write a Python function wrapper that the translate function (created above) calls.
+
+By convention, we name these `fv3core/stencils/<lower case stencil name>.py`
+
+4. Run the test, either with one name or a comma-separated list
+
+```shell
+$ make dev_tests TEST_ARGS="--which_modules=<stencil name(s)>"
+```
+
+**Please also review the [Porting conventions](#porting-conventions) section for additional explanation**
+## Installation
+
+### Docker Image
+
+To build the `us.gcr.io/vcm-ml/fv3core` image with required dependencies for running the Python code, run
+
+```shell
+$ make build
+```
+
+Add `PULL=False` to build from scratch without running `docker pull`:
+
+```shell
+$ PULL=False make build
+```
+
+## Relevant repositories
+
+- https://github.com/GridTools/serialbox -
+  Serialbox generates serialized data when the Fortran model runs and has bindings to manage data from Python
+
+- https://github.com/VulcanClimateModeling/fv3gfs-fortran -
+  This is the existing Fortran model decorated with serialization statements from which the test data is generated
+
+- https://github.com/GridTools/gt4py -
+  Python package for the DSL language
+
+- https://github.com/VulcanClimateModeling/util
+  Python specific model functionality, such as halo updates.
+
+- https://github.com/VulcanClimateModeling/fv3gfs-wrapper
+  A Python based wrapper for running the Fortran version of the FV3GFS model.
+
+Some of these are submodules.
+While tests can work without these, it may be necessary for development to have these as well.
+To add these to the local repository, run
+
+```shell
+$ git submodule update --init
+```
+
+The submodules include:
+
+- `external/util` - git@github.com:VulcanClimateModeling/util.git
+- `external/daint_venv` -  git@github.com:VulcanClimateModeling/daint_venv.git
+
+## Dockerfiles and building
+
+There are two main docker files:
+
+1. `docker/dependencies.Dockerfile` - defines dependency images such as for mpi, serialbox, and GT4py
+
+2. `docker/Dockerfile` - uses the dependencies to define the final fv3core images.
+
+The dependencies are separated out into their own images to expedite rebuilding the docker image without having to rebuild dependencies, especially on CI.
+
+For the commands below using `make -C docker`, you can alternatively run `make` from within the `docker` directory.
+
+These dependencies can be updated, pushed, and pulled with `make -C docker build_deps`, `make -C docker push_deps`, and `make -C docker pull_deps`. The tag of the dependencies is based on the tag of the current build in the Makefile, which we will expand on below.
+
+Building from scratch requires both a deps and build command, such as `make -C docker pull_deps fv3core_image`.
+
+If any example fails for "pulled dependencies", it means the dependencies have never been built. You can
+build them and push them to GCR with:
+
+```shell
+$ make -C docker build_deps push_deps
+```
+
+### Building examples
+
+fv3core image with pulled dependencies:
+
+```shell
+$ make -C docker pull_deps fv3core_image
+```
+
+CUDA-enabled fv3core image with pulled dependencies:
+```shell
+$ CUDA=y make -C docker pull_deps fv3core_image
+```
+
+fv3core image with locally-built dependencies:
+```shell
+$ make -C docker build_deps fv3core_image
+```
+
+### Updating Serialbox
+
+If you need to install an updated version of Serialbox, you must first install cmake into the development environment. To install an updated version of Serialbox from within the container run
+
+```shell
+$ wget https://github.com/Kitware/CMake/releases/download/v3.17.3/cmake-3.17.3.tar.gz && \
+  tar xzf cmake-3.17.3.tar.gz && \
+  cd cmake-3.17.3 && \
+  ./bootstrap && make -j4 && make install
+$ git clone -b v2.6.1 --depth 1 https://github.com/GridTools/serialbox.git /tmp/serialbox
+$ cd /tmp/serialbox
+$ cmake -B build -S /tmp/serialbox -DSERIALBOX_USE_NETCDF=ON -DSERIALBOX_TESTING=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/serialbox
+$ cmake --build build/ -j $(nproc) --target install
+$ cd -
+$ rm -rf build /tmp/serialbox
+```
+
+## Pinned dependencies
+
+Dependencies are pinned using `constraints.txt`. This is auto-generated by pip-compile from the `pip-tools` package, which reads `requirements.txt` and `requirements/requirements_lint.txt`, determines the latest versions of all dependencies (including recursive dependencies) compatible with those files, and writes pinned versions for all dependencies. This can be updated using:
+
+```shell
+$ make constraints.txt
+```
+
+This file is committed to the repository, and gives more reproducible tests if an old commit of the repository is checked out in the future. The constraints are followed when creating the `fv3core` docker images. To ensure consistency this should ideally be run from inside a docker development environment, but you can also run it on your local system with an appropriate Python 3 environment.
+
+## Development
+
+To develop fv3core, you need to install the linting requirements in `requirements/requirements_lint.txt`. To install the pinned versions, use:
+
+```shell
+$ pip install -c constraints.txt -r requirements/requirements_lint.txt
+```
+
+This adds `pre-commit`, which we use to lint and enforce style on the code. The first time you install `pre-commit`, install its git hooks using:
+
+```shell
+$ pre-commit install
+pre-commit installed at .git/hooks/pre-commit
+```
+
+As a convenience, the `lint` target of the top-level makefile executes `pre-commit run --all-files`.
+Linting, which formats files and checks for some style conventions, is required, as the same checks are the first step in the continuous integration testing that happens when creating a pull request.
+Linting locally saves time and literal energy, since CI tests do not have to be launched so many times!
+
+ Please see the 'Development Guidelines' below for more information on the structure of the code to align your new code with the current conventions, as well as the CONTRIBUTING.md document for style guidelines.
+
+## GT4Py version
+
+FV3Core does not actually use the [GridTools/gt4py](https://github.com/gridtools/gt4py) main branch; it instead uses a Vulcan Climate Modeling development branch.
+A publicly available version is at [VCM/gt4py](https://github.com/vulcanclimatemodeling/gt4py).
+
+Situation: There is a new stable feature in a gt4py PR, but it is not yet merged into the GridTools/gt4py main branch.
+[branches.cfg](https://github.com/VulcanClimateModeling/gt4py/blob/develop/branches.cfg) lists these features.
+Steps:
+
+1. Add any new branches to `branches.cfg`
+2. Rebuild the develop branch, either:
+  a. `make_develop gt4py-dev path/to/branches.cfg` (you may have to resolve conflicts...)
+  b. Adding new commits on top of the existing develop branch (e.g. merge or cherry-pick)
+3. Force push to the develop branch: `git push -f upstream develop`
+
+The last step will launch Jenkins tests. If these pass:
+
+1. Create a git tag: `git tag v-$(git rev-parse --short HEAD)`
+2. Push the tag: `git push upstream --tags`
+3. Make a PR to [VCM/fv3core](https://github.com/vulcanclimatemodeling/fv3core) that updates the version in `docker/Makefile` to the new tag.
+
+## License
+FV3Core is provided under the terms of the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html) license.
+
+# Development guidelines
+
+## File structure / conventions
+The main functionality of the FV3 dynamical core, which has been ported from the Fortran version in the fv3gfs-fortran repo, is defined using GT4py stencils and python 'compute' functions in fv3core/stencils. The core is comprised of units of calculations defined for regression testing. These were initially generally separated into distinct files in fv3core/stencils with corresponding files in tests/savepoint/translate/translate_<unit>.py defining the translation of variables from Fortran to Python. Exceptions exist in cases where topical and logical grouping allowed for code reuse. As refactors optimize the model, these units may be merged to occupy the same files and even methods/stencils, but the units should still be tested separately, unless determined to be redundant.
+
+The core has most of its calculations happening in GT4py stencils, but there are still several instances of operations happening in Python directly, which will need to be replaced with GT4py code for optimal performance.
+
+The namelist and grid are global variables defined in fv3core/_config.py. The namelist is 'flattened' so that the grouping name of the option is not required to access the data (we may want to change this).
+
+The grid variables are mostly 2d variables and are 'global' to the model thread per mpi rank. The grid object also contains domain and layout information relevant to the current rank being operated on.
+
+Utility functions in `fv3core/utils/` include:
+  - `gt4py_utils.py`:
+    - default gt4py and model settings
+    - methods for generating gt4py storages
+    - methods for using numpy and cupy arrays in python functions that have not been put into GT4py
+    - methods for handling complex patterns that did not immediately map to gt4py, and will mostly be removed with future refactors (e.g. k_split_run)
+    - some general model math computations (e.g. great_circle_dist), that will eventually be put into gt4py with a future refactor
+  - `grid.py`:
+    - A Grid class definition that provides information about the grid layout, current tile information, access to grid variables used globally, and convenience methods related to tile indexing, origins and domains commonly used
+    - A grid is defined for each MPI rank (minimum 6 ranks, 1 for each tile face of the cubed sphere grid representing the whole Earth)
+    - Also provides functionality for generating a Quantity object used for halo updates and other utilities
+  - `corners`: port of corner calculations, initially direct Python calculations, being replaced with GT4py gtscript functions as the GT4py regions feature is implemented
+  - `mpi.py`: a wrapper for importing mpi4py when available
+  - `global_constants.py`: constants for use throughout the model
+  - `typing.py`: Clean names for common types we use in the model. This is new and
+    hasn't been adopted throughout the model yet, but will eventually be our
+    standard. A shorthand 'sd' has been used in the initial version.
+
+The `tests/` directory currently includes a framework for translating fields serialized (using
+Serialbox from GridTools) from a Fortran run into gt4py storages that can be inputs to
+fv3core unit computations, and compares the results of the ported code to serialized
+data following a unit computation.
+
+The `docker/` directory provides Dockerfiles for building a repeatable environment in which
+to run the core
+
+The `external/` directory is for submoduled repos that provide essential functionality
+
+The build system uses Makefiles following the convention of other repos within VulcanClimateModeling.
+
+## Model Interface
+
+The top level functions fv_dynamics and fv_subgridz can currently only be run in parallel using mpi with a minimum of 6 ranks (there are a few other units that also require this, e.g. whenever there is a halo update involved in a unit). These are the interface to the rest of the model and currently have different conventions than the rest of the model.
+ - A 'state' object (currently a SimpleNamespace) stores pointers to the allocated data fields
+ - Most functions within dyn_core can be run sequentially per rank
+ - Currently a list of ArgSpecs must decorate an interface function, where each ArgSpec provides useful information about the argument, e.g.: `@state_inputs(ArgSpec("qvapor", "specific_humidity", "kg/kg", intent="inout"))`
+   - The format is (fortran_name, long_name, units, intent)
+   - We currently provide a duplicate of most of the metadata in the specification of the unit test, but that may be removed eventually.
+ - Then the function itself, e.g. fv_dynamics, has arguments of 'state', 'comm' (the communicator) and all of the scalar parameters being provided.
+
+### Porting conventions
+
+Generation of regression data occurs in the fv3gfs-fortran repo (https://github.com/VulcanClimateModeling/fv3gfs-fortran) with serialization statements and a build procedure defined in `tests/serialized_test_data_generation`. The version of data this repo currently tests against is defined in `FORTRAN_SERIALIZED_DATA_VERSION` in this repo's `docker/Makefile.image_names`. Fields serialized are defined in Fortran code with serialization comment statements such as:
+
+```
+    !$ser savepoint C_SW-In
+    !$ser data delpcd=delpc delpd=delp ptcd=ptc
+```
+
+where the name being assigned is the name the fv3core uses to identify the variable in the test code. When this name is not equal to the name of the variable, this was usually done to avoid conflicts with other parts of the code where the same name is used to reference a differently sized field.
+
+The majority of the logic for translating from data serialized from Fortran to something that can be used by Python, and the comparison of the results, is encompassed by the main Translate class in the tests/savepoint/translate/translate.py file. Any units not involving a halo update can be run using this framework, while those that need to be run in parallel can look to the ParallelTranslate class as the parent class in tests/savepoint/translate/parallel_translate.py. These parent classes provide generally useful operations for translating serialized data between Fortran and Python specifications, and for applying regression tests.
+
+A new unit test can be defined as a new child class of one of these, with a naming convention of `Translate<Savepoint Name>` where `Savepoint Name` is the name used in the serialization statements in the Fortran code, without the `-In` and `-Out` part of the name. A translate class can usually specify only the input and output fields. Then, in cases where the parent compute function is insufficient to handle the complexity of either the data translation or the compute function, the appropriate methods can be overridden.
+
+For Translate objects
+  - The init function establishes the assumed translation setup for the class, which can be dynamically overridden as needed.
+  - the parent compute function does:
+    - Makes gt4py storages of the max shape (grid.npx+1, grid.npy+1, grid.npz+1) aligning the data based on the start indices specified. (gt4py requires data fields have the same shape, so in this model we have buffer points so all calculations can be done easily without worrying about shape matching).
+    - runs the compute function (defined in self.compute_func) on the input data storages
+    - slices the computed Python fields to be compared to fortran regression data
+  - The unit test then uses a modified relative error metric to determine whether the unit passes
+  - The init method for a Translate class:
+    - The input (self.in_vars["data_vars"]) and output (self.out_vars) variables are specified in dictionaries, where the keys are the name of the variable used in the model and the values are dictionaries specifying metadata for translation of serialized data to gt4py storages. The metadata that can be specified to override defaults are:
+    - Indices to line up data arrays into gt4py storages (which all get created as the max possible size needed by all operations, for simplicity): "istart", "iend", "jstart", "jend", "kstart", "kend". These should be set using the 'grid' object available to the Translate object, using equivalent index names as in the declaration of variables in the Fortran code, e.g. real:: cx(bd%is:bd%ie+1,bd%jsd:bd%jed ) means we should assign:
+
+```python
+      self.in_vars["data_vars"]["cx"] = {"istart": self.is_, "iend": self.ie + 1,
+                                         "jstart": self.jsd, "jend": self.jed,}
+```
+  - There is only a limited set of Fortran shapes declared, so abstractions defined in the grid can also be used,
+    e.g.: `self.out_vars["cx"] = self.grid.x3d_compute_domain_y_dict()`. Note that the variables, e.g. `grid.is_` and `grid.ie` specify the 'compute' domain in the x direction of the current tile, equivalent to `bd%is` and `bd%ie` in the Fortran model EXCEPT that the Python variables are local to the current MPI rank (a subset of the tile face), while the Fortran values are global to the tile face. This is because these indices are used to slice into fields, which in Python is 0-based, and in Fortran is based on however the variables are declared. But, for the purposes of aligning data for computations and comparisons, we can match them in this framework. Shapes need to be defined in a dictionary per variable including `"istart"`, `"iend"`, `"jstart"`, `"jend"`, `"kstart"`, `"kend"` that represent the shape of that variable as defined in the Fortran code. The default shape assumed if a variable is specified with an empty dictionary is `isd:ied, jsd:jed, 0:npz - 1` inclusive, and variables that aren't that shape in the Fortran code need to have the 'start' indices specified for the in_vars dictionary, and 'start' and 'end' for the out_vars.
+    - `"serialname"` can be used to specify a name used in the Fortran code declaration if we'd like the model to use a different name
+    - `"kaxis"`: which dimension is the vertical direction. For most variables this is '2' and does not need to be specified. For Fortran variables that assign the vertical dimension to a different axis, this can be set to ensure we end up with 3d storages that have the vertical dimension where it is expected by GT4py.
+    - `"dummy_axes"`: If set, this will set the storage to have singleton dimensions in the axes defined. This is to enable testing stencils where the full 3d data has not been collected and we want to run stencil tests on the data for a particular slice.
+    - `"names_4d"`: If a 4d variable is being serialized, this can be set to specify the names of each 3d field. By default this is the list of tracers.
+    - input variables that are scalars should be added to `self.in_vars["parameters"]`
+    - `self.compute_func` is the name of the model function that should be run by the compute method in the translate class
+    - `self.max_error` overrides the parent class's relative error threshold. This should only be changed when the reasons for non-bit reproducibility are understood.
+    - `self.max_shape` sets the size of the gt4py storage created for testing
+    - `self.ignore_near_zero_errors[<varname>] = True`: This is an option to let some fields pass with higher relative error if the absolute error is very small
+    - `self.skip_test`: This is an option to jump over the test case, to be used in the override file for temporary deactivation of tests.
+
+For `ParallelTranslate` objects:
+  - Inputs and outputs are defined at the class level, and these include metadata such as the "name" (e.g. understandable name for the symbol), dimensions, units and n_halo (number of halo lines)
+  - Both `compute_sequential` and `compute_parallel` methods may be defined, where a mock communicator is used in the `compute_sequential` case
+  - The parent assumes a state object for tracking fields and methods exist for translating from inputs to a state object and extracting the output variables from the state. It is assumed that Quantity objects are needed in the model method in order to do halo updates.
+  - `ParallelTranslate2Py` is a slight variation of this used for many of the parallel units that do not yet utilize a state object and relies on the specification of the same index metadata of the Translate classes
+  - `ParallelTranslateBaseSlicing` makes use of the state but relies on the Translate object of self._base, a Translate class object, to align the data before making quantities, computing and comparing.
+
+### Debugging Tests
+
+Pytest can be configured to give you a pdb session when a test fails. To route this properly through docker, you can run:
+
+```bash
+TEST_ARGS="-v -s --pdb" RUN_FLAGS="--rm -it" make tests
+```
+
+This can be done with any pytest target, such as `make savepoint_tests` and `make savepoint_tests_mpi`.
+
+### GEOS API
+
+The `GeosDycoreWrapper` class provides an API to run the dynamical core in a Python component of a GEOS model run. A `GeosDycoreWrapper` object is initialized with a namelist, communicator, and backend, which creates the communicators, partitioners, dycore state, and dycore object required to run the Pace dycore. A wrapper object takes numpy arrays of `u, v, w, delz, pt, delp, q, ps, pe, pk, peln, pkz, phis, q_con, omga, ua, va, uc, vc, mfxd, mfyd, cxd, cyd,` and `diss_estd` and returns a dictionary containing numpy arrays of those same variables. Wrapper objects contain a `timer` attribute that tracks the amount of time moving input data to the dycore state, running the dynamical core, and retrieving the data from the state.
diff --git a/README.md b/README.md
index 6323fd5d..a961837b 100644
--- a/README.md
+++ b/README.md
@@ -1,458 +1,51 @@
-> DISCLAIMER: Work in progress
-
-# FV3core
-
-FV3core is a Python version, using NDSL with CPU and GPU backend options, of the FV3 dynamical core (fv3gfs-fortran repo).
-The code here includes regression test data of computation units coming from serialized output from the Fortran model generated using the `GridTools/serialbox` framework.
-
-As of January 10, 2021 this documentation is outdated in that it was written when we had fv3core as its own single repository. Some functionality, such as linting, has been moved to the top level but may still be described in this document as occuring inside the fv3core folder.
-
-**WARNING** This repo is under active development and relies on code and data that is not publicly available at this point.
-
-## QuickStart
-
-1. Ensure you have docker installed and available for building and running and has access to the VCM cloud
-
-Be sure to complete any required post-installation instructions (e.g. [for linux](https://docs.docker.com/engine/install/linux-postinstall/)). Also [authorize Docker to pull from gcr](https://cloud.google.com/container-registry/docs/advanced-authentication). Your user will need to have read access to the `us.gcr.io/vcm-ml` repository.
-
-2.  You can build the image, download the data, and run the tests using:
-
-```shell
-$ make tests savepoint_tests savepoint_tests_mpi
-```
-
-If you want to develop code, you should also install the linting requirements and git hooks locally
-
-```shell
-$ pip install -c constraints.txt -r requirements/requirements_lint.txt
-$ pre-commit install
-
-## Getting started, in more detail
-If you want to build the main fv3core docker image, run
-
-$ make build
-```
-
-If you want to download test data run
-
-```shell
-$ make get_test_data
-```
-
-And the c12_6ranks_standard data will download into the `test_data` directory.
-
-If you do not have a GCP account, there is an option to download basic test data from a public FTP server and you can skip the GCP authentication step above. To download test data from the FTP server, use `make USE_FTP=yes get_test_data` instead and this will avoid fetching from a GCP storage bucket. You will need a valid in stallation of the `lftp` command.
-
-MPI parallel tests (that run that way to exercise halo updates in the model) can also be run with:
-
-```shell
-$ make savepoint_tests_mpi
-```
-
-The environment image that the fv3core container uses is prebuilt and lives in the GCR. The above commands will by default pull this image before building the fv3core image and running the tests.
-To build the environment from scratch (including GT4py) before running tests, either run
-
-```
-make build_environment
-```
-
-or
-
-```shell
-$ PULL=False make savepoint_tests
-```
-
-which will execute the target `build_environment` for you before running the tests.
-
-There are `push_environment` and `rebuild_environment` targets, but these should normally not be done manually. Updating the install image should only be done by Jenkins after the tests pass using a new environment.
-
-### Test data options
-
-If you want to run different test data, discover the possible options with
-```shell
-$ make list_test_data_options
-```
-This will list the storage buckets in the cloud. Then to run one of them, set EXPERIMENT to the folder name of the data you'd like to use:
-
-e.g.
-```shell
-$EXPERIMENT=c48_6ranks_standard make tests
-```
-
-If you choose an experiment with a different number of ranks than 6, also set `NUM_RANKS=<num ranks>`
-
-## Testing interactively outside the container
-
-After `make savepoint_tests` has been run at least once (or you have data in test_data and the docker image fv3core exists because `make build` has been run), you can iterate on code changes using
-
-```shell
-$ DEV=y make savepoint_tests
-```
-or for the parallel or non-savepoint tests:
-
-```shell
-$ DEV=y make tests savepoint_tests_mpi
-```
-These will mount your current code into the fv3core container and run it rather than the code that was built when `make build` ran.
-
-## Running tests inside a container
-
-If you to prefer to work interactively inside the fv3core container, get the test data and build the docker image (see above if you do not have a GCP account and want to get test data):
-```shell
-$ make get_test_data
-```
-
-```shell
-$ make build
-```
-Testing can be run with this data from `/port_dev` inside the container:
-
-```shell
-$ make dev
-```
-
-Then in the container:
-
-```shell
-$ pytest -v -s --data_path=/test_data/ /port_dev/tests --which_modules=<stencil name>
-```
-The 'stencil name' can be determined from the associated Translate class. e.g. TranslateXPPM is a test class that translate data serialized from a run of the fortran model, and 'XPPM' is the name you can use with --which_modules.
-
-
-
-
-### Test options
-
-All of the make endpoints involved running tests can be prefixed with the `TEST_ARGS` environment variable to set test options or pytest CLI args (see below) when running inside the container.
-
-* `--which_modules <modules to run tests for>` - comma separated list of which modules to test (defaults to running all of them).
-
-* `--print_failures` - if your test fails, it will only report the first datapoint. If you want all the nonmatching regression data to print out (so you can see if there are patterns, e.g. just incorrect for the first 'i' or whatever'), this will print out for every failing test all the non-matching data.
-
-* `--failure_stride` - when printing failures, print every n failures only.
-
-* `--data_path` - path to where you have the `Generator*.dat` and `*.json` serialization regression data. Defaults to current directory.
-
-* `--backend` - which backend to use for the computation. Defaults to `st:numpy:cpu:IJK`.
-* `--python_regression` - Run the tests that have Python based regression data. Only applies to running parallel tests (savepoint_tests_mpi)
-Pytest provides a lot of options, which you can see by `pytest --help`. Here are some
-common options for our tests, which you can add to `TEST_ARGS`:
-
-* `-r` - is used to report test types other than failure. It can be provided `s` for skipped (e.g. tests which were not run because earlier tests of the same stencil failed), `x` for xfail or "expected to fail" tests (like tests with no translate class), or `p` for pass. For example, to report skipped and xfail tests you would use `-rsx`.
-
-* `--disable-warnings` - will stop all warnings from being printed at the end of the tests, for example warnings that translate classes are not yet implemented.
-
-* `-v` - will increase test verbosity, while `-q` will decrease it.
-
-* `-s` - will let stdout print directly to console instead of capturing the output and printing it when a test fails only. Note that logger lines will always be printed both during (by setting log_cli in our pytest.ini file) and after tests.
-
-* `-m` - will let you run only certain groups of tests. For example, `-m=parallel` will run only parallel stencils, while `-m=sequential` will run only stencils that operate on one rank at a time.
-
-* `--threshold_overrides_file` - will read a yaml file with error thresholds specified for specific backend and platform (docker or metal) configurations, overriding the max_error thresholds defined in the Translate classes. Format of the yaml file is described [here](tests/savepoint/translate/overrides/README.md).
-
-* `--dperiodic` - run tests on a doubly-periodic domain. Will look for only one tile's worth of test data and parallel tests will be run with a TileCommunicator instead of a CubedSphereCommunicator.
-
-**NOTE:** FV3 is current assumed to be by default in a "development mode", where stencils are checked each time they execute for code changes (which can trigger regeneration). This process is somewhat expensive, so there is an option to put FV3 in a performance mode by telling it that stencils should not automatically be rebuilt:
-
-```shell
-$ export FV3_STENCIL_REBUILD_FLAG=False
-```
-
-## Porting a new stencil
-
-1. Find the location in the fv3gfs-fortran repo code where the save-point is to be added, e.g. using
-
-```shell
-$ git grep <stencil_name> <checkout of fv3gfs-fortran>
-```
-
-2. Create a `translate` class from the serialized save-point data to a call to the stencil or function that calls the relevant stencil(s).
-
-These are usually named `tests/savepoint/translate/translate_<lowercase name>`
-
-Import this class in the `tests/savepoint/translate/__init__.py` file
-
-3. Write a Python function wrapper that the translate function (created above) calls.
-
-By convention, we name these `fv3core/stencils/<lower case stencil name>.py`
-
-4. Run the test, either with one name or a comma-separated list
-
-```shell
-$ make dev_tests TEST_ARGS="-–which_modules=<stencil name(s)>"
-```
-
-**Please also review the [Porting conventions](#porting-conventions) section for additional explanation**
-## Installation
-
-### Docker Image
-
-To build the `us.gcr.io/vcm-ml/fv3core` image with required dependencies for running the Python code, run
-
-```shell
-$ make build
-```
-
-Add `PULL=False` to build from scratch without running `docker pull`:
-
-```shell
-PULL=False make build
-```
-
-## Relevant repositories
-
-- https://github.com/GridTools/serialbox -
-  Serialbox generates serialized data when the Fortran model runs and has bindings to manage data from Python
-
-- https://github.com/VulcanClimateModeling/fv3gfs-fortran -
-  This is the existing Fortran model decorated with serialization statements from which the test data is generated
-
-- https://github.com/GridTools/gt4py -
-  Python package for the DSL language
-
-- https://github.com/VulcanClimateModeling/util
-  Python specific model functionality, such as halo updates.
-
-- https://github.com/VulcanClimateModeling/fv3gfs-wrapper
-  A Python based wrapper for running the Fortran version of the FV3GFS model.
-
-Some of these are submodules.
-While tests can work without these, it may be necessary for development to have these as well.
-To add these to the local repository, run
-
-```shell
-$ git submodule update --init
-```
-
-The submodules include:
-
-- `external/util` - git@github.com:VulcanClimateModeling/util.git
-- `external/daint_venv` -  git@github.com:VulcanClimateModeling/daint_venv.git
-
-## Dockerfiles and building
-
-There are two main docker files:
-
-1. `docker/dependencies.Dockerfile` - defines dependency images such as for mpi, serialbox, and GT4py
-
-2. `docker/Dockerfile` - uses the dependencies to define the final fv3core images.
-
-The dependencies are separated out into their own images to expedite rebuilding the docker image without having to rebuild dependencies, especially on CI.
-
-For the commands below using `make -C docker`, you can alternatively run `make` from within the `docker` directory.
-
-These dependencies can be updated, pushed, and pulled with `make -C docker build_deps`, `make -C docker push_deps`, and `make -C docker pull_deps`. The tag of the dependencies is based on the tag of the current build in the Makefile, which we will expand on below.
-
-Building from scratch requires both a deps and build command, such as `make -C docker pull_deps fv3core_image`.
-
-If any example fails for "pulled dependencies", it means the dependencies have never been built. You can
-build them and push them to GCR with:
-
-```shell
-$ make -C docker build_deps push_deps
-```
-
-### Building examples
-
-fv3core image with pulled dependencies:
-
-```shell
-$ make -C docker pull_deps fv3core_image
-```
-
-CUDA-enabled fv3core image with pulled dependencies:
-```
-$ CUDA=y make -C docker pull_deps fv3core_image
-```
-
-fv3core image with locally-built dependencies:
-```shell
-$ make -C docker build_deps fv3core_image
-```
-
-### Updating Serialbox
-
-If you need to install an updated version of Serialbox, you must first install cmake into the development environment. To install an updated version of Serialbox from within the container run
-
-```shell
-$ wget https://github.com/Kitware/CMake/releases/download/v3.17.3/cmake-3.17.3.tar.gz && \
-  tar xzf cmake-3.17.3.tar.gz && \
-  cd cmake-3.17.3 && \
-  ./bootstrap && make -j4 && make install
-$ git clone -b v2.6.1 --depth 1 https://github.com/GridTools/serialbox.git /tmp/serialbox
-$ cd /tmp/serialbox
-$ cmake -B build -S /tmp/serialbox -DSERIALBOX_USE_NETCDF=ON -DSERIALBOX_TESTING=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/serialbox
-$ cmake --build build/ -j $(nproc) --target install
-$ cd -
-$ rm -rf build /tmp/serialbox
-```
-
-## Pinned dependencies
-
-Dependencies are pinned using `constraints.txt`. This is auto-generated by pip-compile from the `pip-tools` package, which reads `requirements.txt` and `requirements/requirements_lint.txt`, determines the latest versions of all dependencies (including recursive dependencies) compatible those files, and writes pinned versions for all dependencies. This can be updated using:
-
-```shell
-$ make constraints.txt
-```
-
-This file is committed to the repository, and gives more reproducible tests if an old commit of the repository is checked out in the future. The constraints are followed when creating the `fv3core` docker images. To ensure consistency this should ideally be run from inside a docker development environment, but you can also run it on your local system with an appropriate Python 3 environment.
-
-## Development
-
-To develop fv3core, you need to install the linting requirements in `requirements/requirements_lint.txt`. To install the pinned versions, use:
-
-```shell
-$ pip install -c constraints.txt -r requirements/requirements_lint.txt
-```
-
-This adds `pre-commit`, which we use to lint and enforce style on the code. The first time you install `pre-commit`, install its git hooks using:
-
-```shell
-$ pre-commit install
-pre-commit installed at .git/hooks/pre-commit
-```
-
-As a convenience, the `lint` target of the top-level makefile executes `pre-commit run --all-files`.
-Linting, which formats files and checks for some style conventions, is required, as the same checks are the first step in the continuous integration testing that happens when creating a pull request.
-Linting locally saves time and literal energy, since CI tests do not have to be launched so many times!
-
- Please see the 'Development Guidelines' below for more information on the structure of the code to align your new code with the current conventions, as well as the CONTRIBUTING.md document for style guidelines.
-
-## GT4Py version
-
-FV3Core does not actually use the [GridTools/gt4py](https://github.com/gridtools/gt4py) main, it instead uses a Vulcan Climate Modeling development branch.
-This is publically available version at [VCM/gt4py](https://github.com/vulcanclimatemodeling/gt4py).
-
-Situation: There is a new stable feature in a gt4py PR, but it is not yet merged into the GridTools/gt4py main branch.
-[branches.cfg](https://github.com/VulcanClimateModeling/gt4py/blob/develop/branches.cfg) lists these features.
-Steps:
-
-1. Add any new branches to `branches.cfg`
-2. Rebuild the develop branch, either:
-  a. `make_develop gt4py-dev path/to/branches.cfg` (you may have to resolve conflicts...)
-  b. Adding new commits on top of the existing develop branch (e.g. merge or cherry-pick)
-3. Force push to the develop branch: `git push -f upstream develop`
-
-The last step will launch Jenkins tests. If these pass:
-
-1. Create a git tag: `git tag v-$(git rev-parse --short HEAD)`
-2. Push the tag: `git push upstream --tags`
-3. Make a PR to [VCM/gt4py](https://github.com/vulcanclimatemodeling/fv3core) that updates the version in `docker/Makefile` to the new tag.
-
-## License
-FV3Core is provided under the terms of the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html) license.
-
-# Development guidelines
-
-## File structure / conventions
-The main functionality of the FV3 dynamical core, which has been ported from the Fortran version in the fv3gfs-fortran repo, is defined using GT4py stencils and python 'compute' functions in fv3core/stencils. The core is comprised of units of calculations defined for regression testing. These were initially generally separated into distinct files in fv3core/stencils with corresponding files in tests/savepoint/translate/translate_<unit>.py defining the translation of variables from Fortran to Python. Exceptions exist in cases where topical and logical grouping allowed for code reuse. As refactors optimize the model, these units may be merged to occupy the same files and even methods/stencils, but the units should still be tested separately, unless determined to be redundant.
-
-The core has most of its calculations happening in GT4py stencils, but there are still several instances of operations happening in Python directly, which will need to be replaced with GT4py code for optimal performance.
-
-The namelist and grid are global variables defined in fv3core/_config.py The namelist is 'flattened' so that the grouping name of the option is not required to access the data (we may want to change this).
-
-The grid variables are mostly 2d variables and are 'global' to the model thread per mpi rank. The grid object also contains domain and layout information relevant to the current rank being operated on.
-
-Utility functions in `fv3core/utils/` include:
-  - `gt4py_utils.py`:
-    - default gt4py and model settings
-    - methods for generating gt4py storages
-    - methods for using numpy and cupy arrays in python functions that have not been put into GT4py
-    - methods for handling complex patterns that did not immediately map to gt4py, and will mostly be removed with future refactors (e.g. k_split_run)
-    - some general model math computations (e.g. great_circle_dist), that will eventually be put into gt4py with a future refactor
-  - `grid.py`:
-    - A Grid class definition that provides information about the grid layout, current tile informationm access to grid variables used globally, and convenience methods related to tile indexing, origins and domains commonly used
-    - A grid is defined for each MPI rank (minimum 6 ranks, 1 for each tile face of the cubed sphere grid represnting the whole Earth)
-    - Also provides functionality for generating a Quantity object used for halo updates and other utilities
-  - `corners`: port of corner calculations, initially direct Python calculations, being replaced with GT4py gtscript functions as the GT4py regions feature is implemented
-  - `mpi.py`: a wrapper for importing mpi4py when available
-  - `global_constants.py`: constants for use throughout the model
-  - `typing.py`: Clean names for common types we use in the model. This is new and
-    hasn't been adopted throughout the model yet, but will eventually be our
-    standard. A shorthand 'sd' has been used in the intial version.
-
-The `tests/` directory currently includes a framework for translating fields serialized (using
-Serialbox from GridTools) from a Fortran run into gt4py storages that can be inputs to
-fv3core unit computations, and compares the results of the ported code to serialized
-data following a unit computation.
-
-The `docker/` directory provides Dockerfiles for building a repeatable environment in which
-to run the core
-
-The `external/` directory is for submoduled repos that provide essential functionality
-
-The build system uses Makefiles following the convention of other repos within VulcanClimateModeling.
-
-## Model Interface
-
-The top level functions fv_dynamics and fv_sugridz can currenty only be run in parallel using mpi with a minimum of 6 ranks (there are a few other units that also require this, e.g. whenever there is a halo update involved in a unit). These are the interface to the rest of the model and currently have different conventions than the rest of the model.
- - A 'state' object (currently a SimpleNamespace) stores pointers to the allocated data fields
- - Most functions within dyn_core can be run sequentially per rank
- - Currently a list of ArgSpecs must decorate an interface function, where each ArgSpec provides useful information about the argument, e.g.: `@state_inputs( ArgSpec("qvapor", "specific_humidity", "kg/kg", intent="inout")`
-   - The format is (fortran_name, long_name, units, intent)
-   - We currently provide a duplicate of most of the metadata in the specification of the unit test, but that may be removed eventually.
- - Then the function itself, e.g. fv_dynamics, has arguments of 'state', 'comm' (the communicator) and all of the scalar parameters being provided.
-
-### Porting conventions
-
-Generation of regression data occurs in the fv3gfs-fortran repo (https://github.com/VulcanClimateModeling/fv3gfs-fortran) with serialization statements and a build procedure defined in `tests/serialized_test_data_generation`. The version of data this repo currently tests against is defined in `FORTRAN_SERIALIZED_DATA_VERSION` in this repo's `docker/Makefile.image_names`. Fields serialized are defined in Fortran code with serialization comment statements such as:
-
-```
-    !$ser savepoint C_SW-In
-    !$ser data delpcd=delpc delpd=delp ptcd=ptc
-```
-
-where the name being assigned is the name the fv3core uses to identify the variable in the test code. When this name is not equal to the name of the variable, this was usually done to avoid conflicts with other parts of the code where the same name is used to reference a differently sized field.
-
-The majority of the logic for translating from data serialized from Fortran to something that can be used by Python, and the comparison of the results, is encompassed by the main Translate class in the tests/savepoint/translate/translate.py file. Any units not involving a halo update can be run using this framework, while those that need to be run in parallel can look to the ParallelTranslate class as the parent class in tests/savepoint/translate/parallel_translate.py. These parent classes provide generally useful operations for translating serialized data between Fortran and Python specifications, and for applying regression tests.
-
-A new unit test can be defined as a new child class of one of these, with a naming convention of `Translate<Savepoint Name>` where `Savepoint Name` is the name used in the serialization statements in the Fortran code, without the `-In` and `-Out` part of the name. A translate class can usually be minimally specify the input and output fields. Then, in cases where the parent compute function is insuffient to handle the complexity of either the data translation or the compute function, the appropriate methods can be overridden.
-
-For Translate objects
-  - The init function establishes the assumed translation setup for the class, which can be dynamically overridden as needed.
-  - the parent compute function does:
-    - Makes gt4py storages of the max shape (grid.npx+1, grid.npy+1, grid.npz+1) aligning the data based on the start indices specified. (gt4py requires data fields have the same shape, so in this model we have buffer points so all calculations can be done easily without worrying about shape matching).
-    - runs the compute function (defined in self.compute_func) on the input data storages
-    - slices the computed Python fields to be compared to fortran regression data
-  - The unit test then uses a modified relative error metric to determine whether the unit passes
-  - The init method for a Translate class:
-    - The input (self.in_vars["data_vars"]) and output(self.out_vars) variables are specified in dictionaries, where the keys are the name of the variable used in the model and the values are dictionaries specifying metadata for translation of serialized data to gt4py storages. The metadata that can be specied to override defaults are:
-    - Indices to line up data arrays into gt4py storages (which all get created as the max possible size needed by all operations, for simplicity): "istart", "iend", "jstart", "jend", "kstart", "kend". These should be set using the 'grid' object available to the Translate object, using equivalent index names as in the declaration of variables in the Fortran code, e.g. real:: cx(bd%is:bd%ie+1,bd%jsd:bd%jed ) means we should assign. Example:
-
-```python
-      self.in_vars["data_vars"]["cx"] = {"istart": self.is\_, "iend": self.ie + 1,
-                                         "jstart": self.jsd, "jend": self.jed,}
-```
-  - There is only a limited set of Fortran shapes declared, so abstractions defined in the grid can also be used,
-    e.g.: `self.out_vars["cx"] = self.grid.x3d_compute_domain_y_dict()`. Note that the variables, e.g. `grid.is\_` and `grid.ie` specify the 'compute' domain in the x direction of the current tile, equivalent to `bd%is` and `bd%ie` in the Fortran model EXCEPT that the Python variables are local to the current MPI rank (a subset of the tile face), while the Fortran values are global to the tile face. This is because these indices are used to slice into fields, which in Python is 0-based, and in Fortran is based on however the variables are declared. But, for the purposes of aligning data for computations and comparisons, we can match them in this framework. Shapes need to be defined in a dictionary per variable including `"istart"`, `"iend"`, `"jstart"`, `"jend"`, `"kstart"`, `"kend"` that represent the shape of that variable as defined in the Fortran code. The default shape assumed if a variable is specified with an empty dictionary is `isd:ied, jsd:jed, 0:npz - 1` inclusive, and variables that aren't that shape in the Fortran code need to have the 'start' indices specified for the in_vars dictionary , and 'start' and 'end' for the out_vars.
-    - `"serialname"` can be used to specify a name used in the Fortran code declaration if we'd like the model to use a different name
-    - `"kaxis"`: which dimension is the vertical direction. For most variables this is '2' and does not need to be specified. For Fortran variables that assign the vertical dimension to a different axis, this can be set to ensure we end up with 3d storages that have the vertical dimension where it is expected by GT4py.
-    - `"dummy_axes"`: If set this will set of the storage to have singleton dimensions in the axes defined. This is to enable testing stencils where the full 3d data has not been collected and we want to run stencil tests on the data for a particular slice.
-    - `"names_4d"`: If a 4d variable is being serialized, this can be set to specify the names of each 3d field. By default this is the list of tracers.
-    - input variables that are scalars should be added to `self.in_vars["parameters"]`
-    - `self.compute_func` is the name of the model function that should be run by the compute method in the translate class
-    - `self.max_error` overrides the parent classes relative error threshold. This should only be changed when the reasons for non-bit reproducibility are understood.
-    - `self.max_shape` sets the size of the gt4py storage created for testing
-    - `self.ignore_near_zero_errors[<varname>] = True`: This is an option to let some fields pass with higher relative error if the absolute error is very small
-    - `self.skip_test`: This is an option to jump over the test case, to be used in the override file for temporary deactivation of tests.
-
-For `ParallelTranslate` objects:
-  - Inputs and outputs are defined at the class level, and these include metadata such as the "name" (e.g. understandable name for the symbol), dimensions, units and n_halo(numb er of halo lines)
-  - Both `compute_sequential` and `compute_parallel` methods may be defined, where a mock communicator is used in the `compute_sequential` case
-  - The parent assumes a state object for tracking fields and methods exist for translating from inputs to a state object and extracting the output variables from the state. It is assumed that Quantity objects are needed in the model method in order to do halo updates.
-  - `ParallelTranslate2Py` is a slight variation of this used for many of the parallel units that do not yet utilize a state object and relies on the specification of the same index metadata of the Translate classes
-  - `ParallelTranslateBaseSlicing` makes use of the state but relies on the Translate object of self._base, a Translate class object, to align the data before making quantities, computing and comparing.
-
-### Debugging Tests
-
-Pytest can be configured to give you a pdb session when a test fails. To route this properly through docker, you can run:
-
-```bash
-TEST_ARGS="-v -s --pdb" RUN_FLAGS="--rm -it" make tests
-```
-
-This can be done with any pytest target, such as `make savepoint_tests` and `make savepoint_tests_mpi`.
-
-### GEOS API
-
-The `GeosDycoreWrapper` class provides an API to run the dynamical core in a Python component of a GEOS model run. A `GeosDycoreWrapper` object is initialized with a namelist, communicator, and backend, which creates the communicators, partitioners, dycore state, and dycore object required to run the Pace dycore. A wrapper object takes numpy arrays of `u, v, w, delz, pt, delp, q, ps, pe, pk, peln, pkz, phis, q_con, omga, ua, va, uc, vc, mfxd, mfyd, cxd, cyd,` and `diss_estd` and returns a dictionary containing numpy arrays of those same variables. Wrapper objects contain a `timer` attrubite that tracks the amount of time moving input data to the dycore state, running the dynamical core, and retrieving the data from the state.
+# TEMPORARY BRANCH: Up-skilling to GEOS v11.4.2
+
+This branch exists solely for up-skilling pyFV3 to be able to run GEOS in its v11.4.2 FP configuration.
+The need for a branch separate from `develop` stems from the following differences:
+
+- GEOS runs a 32-bit floating-point version (with appropriate 64-bit buffers for mass conservation). This means the translate tests require a new set of data _and_ will not pass on the old 8.1.3 Pace data.
+- GEOS requires options that are deemed "legacy" and that we may want to replace rather than port.
+- Project requirements demand quick iterative development, while `pyFV3` demands coordination between all stakeholders.
+
+The aim is to validate and benchmark GEOS v11.4.2 with this dynamical core. Once done, we will aim to move _as much code as possible_ back into `develop`.
+The methodology is as follows:
+
+- Merge directly into `develop` any changes that do not demand a new set of data
+- Keep track of the feature branches (below) that can't yet be merged into `develop`, for future PRs
+- Keep track of GEOS vs SHiELD differences for future discussions
+
+## Feature branches
+
+Legend:
+
+- ⚙️ _GEOS - WIP_ : Ongoing work - can be merged temporarily
+- 🔶 _GEOS - Merged_: Considered done - merged in the GEOS v11.4.2 branch but NOT in `develop`
+- ✅ _Develop - Merged_: Work done as part of the GEOS up-skilling, merged in both `develop` AND the GEOS v11.4.2 branch.
+
+Branches:
+
+- ✅ `fix/F32/UpdateDzC`@Florian: Fix for fluxes gradient
+- ✅ `fix/F32/DivergenceDamping`@Florian: Fix for 32-bit scalars in DivergenceDamping
+- ✅ `fix/F32/UpdateDzD`@Florian: Fix for fluxes gradient & python computation
+- ✅ `fix/F32/nh_p_grad`@Florian: Fix for 32-bit NonHydrostaticPressureGradient
+- 🔶 `fix/RayleighDamping_mixed_precision`@Florian: Fix the Ray_Fast test
+- 🔶 `GEOS_update/yppm_xppm`@Florian: Fix the YPPM/XPPM with `hord = -6`
+- 🔶 `fix/DelnFlux_f32_support`@Florian: Fix for f32 support for DelnFlux (partial pass)
+- 🔶 `fix/GEOSv11_4_2/HyperDiffusionDamping`@Florian: Fix the Hyperdiffusion Damping by restoring factor to be 64-bit float
+- ⚙️ `fix/GEOS/D_SW`@Florian: Fix D_SW heat dissipation, column calculation and new `dpx` accumulation (partial pass)
+- ⚙️ `fix/GEOSv11_4_2/A2B_Ord4`@Florian: Fix for 32-bit A2B_Ord4
+- ⚙️ `fix/GEOSv11_4_2/RiemanSolver`@Florian: Fix for 32-bit RiemanSolver
+- ⚙️ `fix/GEOSv11_4_2/C_SW`@Florian: Fix for C_SW for 32-bit
+- ⚙️ `fix/GEOSv11_4_2/Dyncore`@Florian: Fix for Acoustics and DycoreState for 32-bit and `dpx` calculation
+    - MERGE ORDER: after `fix/GEOS/D_SW`
+    - MERGE ORDER: after `fix/GEOSv11_4_2/HyperDiffusionDamping`
+- ⚙️ `feature/tracer_rework_part1`@Florian: Allow for update of N Tracers
+- ⚙️ `fix/GEOS/TracerAdvection`@Florian: Allow for non-update of mass fluxes and Courant number, f32 fixes, correct computation of `cmax` and `nsplit`, overcomputation into the algorithm
+    - BASED ON `tracer_rework_part1`
+    - REQUIRES: `ndsl` with tracer rework
+- ⚙️ `feature/fv_mapz/GEOS`@Chris K: Remapping for GEOS
+    - REQUIRES: `ndsl` with tracer rework
+- ⚙️ `fix/GEOSv11_4_2/Dynamics`@Florian: Fix for the f32 & GEOS version of dynamics
+    - REQUIRES: `ndsl` with tracer rework
+    - REQUIRES: `tracer_rework_part1`, `fix/GEOSv11_4_2/Dyncore`, `fix/GEOS/TracerAdvection`
+    - MERGE ORDER: after `fix/GEOSv11_4_2/HyperDiffusionDamping`
diff --git a/examples/standalone/runfile/dynamics.py b/examples/standalone/runfile/dynamics.py
index 74cb249f..592e9ca6 100755
--- a/examples/standalone/runfile/dynamics.py
+++ b/examples/standalone/runfile/dynamics.py
@@ -270,6 +270,7 @@ def setup_dycore(
         config=dycore_config,
         phis=state.phis,
         state=state,
+        exclude_tracers=[],
         timestep=timedelta(seconds=dycore_config.dt_atmos),
     )
     return dycore, state, stencil_factory
diff --git a/pyfv3/_config.py b/pyfv3/_config.py
index 0f226979..6647020d 100644
--- a/pyfv3/_config.py
+++ b/pyfv3/_config.py
@@ -8,12 +8,13 @@
 import yaml
 from dacite import Config, from_dict
 
+from ndsl.dsl.typing import Float, Int
 from ndsl.utils import f90nml_as_dict
 
 
-DEFAULT_INT = 0
+DEFAULT_INT = Int(0)
 DEFAULT_STR = ""
-DEFAULT_FLOAT = 0.0
+DEFAULT_FLOAT = Float(0.0)
 DEFAULT_BOOL = False
 DEFAULT_DYCORE_NML_GROUPS = (
     "main_nml",
@@ -29,34 +30,34 @@ class SatAdjustConfig:
     rad_rain: bool
     rad_graupel: bool
     tintqs: bool
-    sat_adj0: float
-    ql_gen: float
-    qs_mlt: float
-    ql0_max: float
-    t_sub: float
-    qi_gen: float
-    qi_lim: float
-    qi0_max: float
-    dw_ocean: float
-    dw_land: float
-    icloud_f: int
-    cld_min: float
-    tau_i2s: float
-    tau_v2l: float
-    tau_r2g: float
-    tau_l2r: float
-    tau_l2v: float
-    tau_imlt: float
-    tau_smlt: float
+    sat_adj0: Float
+    ql_gen: Float
+    qs_mlt: Float
+    ql0_max: Float
+    t_sub: Float
+    qi_gen: Float
+    qi_lim: Float
+    qi0_max: Float
+    dw_ocean: Float
+    dw_land: Float
+    icloud_f: Int
+    cld_min: Float
+    tau_i2s: Float
+    tau_v2l: Float
+    tau_r2g: Float
+    tau_l2r: Float
+    tau_l2v: Float
+    tau_imlt: Float
+    tau_smlt: Float
 
 
 @dataclasses.dataclass(frozen=True)
 class RemappingConfig:
     fill: bool
-    kord_tm: int
-    kord_tr: int
-    kord_wz: int
-    kord_mt: int
+    kord_tm: Int
+    kord_tr: Int
+    kord_wz: Int
+    kord_mt: Int
     do_sat_adj: bool
     sat_adjust: SatAdjustConfig
 
@@ -67,32 +68,32 @@ def hydrostatic(self) -> bool:
 
 @dataclasses.dataclass(frozen=True)
 class RiemannConfig:
-    p_fac: float
-    a_imp: float
+    p_fac: Float
+    a_imp: Float
     use_logp: bool
-    beta: float
+    beta: Float
 
 
 @dataclasses.dataclass(frozen=True)
 class DGridShallowWaterLagrangianDynamicsConfig:
-    dddmp: float
-    d2_bg: float
-    d2_bg_k1: float
-    d2_bg_k2: float
-    d4_bg: float
-    ke_bg: float
-    nord: int
-    n_sponge: int
-    grid_type: int
-    d_ext: float
-    hord_dp: int
-    hord_tm: int
-    hord_mt: int
-    hord_vt: int
+    dddmp: Float
+    d2_bg: Float
+    d2_bg_k1: Float
+    d2_bg_k2: Float
+    d4_bg: Float
+    ke_bg: Float
+    nord: Int
+    n_sponge: Int
+    grid_type: Int
+    d_ext: Float
+    hord_dp: Int
+    hord_tm: Int
+    hord_mt: Int
+    hord_vt: Int
     do_f3d: bool
     do_skeb: bool
-    d_con: float
-    vtdm4: float
+    d_con: Float
+    vtdm4: Float
     inline_q: bool
     convert_ke: bool
     do_vort_damp: bool
@@ -101,12 +102,12 @@ class DGridShallowWaterLagrangianDynamicsConfig:
 
 @dataclasses.dataclass(frozen=True)
 class AcousticDynamicsConfig:
-    tau: float
-    k_split: int
-    n_split: int
-    m_split: int
-    delt_max: float
-    rf_cutoff: float
+    tau: Float
+    k_split: Int
+    n_split: Int
+    m_split: Int
+    delt_max: Float
+    rf_cutoff: Float
     rf_fast: bool
     breed_vortex_inline: bool
     """
@@ -121,6 +122,8 @@ class AcousticDynamicsConfig:
     """
     riemann: RiemannConfig
     d_grid_shallow_water: DGridShallowWaterLagrangianDynamicsConfig
+    dz_min: Float
+    """Controls minimum thickness in NH solver"""
 
     @property
     def nord(self) -> int:
@@ -161,50 +164,50 @@ def use_logp(self) -> bool:
 
 @dataclasses.dataclass
 class DynamicalCoreConfig:
-    dt_atmos: int = DEFAULT_INT
-    n_steps: int = 1
-    a_imp: float = DEFAULT_FLOAT
-    beta: float = DEFAULT_FLOAT
-    consv_te: float = DEFAULT_FLOAT
-    d2_bg: float = DEFAULT_FLOAT
-    d2_bg_k1: float = DEFAULT_FLOAT
-    d2_bg_k2: float = DEFAULT_FLOAT
-    d4_bg: float = DEFAULT_FLOAT
-    d_con: float = DEFAULT_FLOAT
-    d_ext: float = DEFAULT_FLOAT
-    dddmp: float = DEFAULT_FLOAT
-    delt_max: float = DEFAULT_FLOAT
+    dt_atmos: Int = DEFAULT_INT
+    n_steps: Int = Int(1)
+    a_imp: Float = DEFAULT_FLOAT
+    beta: Float = DEFAULT_FLOAT
+    consv_te: Float = DEFAULT_FLOAT
+    d2_bg: Float = DEFAULT_FLOAT
+    d2_bg_k1: Float = DEFAULT_FLOAT
+    d2_bg_k2: Float = DEFAULT_FLOAT
+    d4_bg: Float = DEFAULT_FLOAT
+    d_con: Float = DEFAULT_FLOAT
+    d_ext: Float = DEFAULT_FLOAT
+    dddmp: Float = DEFAULT_FLOAT
+    delt_max: Float = DEFAULT_FLOAT
     do_sat_adj: bool = DEFAULT_BOOL
     do_vort_damp: bool = DEFAULT_BOOL
     fill: bool = DEFAULT_BOOL
-    hord_dp: int = DEFAULT_INT
-    hord_mt: int = DEFAULT_INT
-    hord_tm: int = DEFAULT_INT
-    hord_tr: int = DEFAULT_INT
-    hord_vt: int = DEFAULT_INT
+    hord_dp: Int = DEFAULT_INT
+    hord_mt: Int = DEFAULT_INT
+    hord_tm: Int = DEFAULT_INT
+    hord_tr: Int = DEFAULT_INT
+    hord_vt: Int = DEFAULT_INT
     hydrostatic: bool = DEFAULT_BOOL
-    k_split: int = DEFAULT_INT
-    ke_bg: float = DEFAULT_FLOAT
-    kord_mt: int = DEFAULT_INT
-    kord_tm: int = DEFAULT_INT
-    kord_tr: int = DEFAULT_INT
-    kord_wz: int = DEFAULT_INT
-    n_split: int = DEFAULT_INT
-    nord: int = DEFAULT_INT
-    npx: int = DEFAULT_INT
-    npy: int = DEFAULT_INT
-    npz: int = DEFAULT_INT
-    ntiles: int = DEFAULT_INT
-    nwat: int = DEFAULT_INT
-    p_fac: float = DEFAULT_FLOAT
-    rf_cutoff: float = DEFAULT_FLOAT
-    tau: float = DEFAULT_FLOAT
-    vtdm4: float = DEFAULT_FLOAT
+    k_split: Int = DEFAULT_INT
+    ke_bg: Float = DEFAULT_FLOAT
+    kord_mt: Int = DEFAULT_INT
+    kord_tm: Int = DEFAULT_INT
+    kord_tr: Int = DEFAULT_INT
+    kord_wz: Int = DEFAULT_INT
+    n_split: Int = DEFAULT_INT
+    nord: Int = DEFAULT_INT
+    npx: Int = DEFAULT_INT
+    npy: Int = DEFAULT_INT
+    npz: Int = DEFAULT_INT
+    ntiles: Int = DEFAULT_INT
+    nwat: Int = DEFAULT_INT
+    p_fac: Float = DEFAULT_FLOAT
+    rf_cutoff: Float = DEFAULT_FLOAT
+    tau: Float = DEFAULT_FLOAT
+    vtdm4: Float = DEFAULT_FLOAT
     z_tracer: bool = DEFAULT_BOOL
     do_qa: bool = DEFAULT_BOOL
     layout: tuple[int, int] = (1, 1)
-    grid_type: int = 0
-    u_max: float = 350.0
+    grid_type: Int = Int(0)
+    u_max: Float = Float(350.0)
     """max windspeed for dp config"""
     do_f3d: bool = False
     inline_q: bool = False
@@ -214,40 +217,39 @@ class DynamicalCoreConfig:
     moist_phys: bool = True
     check_negative: bool = False
     # gfdl_cloud_microphys.F90
-    tau_r2g: float = 900.0
+    tau_r2g: Float = Float(900.0)
     """rain freezing during fast_sat"""
-    tau_smlt: float = 900.0
+    tau_smlt: Float = Float(900.0)
     """snow melting"""
-    tau_g2r: float = 600.0
+    tau_g2r: Float = Float(600.0)
     """graupel melting to rain"""
-    tau_imlt: float = 600.0
+    tau_imlt: Float = Float(600.0)
     """cloud ice melting"""
-    tau_i2s: float = 1000.0
+    tau_i2s: Float = Float(1000.0)
     """cloud ice to snow auto - conversion"""
-    tau_l2r: float = 900.0
+    tau_l2r: Float = Float(900.0)
     """cloud water to rain auto - conversion"""
-    tau_g2v: float = 1200.0
+    tau_g2v: Float = Float(1200.0)
     """graupel sublimation"""
-    tau_v2g: float = 21600.0
+    tau_v2g: Float = Float(21600.0)
     """graupel deposition -- make it a slow process"""
-    sat_adj0: float = 0.90
+    sat_adj0: Float = Float(0.90)
     """adjustment factor (0: no 1: full) during fast_sat_adj"""
-    ql_gen: float = (
-        1.0e-3  # max new cloud water during remapping step if fast_sat_adj = .t.
-    )
-    ql_mlt: float = 2.0e-3
+    ql_gen: Float = Float(1.0e-3)
+    """max new cloud water during remapping step if fast_sat_adj = .t."""
+    ql_mlt: Float = Float(2.0e-3)
     """max value of cloud water allowed from melted cloud ice"""
-    qs_mlt: float = 1.0e-6
+    qs_mlt: Float = Float(1.0e-6)
     """max cloud water due to snow melt"""
-    ql0_max: float = 2.0e-3
+    ql0_max: Float = Float(2.0e-3)
     """max cloud water value (auto converted to rain)"""
-    t_sub: float = 184.0
+    t_sub: Float = Float(184.0)
     """min temp for sublimation of cloud ice"""
-    qi_gen: float = 1.82e-6
+    qi_gen: Float = Float(1.82e-6)
     """max cloud ice generation during remapping step"""
-    qi_lim: float = 1.0
+    qi_lim: Float = Float(1.0)
     """cloud ice limiter to prevent large ice build up"""
-    qi0_max: float = 1.0e-4
+    qi0_max: Float = Float(1.0e-4)
     """max cloud ice value (by other sources)"""
     rad_snow: bool = True
     """consider snow in cloud fraction calculation"""
@@ -257,33 +259,34 @@ class DynamicalCoreConfig:
     """consider graupel in cloud fraction calculation"""
     tintqs: bool = False
     """use temperature in the saturation mixing in PDF"""
-    dw_ocean: float = 0.10
+    dw_ocean: Float = Float(0.10)
     """base value for ocean"""
-    dw_land: float = 0.15
+    dw_land: Float = Float(0.15)
     """base value for subgrid deviation / variability over land"""
     # cloud scheme 0 - ?
     # 1: old fvgfs gfdl) mp implementation
     # 2: binary cloud scheme (0 / 1)
-    icloud_f: int = 0
-    cld_min: float = 0.05
+    icloud_f: Int = Int(0)
+    cld_min: Float = Float(0.05)
     """!< minimum cloud fraction"""
-    tau_l2v: float = 300.0
+    tau_l2v: Float = Float(300.0)
     """cloud water to water vapor (evaporation)"""
-    tau_v2l: float = 90.0
+    tau_v2l: Float = Float(90.0)
     """water vapor to cloud water (condensation)"""
-    c2l_ord: int = 4
+    c2l_ord: Int = Int(4)
     regional: bool = False
-    m_split: int = 0
+    m_split: Int = Int(0)
     convert_ke: bool = False
     breed_vortex_inline: bool = False
     use_old_omega: bool = True
     rf_fast: bool = False
     adiabatic: bool = False
-    nf_omega: int = 1
-    fv_sg_adj: int = -1
-    n_sponge: int = 1
+    nf_omega: Int = Int(1)
+    fv_sg_adj: Int = Int(-1)
+    n_sponge: Int = Int(1)
     sw_dynamics: bool = False
     """shallow water conditions"""
+    dz_min: Float = Float(2.0)
     namelist_override: str | None = None
     target_nml_groups: tuple[str, ...] | None = DEFAULT_DYCORE_NML_GROUPS
 
@@ -342,6 +345,7 @@ def from_dict(
                 tuple[int, int]: lambda x: tuple(x),
                 tuple[str, ...]: lambda x: tuple(x) if x is not None else None,
             },
+            cast=[Int, Float],
         )
         dycore_config = from_dict(
             data_class=DynamicalCoreConfig, data=data, config=dacite_config
@@ -450,6 +454,7 @@ def acoustic_dynamics(self) -> AcousticDynamicsConfig:
             breed_vortex_inline=self.breed_vortex_inline,
             use_old_omega=self.use_old_omega,
             riemann=self.riemann,
+            dz_min=self.dz_min,
             d_grid_shallow_water=self.d_grid_shallow_water,
         )
 
diff --git a/pyfv3/dycore_state.py b/pyfv3/dycore_state.py
index 45ccd2de..565bfe4a 100644
--- a/pyfv3/dycore_state.py
+++ b/pyfv3/dycore_state.py
@@ -1,11 +1,13 @@
 from collections.abc import Mapping
 from dataclasses import asdict, dataclass, field, fields
+from types import MappingProxyType
 from typing import Any, Self
 
+import numpy.typing as npt
 import xarray as xr
 
 import ndsl.dsl.gt4py_utils as gt_utils
-from ndsl import Backend, GridSizer, Quantity, QuantityFactory
+from ndsl import Backend, Quantity, QuantityFactory
 from ndsl.constants import (
     I_DIM,
     I_INTERFACE_DIM,
@@ -17,6 +19,65 @@
 from ndsl.dsl.typing import Float
 from ndsl.restart._legacy_restart import open_restart
 from ndsl.typing import Communicator
+from pyfv3.tracers import FVTracers, FVTracersAxisName
+
+
+DEFAULT_TRACER_PROPERTIES = {
+    "specific_humidity": {
+        "pyFV3_key": "vapor",
+        "dims": [K_DIM, J_DIM, I_DIM],
+        "restart_name": "sphum",
+        "units": "g/kg",
+    },
+    "cloud_liquid_water_mixing_ratio": {
+        "pyFV3_key": "liquid",
+        "dims": [K_DIM, J_DIM, I_DIM],
+        "restart_name": "liq_wat",
+        "units": "g/kg",
+    },
+    "cloud_ice_mixing_ratio": {
+        "pyFV3_key": "ice",
+        "dims": [K_DIM, J_DIM, I_DIM],
+        "restart_name": "ice_wat",
+        "units": "g/kg",
+    },
+    "rain_mixing_ratio": {
+        "pyFV3_key": "rain",
+        "dims": [K_DIM, J_DIM, I_DIM],
+        "restart_name": "rainwat",
+        "units": "g/kg",
+    },
+    "snow_mixing_ratio": {
+        "pyFV3_key": "snow",
+        "dims": [K_DIM, J_DIM, I_DIM],
+        "restart_name": "snowwat",
+        "units": "g/kg",
+    },
+    "graupel_mixing_ratio": {
+        "pyFV3_key": "graupel",
+        "dims": [K_DIM, J_DIM, I_DIM],
+        "restart_name": "graupel",
+        "units": "g/kg",
+    },
+    "ozone_mixing_ratio": {
+        "pyFV3_key": "o3mr",
+        "dims": [K_DIM, J_DIM, I_DIM],
+        "restart_name": "o3mr",
+        "units": "g/kg",
+    },
+    "turbulent_kinetic_energy": {
+        "pyFV3_key": "sgs_tke",
+        "dims": [K_DIM, J_DIM, I_DIM],
+        "restart_name": "sgs_tke",
+        "units": "g/kg",
+    },
+    "cloud_fraction": {
+        "pyFV3_key": "cloud",
+        "dims": [K_DIM, J_DIM, I_DIM],
+        "restart_name": "cld_amt",
+        "units": "g/kg",
+    },
+}
 
 
 @dataclass()
@@ -149,75 +210,12 @@ class DycoreState:
             "intent": "inout",
         }
     )
-    qvapor: Quantity = field(
-        metadata={
-            "name": "specific_humidity",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-        }
-    )
-    qliquid: Quantity = field(
-        metadata={
-            "name": "cloud_water_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-            "intent": "inout",
-        }
-    )
-    qice: Quantity = field(
-        metadata={
-            "name": "cloud_ice_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-            "intent": "inout",
-        }
-    )
-    qrain: Quantity = field(
-        metadata={
-            "name": "rain_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-            "intent": "inout",
-        }
-    )
-    qsnow: Quantity = field(
-        metadata={
-            "name": "snow_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-            "intent": "inout",
-        }
-    )
-    qgraupel: Quantity = field(
-        metadata={
-            "name": "graupel_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-            "intent": "inout",
-        }
-    )
-    qo3mr: Quantity = field(
+    tracers: FVTracers = field(
         metadata={
-            "name": "ozone_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-            "intent": "inout",
-        }
-    )
-    qsgs_tke: Quantity = field(
-        metadata={
-            "name": "turbulent_kinetic_energy",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "m**2/s**2",
-            "intent": "inout",
-        }
-    )
-    qcld: Quantity = field(
-        metadata={
-            "name": "cloud_fraction",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "",
+            "name": "tracers",
+            "units": "g/kg",
             "intent": "inout",
+            "dims": [I_DIM, J_DIM, K_DIM, FVTracersAxisName],
         }
     )
     q_con: Quantity = field(
@@ -298,6 +296,8 @@ class DycoreState:
 
     def __post_init__(self) -> None:
         for _field in fields(self):
+            if _field.name == "tracers":
+                continue
             for check_name in ["units", "dims"]:
                 if check_name in _field.metadata:
                     required = _field.metadata[check_name]
@@ -311,24 +311,39 @@ def __post_init__(self) -> None:
                         )
 
     @classmethod
-    def init_zeros(cls, quantity_factory: QuantityFactory) -> Self:
+    def init_zeros(
+        cls,
+        quantity_factory: QuantityFactory,
+        dtype_dict: dict[str, type] | None = None,
+        allow_mismatch_float_precision: bool = True,
+    ) -> Self:
+        """Initialize the dynamics memory state to zero. Default to allow for
+        mixed precision as 32-bit dynamics requires it."""
+
         initial_storages = {}
         for _field in fields(cls):
             if "dims" in _field.metadata.keys():
                 initial_storages[_field.name] = quantity_factory.zeros(
                     _field.metadata["dims"],
                     _field.metadata["units"],
-                    dtype=Float,
-                ).data
+                    dtype=(
+                        dtype_dict[_field.name]
+                        if dtype_dict and _field.name in dtype_dict.keys()
+                        else Float
+                    ),
+                    allow_mismatch_float_precision=allow_mismatch_float_precision,
+                )[:]
         return cls.init_from_storages(
             storages=initial_storages,
-            sizer=quantity_factory.sizer,
-            backend=quantity_factory.backend,
+            quantity_factory=quantity_factory,
+            allow_mismatch_float_precision=allow_mismatch_float_precision,
         )
 
     @classmethod
     def init_from_numpy_arrays(
-        cls, dict_of_numpy_arrays: dict, sizer: GridSizer, backend: Backend
+        cls,
+        dict_of_numpy_arrays: dict,
+        quantity_factory: QuantityFactory,
     ) -> Self:
         field_names = [_field.name for _field in fields(cls)]
         for variable_name in dict_of_numpy_arrays.keys():
@@ -338,42 +353,45 @@ def init_from_numpy_arrays(
                 )
         dict_state = {}
         for _field in fields(cls):
-            if "dims" in _field.metadata.keys():
-                dims = _field.metadata["dims"]
-                dict_state[_field.name] = Quantity(
-                    dict_of_numpy_arrays[_field.name],
-                    dims,
-                    _field.metadata["units"],
-                    origin=sizer.get_origin(dims),
-                    extent=sizer.get_extent(dims),
-                    backend=backend,
-                )
-        return cls(**dict_state)  # type: ignore[arg-type,unused-ignore]
+            dims = _field.metadata["dims"]
+            dict_state[_field.name] = Quantity(
+                dict_of_numpy_arrays[_field.name],
+                dims,
+                _field.metadata["units"],
+                origin=quantity_factory.sizer.get_origin(dims),
+                extent=quantity_factory.sizer.get_extent(dims),
+                backend=quantity_factory.backend,
+            )
+        state = cls(**dict_state)
+        return state
 
     @classmethod
     def init_from_storages(
         cls,
         storages: Mapping[str, Any],
-        sizer: GridSizer,
+        quantity_factory: QuantityFactory,
         bdt: float = 0.0,
         mdt: float = 0.0,
-        backend: Backend | None = None,
+        allow_mismatch_float_precision: bool = False,
     ) -> Self:
-        if not backend:
-            backend = Backend.python()
         inputs = {}
         for _field in fields(cls):
-            if "dims" in _field.metadata.keys():
+            if "dims" in _field.metadata:
                 dims = _field.metadata["dims"]
+                storage = storages[_field.name]
+                if isinstance(storage, Quantity):
+                    storage = storage[:]
                 quantity = Quantity(
-                    storages[_field.name],
+                    storage,
                     dims,
                     _field.metadata["units"],
-                    origin=sizer.get_origin(dims),
-                    extent=sizer.get_extent(dims),
-                    backend=backend,
+                    origin=quantity_factory.sizer.get_origin(dims),
+                    extent=quantity_factory.sizer.get_extent(dims),
+                    backend=quantity_factory.backend,
+                    allow_mismatch_float_precision=allow_mismatch_float_precision,
                 )
                 inputs[_field.name] = quantity
+
         return cls(**inputs, bdt=bdt, mdt=mdt)
 
     @classmethod
@@ -383,14 +401,16 @@ def from_fortran_restart(
         quantity_factory: QuantityFactory,
         communicator: Communicator,
         path: str,
+        backend: Backend,
     ) -> Self:
         state_dict: Mapping[str, Quantity] = open_restart(
             dirname=path,
             communicator=communicator,
-            tracer_properties=TRACER_PROPERTIES,
+            tracer_properties=DEFAULT_TRACER_PROPERTIES,
+        )
+        new = cls.init_zeros(
+            quantity_factory=quantity_factory,
         )
-
-        new = cls.init_zeros(quantity_factory=quantity_factory)
         new.pt.view[:] = new.pt.np.asarray(
             state_dict["air_temperature"].transpose(new.pt.dims).view[:]
         )
@@ -411,31 +431,33 @@ def from_fortran_restart(
         new.v.view[:] = new.v.np.asarray(
             state_dict["y_wind"].transpose(new.v.dims).view[:]
         )
-        new.qvapor.view[:] = new.qvapor.np.asarray(
-            state_dict["specific_humidity"].transpose(new.qvapor.dims).view[:]
+        new.tracers.vapor.view[:] = new.tracers.vapor.np.asarray(
+            state_dict["specific_humidity"].transpose(new.tracers.vapor.dims).view[:]
         )
-        new.qliquid.view[:] = new.qliquid.np.asarray(
+        new.tracers.liquid.view[:] = new.tracers.liquid.np.asarray(
             state_dict["cloud_liquid_water_mixing_ratio"]
-            .transpose(new.qliquid.dims)
+            .transpose(new.tracers.liquid.dims)
             .view[:]
         )
-        new.qice.view[:] = new.qice.np.asarray(
-            state_dict["cloud_ice_mixing_ratio"].transpose(new.qice.dims).view[:]
+        new.tracers.ice.view[:] = new.tracers.ice.np.asarray(
+            state_dict["cloud_ice_mixing_ratio"].transpose(new.tracers.ice.dims).view[:]
         )
-        new.qrain.view[:] = new.qrain.np.asarray(
-            state_dict["rain_mixing_ratio"].transpose(new.qrain.dims).view[:]
+        new.tracers.rain.view[:] = new.tracers.rain.np.asarray(
+            state_dict["rain_mixing_ratio"].transpose(new.tracers.rain.dims).view[:]
         )
-        new.qsnow.view[:] = new.qsnow.np.asarray(
-            state_dict["snow_mixing_ratio"].transpose(new.qsnow.dims).view[:]
+        new.tracers.snow.view[:] = new.tracers.snow.np.asarray(
+            state_dict["snow_mixing_ratio"].transpose(new.tracers.snow.dims).view[:]
         )
-        new.qgraupel.view[:] = new.qgraupel.np.asarray(
-            state_dict["graupel_mixing_ratio"].transpose(new.qgraupel.dims).view[:]
+        new.tracers.graupel.view[:] = new.tracers.graupel.np.asarray(
+            state_dict["graupel_mixing_ratio"]
+            .transpose(new.tracers.graupel.dims)
+            .view[:]
         )
-        new.qo3mr.view[:] = new.qo3mr.np.asarray(
-            state_dict["ozone_mixing_ratio"].transpose(new.qo3mr.dims).view[:]
+        new.tracers.o3mr.view[:] = new.tracers.o3mr.np.asarray(
+            state_dict["ozone_mixing_ratio"].transpose(new.tracers.o3mr.dims).view[:]
         )
-        new.qcld.view[:] = new.qcld.np.asarray(
-            state_dict["cloud_fraction"].transpose(new.qcld.dims).view[:]
+        new.tracers.cloud.view[:] = new.tracers.cloud.np.asarray(
+            state_dict["cloud_fraction"].transpose(new.tracers.cloud.dims).view[:]
         )
         new.delz.view[:] = new.delz.np.asarray(
             state_dict["vertical_thickness_of_atmospheric_layer"]
@@ -445,21 +467,30 @@ def from_fortran_restart(
 
         return new
 
+    def _xr_dataarray_from_array(
+        self, name: str, metadata: MappingProxyType[Any, Any], data: npt.ArrayLike
+    ) -> xr.DataArray:
+        dims = [f"{dim_name}_{name}" for dim_name in metadata["dims"]]
+        return xr.DataArray(
+            gt_utils.asarray(data),
+            dims=dims,
+            attrs={
+                "long_name": metadata["name"],
+                "units": metadata.get("units", "unknown"),
+            },
+        )
+
     @property
     def xr_dataset(self) -> xr.Dataset:
         data_vars = {}
         for name, field_info in self.__dataclass_fields__.items():
-            if issubclass(field_info.type, Quantity):  # type: ignore[arg-type]
-                dims = [
-                    f"{dim_name}_{name}" for dim_name in field_info.metadata["dims"]
-                ]
-                data_vars[name] = xr.DataArray(
-                    gt_utils.asarray(getattr(self, name).data),
-                    dims=dims,
-                    attrs={
-                        "long_name": field_info.metadata["name"],
-                        "units": field_info.metadata.get("units", "unknown"),
-                    },
+            if isinstance(field_info.type, type) and issubclass(
+                field_info.type, Quantity
+            ):
+                data_vars[name] = self._xr_dataarray_from_array(
+                    name=name,
+                    metadata=field_info.metadata,
+                    data=getattr(self, name).data,
                 )
         return xr.Dataset(data_vars=data_vars)
 
@@ -469,54 +500,5 @@ def __getitem__(self, item: str) -> Any:
     def as_dict(self, quantity_only: bool = True) -> dict[str, Quantity | int]:
         if quantity_only:
             return {k: v for k, v in asdict(self).items() if isinstance(v, Quantity)}
-
-        return {k: v for k, v in asdict(self).items()}
-
-
-TRACER_PROPERTIES = {
-    "specific_humidity": {
-        "dims": [K_DIM, J_DIM, I_DIM],
-        "restart_name": "sphum",
-        "units": "g/kg",
-    },
-    "cloud_liquid_water_mixing_ratio": {
-        "dims": [K_DIM, J_DIM, I_DIM],
-        "restart_name": "liq_wat",
-        "units": "g/kg",
-    },
-    "cloud_ice_mixing_ratio": {
-        "dims": [K_DIM, J_DIM, I_DIM],
-        "restart_name": "ice_wat",
-        "units": "g/kg",
-    },
-    "rain_mixing_ratio": {
-        "dims": [K_DIM, J_DIM, I_DIM],
-        "restart_name": "rainwat",
-        "units": "g/kg",
-    },
-    "snow_mixing_ratio": {
-        "dims": [K_DIM, J_DIM, I_DIM],
-        "restart_name": "snowwat",
-        "units": "g/kg",
-    },
-    "graupel_mixing_ratio": {
-        "dims": [K_DIM, J_DIM, I_DIM],
-        "restart_name": "graupel",
-        "units": "g/kg",
-    },
-    "ozone_mixing_ratio": {
-        "dims": [K_DIM, J_DIM, I_DIM],
-        "restart_name": "o3mr",
-        "units": "g/kg",
-    },
-    "turbulent_kinetic_energy": {
-        "dims": [K_DIM, J_DIM, I_DIM],
-        "restart_name": "sgs_tke",
-        "units": "g/kg",
-    },
-    "cloud_fraction": {
-        "dims": [K_DIM, J_DIM, I_DIM],
-        "restart_name": "cld_amt",
-        "units": "g/kg",
-    },
-}
+        else:
+            return {k: v for k, v in asdict(self).items()}
diff --git a/pyfv3/initialization/test_cases/initialize_aquaplanet.py b/pyfv3/initialization/test_cases/initialize_aquaplanet.py
index 758a7e6f..7e699908 100644
--- a/pyfv3/initialization/test_cases/initialize_aquaplanet.py
+++ b/pyfv3/initialization/test_cases/initialize_aquaplanet.py
@@ -125,8 +125,7 @@ def init_aquaplanet_state(
 
     state = DycoreState.init_from_numpy_arrays(
         numpy_state.__dict__,
-        sizer=quantity_factory.sizer,
-        backend=sample_quantity.metadata.backend,
+        quantity_factory,
     )
 
     comm.halo_update(state.phis, n_points=NHALO)
diff --git a/pyfv3/initialization/test_cases/initialize_baroclinic.py b/pyfv3/initialization/test_cases/initialize_baroclinic.py
index 739430ea..2bcd5510 100644
--- a/pyfv3/initialization/test_cases/initialize_baroclinic.py
+++ b/pyfv3/initialization/test_cases/initialize_baroclinic.py
@@ -8,6 +8,7 @@
 from ndsl.grid.gnomonic import great_circle_distance_lon_lat, lon_lat_midpoint
 from pyfv3.dycore_state import DycoreState
 from pyfv3.initialization import init_utils
+from pyfv3.tracers import FVTracers, setup_fvtracers
 
 
 # maximum windspeed amplitude - close to windspeed of zonal-mean time-mean
@@ -366,11 +367,23 @@ def init_baroclinic_state(
         moist_phys=moist_phys,
         make_nh=(not hydrostatic),
     )
+    tracers = {
+        "vapor": 0,
+        "liquid": 1,
+        "rain": 2,
+        "snow": 3,
+        "ice": 4,
+        "graupel": 5,
+        "cloud": 6,
+    }
+    setup_fvtracers(quantity_factory, len(tracers), tracers)
     state = DycoreState.init_from_numpy_arrays(
         numpy_state.__dict__,
-        sizer=quantity_factory.sizer,
-        backend=sample_quantity.metadata.backend,
+        quantity_factory,
     )
+    state.tracers[:, :, :, FVTracers.index("vapor")].field[:] = numpy_state.qvapor[
+        slice_3d
+    ]
 
     comm.halo_update(state.phis, n_points=NHALO)
 
diff --git a/pyfv3/initialization/test_cases/initialize_rossby.py b/pyfv3/initialization/test_cases/initialize_rossby.py
index 36a57f1d..77b32e38 100644
--- a/pyfv3/initialization/test_cases/initialize_rossby.py
+++ b/pyfv3/initialization/test_cases/initialize_rossby.py
@@ -13,6 +13,7 @@
 from ndsl.grid import GridData
 from pyfv3.dycore_state import DycoreState
 from pyfv3.initialization import init_utils
+from pyfv3.tracers import default_GEOS_tracers
 
 
 NHALO = constants.N_HALO_DEFAULT
@@ -200,10 +201,10 @@ def init_rossby_state(
     _init_for_rossby(numpy_state, grid_data, shape)
     _postinit_for_all_sw(numpy_state)
 
+    default_GEOS_tracers(quantity_factory)
     state = DycoreState.init_from_numpy_arrays(
         numpy_state.__dict__,
-        sizer=quantity_factory.sizer,
-        backend=sample_quantity.metadata.backend,
+        quantity_factory,
     )
 
     comm.halo_update(state.phis, n_points=NHALO)
diff --git a/pyfv3/initialization/test_cases/initialize_tc.py b/pyfv3/initialization/test_cases/initialize_tc.py
index b778534c..1161cb6d 100644
--- a/pyfv3/initialization/test_cases/initialize_tc.py
+++ b/pyfv3/initialization/test_cases/initialize_tc.py
@@ -6,6 +6,7 @@
 from ndsl.grid.gnomonic import great_circle_distance_lon_lat
 from pyfv3.dycore_state import DycoreState
 from pyfv3.initialization import init_utils
+from pyfv3.tracers import FVTracers, setup_fvtracers
 
 
 def _calculate_distance_from_tc_center(pe_v, ps_v, muv, calc, tc_properties):
@@ -561,16 +562,25 @@ def init_tc_state(
     numpy_state.pkz[:] = pkz
     numpy_state.ps[:] = pe[:, :, -1]
     numpy_state.pt[:] = pt
-    numpy_state.qvapor[:] = qvapor
     numpy_state.u[:] = ud
     numpy_state.ua[:] = ua
     numpy_state.v[:] = vd
     numpy_state.va[:] = va
     numpy_state.w[:] = w
+    tracers = {
+        "vapor": 0,
+        "liquid": 1,
+        "rain": 2,
+        "snow": 3,
+        "ice": 4,
+        "graupel": 5,
+        "cloud": 6,
+    }
+    setup_fvtracers(quantity_factory, len(tracers), tracers)
     state = DycoreState.init_from_numpy_arrays(
         numpy_state.__dict__,
-        sizer=quantity_factory.sizer,
-        backend=sample_quantity.metadata.backend,
+        quantity_factory,
     )
+    state.tracers[:, :, :, FVTracers.index("vapor")].field[:] = qvapor
 
     return state
diff --git a/pyfv3/mpi/mpp_sum.py b/pyfv3/mpi/mpp_sum.py
new file mode 100644
index 00000000..746ff72d
--- /dev/null
+++ b/pyfv3/mpi/mpp_sum.py
@@ -0,0 +1,148 @@
+import warnings
+
+import numpy as np
+
+from ndsl import Quantity, StencilFactory
+from ndsl.comm.communicator import Communicator, ReductionOperator
+from ndsl.dsl.typing import Float
+
+
+def _increment_ints_faster(
+    int_sum: np.ndarray,
+    pr: list[float],
+    I_pr: list[float],
+    r: float,
+    max_mag_term: float,
+) -> None:
+    if (r >= 1e30) == (r < 1e30):  # equal only when r is NaN
+        print("NaN_error")
+        return
+    sgn = 1
+    if r < 0.0:
+        sgn = -1
+
+    rs = abs(r)
+    if rs > abs(max_mag_term):
+        max_mag_term = r
+
+    for i in range(len(I_pr)):
+        ival = int(rs * I_pr[i])
+        rs = rs - ival * pr[i]
+        int_sum[i] = int_sum[i] + sgn * ival
+
+
+def _carry_overflow(
+    int_sum: np.ndarray,
+    prec: int,
+    I_prec: float,
+    prec_error: float,
+) -> bool:
+    overflow_error = False
+    for i in range(len(int_sum) - 1, 0, -1):
+        if abs(int_sum[i]) > prec:
+            num_carry = int(int_sum[i] * I_prec)
+            int_sum[i] = int_sum[i] - num_carry * prec
+            int_sum[i - 1] = int_sum[i - 1] + num_carry
+    if abs(int_sum[0]) > prec_error:
+        overflow_error = True
+    return overflow_error
+
+
+def _regularize_ints(int_sum: np.ndarray, prec: int, I_prec: float) -> None:
+    for i in range(len(int_sum) - 1, 0, -1):
+        if abs(int_sum[i]) > prec:
+            num_carry = int(int_sum[i] * I_prec)
+            int_sum[i] = int_sum[i] - num_carry * prec
+            int_sum[i - 1] = int_sum[i - 1] + num_carry
+
+    positive = True
+
+    for i in range(len(int_sum)):
+        if abs(int_sum[i]) > 0:
+            if int_sum[i] < 0:
+                positive = False
+                break
+
+    if positive:
+        for i in range(len(int_sum) - 1, 0, -1):
+            if int_sum[i] < 0:
+                int_sum[i] = int_sum[i] + prec
+                int_sum[i - 1] = int_sum[i - 1] - 1
+
+    else:
+        for i in range(len(int_sum) - 1, 0, -1):
+            if int_sum[i] > 0:
+                int_sum[i] = int_sum[i] - prec
+                int_sum[i - 1] = int_sum[i - 1] + 1
+
+
+def _ints_to_real(ints: np.ndarray, pr: list[float]) -> float:
+    r = 0.0
+
+    for i in range(len(ints)):
+        r = r + pr[i] * ints[i]
+
+    return r
+
+
+class MPPGlobalSum:
+    def __init__(
+        self, stencil_factory: StencilFactory, communicator: Communicator
+    ) -> None:
+        NUMINT = 6
+        self._comm = communicator
+        self._ints_sum = Quantity(
+            data=np.zeros((NUMINT), dtype=Float),
+            dims=["K"],
+            units="dunno",
+            backend=stencil_factory.backend,
+        )
+
+        self._ints_sum_reduce = Quantity(
+            data=np.zeros((NUMINT), dtype=Float),
+            dims=["K"],
+            units="dunno",
+            backend=stencil_factory.backend,
+        )
+
+    def __call__(self, qty_to_sum: Quantity) -> Float:
+        NUMBIT = 46
+        r_prec = 2.0**NUMBIT
+        prec = 2**NUMBIT
+        I_prec = 1.0 / (2.0**NUMBIT)
+        pr = [
+            r_prec**2,
+            r_prec,
+            1.0,
+            1.0 / r_prec,
+            1.0 / r_prec**2,
+            1.0 / r_prec**3,
+        ]
+        I_pr = [1.0 / r_prec**2, 1.0 / r_prec, 1.0, r_prec, r_prec**2, r_prec**3]
+        prec_error = (2**62 + (2**62 - 1)) / 6
+        mag_max_term = 0.0
+
+        # Note: the loop ranges in i and j are for the TBC test case.
+        self._ints_sum[:] = 0
+        for j in range(qty_to_sum.field.shape[1]):
+            for i in range(qty_to_sum.field.shape[0]):
+                _increment_ints_faster(
+                    self._ints_sum.data[:],
+                    pr,
+                    I_pr,
+                    qty_to_sum.field[i, j],
+                    mag_max_term,
+                )
+
+        if _carry_overflow(self._ints_sum.data, prec, I_prec, prec_error):
+            warnings.warn("Overflow in MPP sum", category=UserWarning, stacklevel=2)
+
+        self._comm.all_reduce(
+            self._ints_sum,
+            ReductionOperator.SUM,
+            self._ints_sum_reduce,
+        )
+
+        _regularize_ints(self._ints_sum_reduce.data, prec, I_prec)
+
+        return _ints_to_real(self._ints_sum_reduce.data, pr)
diff --git a/pyfv3/mpi/sum.py b/pyfv3/mpi/sum.py
new file mode 100644
index 00000000..b0c055bd
--- /dev/null
+++ b/pyfv3/mpi/sum.py
@@ -0,0 +1,40 @@
+import numpy as np
+
+from ndsl import Quantity, QuantityFactory
+from ndsl.comm.communicator import Communicator, ReductionOperator
+from ndsl.constants import I_DIM, J_DIM
+from ndsl.dsl.dace.orchestration import dace_inhibitor
+from ndsl.dsl.stencil import GridIndexing
+from ndsl.dsl.typing import Float
+from ndsl.optional_imports import cupy as cp
+
+
+class GlobalSum:
+    def __init__(
+        self,
+        quantity_factory: QuantityFactory,
+        communicator: Communicator,
+        grid_indexing: GridIndexing,
+    ) -> None:
+        self._comm = communicator
+        # self._tmp_reduce = quantity_factory.empty(dims=[I_DIM, J_DIM], units="n/a")
+        self._tmp_reduce = quantity_factory.zeros(dims=[I_DIM, J_DIM], units="n/a")
+        self._isc = grid_indexing.isc
+        self._iec = grid_indexing.iec
+        self._jsc = grid_indexing.jsc
+        self._jec = grid_indexing.jec
+
+    @dace_inhibitor
+    def __call__(self, qty_to_sum: Quantity) -> Float:
+        assert len(qty_to_sum.field.shape) == 2  # Code handles only 2D quantities
+        self._comm.all_reduce(qty_to_sum, ReductionOperator.SUM, self._tmp_reduce)
+        if isinstance(self._tmp_reduce[:], np.ndarray):
+            return np.sum(
+                self._tmp_reduce[self._isc : self._iec + 1, self._jsc : self._jec + 1]
+            )
+        elif cp is not None and isinstance(self._tmp_reduce[:], cp.ndarray):
+            return cp.sum(
+                self._tmp_reduce[self._isc : self._iec + 1, self._jsc : self._jec + 1]
+            )
+        else:
+            raise TypeError("Unsupported array type for reduction result.")
diff --git a/pyfv3/stencils/a2b_ord4.py b/pyfv3/stencils/a2b_ord4.py
index a9551046..948fb951 100644
--- a/pyfv3/stencils/a2b_ord4.py
+++ b/pyfv3/stencils/a2b_ord4.py
@@ -1,31 +1,45 @@
-from ndsl import GridIndexing, QuantityFactory, StencilFactory, orchestrate
+from ndsl import GridIndexing, NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, I_INTERFACE_DIM, J_DIM, J_INTERFACE_DIM, K_DIM
 from ndsl.dsl.gt4py import PARALLEL, asin, computation, cos
 from ndsl.dsl.gt4py import function as gtfunction
 from ndsl.dsl.gt4py import horizontal, interval, region, sin, sqrt
-from ndsl.dsl.typing import Float, FloatField, FloatFieldI, FloatFieldIJ
+from ndsl.dsl.typing import (
+    Float,
+    FloatField,
+    FloatFieldI64,
+    FloatFieldIJ,
+    FloatFieldIJ64,
+)
 from ndsl.grid import GridData
 from ndsl.stencils.basic_operations import copy
 
 
-# compact 4-pt cubic interpolation
-c1 = 2.0 / 3.0
-c2 = -1.0 / 6.0
-d1 = 0.375
-d2 = -1.0 / 24.0
+# compact 4-pt cubic interpolation
+c1 = Float(2.0) / Float(3.0)
+c2 = Float(-1.0) / Float(6.0)
+d1 = Float(0.375)
+d2 = Float(-1.0) / Float(24.0)
 # PPM volume mean form
-b1 = 7.0 / 12.0
-b2 = -1.0 / 12.0
+b1 = Float(7.0) / Float(12.0)  # 0.58333333
+b2 = Float(-1.0) / Float(12.0)
 # 4-pt Lagrange interpolation
-a1 = 9.0 / 16.0
-a2 = -1.0 / 16.0
+a1 = Float(0.5625)  # 9/16
+a2 = Float(-0.0625)  # -1/16
+
+r3 = Float(1.0 / 3.0)
 
 
 @gtfunction
 def great_circle_dist(p1a, p1b, p2a, p2b):
-    tb = sin((p1b - p2b) / 2.0) ** 2.0
-    ta = sin((p1a - p2a) / 2.0) ** 2.0
-    return asin(sqrt(tb + cos(p1b) * cos(p2b) * ta)) * 2.0
+    return (
+        asin(
+            sqrt(
+                sin((p1b - p2b) / 2.0) ** 2.0
+                + cos(p1b) * cos(p2b) * sin((p1a - p2a) / 2.0) ** 2.0
+            )
+        )
+        * 2.0
+    )
 
 
 @gtfunction
@@ -95,7 +109,7 @@ def _sw_corner(
             qin[1, -2, 0],
         )
 
-        qout = (ec1 + ec2 + ec3) * (1.0 / 3.0)
+        qout = (ec1 + ec2 + ec3) * r3
         tmp_qout_edges = qout
 
 
@@ -149,7 +163,7 @@ def _nw_corner(
             qin[0, 0, 0],
             qin[1, 1, 0],
         )
-        qout = (ec1 + ec2 + ec3) * (1.0 / 3.0)
+        qout = (ec1 + ec2 + ec3) * r3
         tmp_qout_edges = qout
 
 
@@ -203,7 +217,7 @@ def _ne_corner(
             qin[-1, 0, 0],
             qin[-2, 1, 0],
         )
-        qout = (ec1 + ec2 + ec3) * (1.0 / 3.0)
+        qout = (ec1 + ec2 + ec3) * r3
         tmp_qout_edges = qout
 
 
@@ -257,7 +271,7 @@ def _se_corner(
             qin[0, 0, 0],
             qin[1, 1, 0],
         )
-        qout = (ec1 + ec2 + ec3) * (1.0 / 3.0)
+        qout = (ec1 + ec2 + ec3) * r3
         tmp_qout_edges = qout
 
 
@@ -274,7 +288,7 @@ def lagrange_x_func(qy):
 def qout_x_edge(
     qin: FloatField,
     dxa: FloatFieldIJ,
-    edge_w: FloatFieldIJ,
+    edge_w: FloatFieldIJ64,
     qout: FloatField,
     tmp_qout_edges: FloatField,
 ):
@@ -295,7 +309,7 @@ def qout_x_edge(
 def qout_y_edge(
     qin: FloatField,
     dya: FloatFieldIJ,
-    edge_s: FloatFieldI,
+    edge_s: FloatFieldI64,
     qout: FloatField,
     tmp_qout_edges: FloatField,
 ):
@@ -500,11 +514,11 @@ def doubly_periodic_a2b_ord4(qin):
     Grid conversion is much simpler on a doubly-periodic, orthogonal grid so we
     can bypass most of the above code
     """
-    qx = b1 * (qin[-1, 0, 0] + qin) + b2 * (qin[-2, 0, 0] + qin[1, 0, 0])
-    qy = b1 * (qin[0, -1, 0] + qin) + b2 * (qin[0, -2, 0] + qin[0, 1, 0])
+    qx = b2 * (qin[-2, 0, 0] + qin[1, 0, 0]) + b1 * (qin[-1, 0, 0] + qin)
+    qy = b2 * (qin[0, -2, 0] + qin[0, 1, 0]) + b1 * (qin[0, -1, 0] + qin)
     qout = 0.5 * (
-        a1 * (qx[0, -1, 0] + qx + qy[-1, 0, 0] + qy)
-        + a2 * (qx[0, -2, 0] + qx[0, 1, 0] + qy[-2, 0, 0] + qy[1, 0, 0])
+        a2 * (qx[0, -2, 0] + qx[0, 1, 0] + qy[-2, 0, 0] + qy[1, 0, 0])
+        + a1 * (qx[0, -1, 0] + qx + qy[-1, 0, 0] + qy)
     )
     return qout
 
@@ -514,7 +528,7 @@ def doubly_periodic_a2b_ord4_stencil(qout: FloatField, qin: FloatField):
         qout = doubly_periodic_a2b_ord4(qin)
 
 
-class AGrid2BGridFourthOrder:
+class AGrid2BGridFourthOrder(NDSLRuntime):
     """
     Fortran name is a2b_ord4, test module is A2B_Ord4
     """
@@ -535,7 +549,8 @@ def __init__(
             z_dim: defines whether vertical dimension is centered or staggered
             replace: boolean, update qin to the B grid as well
         """
-        orchestrate(obj=self, config=stencil_factory.config.dace_config)
+        super().__init__(stencil_factory)
+
         if grid_type != 0 and grid_type != 4:
             raise RuntimeError(
                 "A-Grid to B-Grid 4th order (a2b_ord4):"
@@ -561,22 +576,16 @@ def __init__(
             self._edge_s = grid_data.edge_s
             self._edge_n = grid_data.edge_n
 
-            self._tmp_qx = quantity_factory.zeros(
-                dims=[I_INTERFACE_DIM, J_DIM, z_dim],
-                units="unknown",
-                dtype=Float,
+            self._tmp_qx = self.make_local(
+                quantity_factory, [I_INTERFACE_DIM, J_DIM, z_dim]
             )
-            self._tmp_qy = quantity_factory.zeros(
-                dims=[I_DIM, J_INTERFACE_DIM, z_dim],
-                units="unknown",
-                dtype=Float,
+            self._tmp_qy = self.make_local(
+                quantity_factory, [I_DIM, J_INTERFACE_DIM, z_dim]
             )
             # TODO: the dimensions of tmp_qout_edges may not be correct, verify
             # with Lucas and either update the code or remove this comment
-            self._tmp_qout_edges = quantity_factory.zeros(
-                dims=[I_DIM, J_DIM, z_dim],
-                units="unknown",
-                dtype=Float,
+            self._tmp_qout_edges = self.make_local(
+                quantity_factory, [I_DIM, J_DIM, z_dim]
             )
 
             _, (z_domain,) = self._idx.get_origin_domain([z_dim])
diff --git a/pyfv3/stencils/c_sw.py b/pyfv3/stencils/c_sw.py
index 54baf3fd..4107a900 100644
--- a/pyfv3/stencils/c_sw.py
+++ b/pyfv3/stencils/c_sw.py
@@ -1,7 +1,7 @@
-from ndsl import Quantity, QuantityFactory, StencilFactory, orchestrate
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, I_INTERFACE_DIM, J_DIM, J_INTERFACE_DIM, K_DIM
-from ndsl.dsl.gt4py import PARALLEL, computation, horizontal, interval, region
-from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
+from ndsl.dsl.gt4py import PARALLEL, computation, horizontal, interval, region  # noqa
+from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ, I, J
 from ndsl.grid import GridData
 from ndsl.stencils import corners
 from pyfv3.stencils.d2a2c_vect import DGrid2AGrid2CGridVectors
@@ -154,7 +154,8 @@ def divergence_corner(
                 )
                 vf0 = v * dxc * 0.5 * (sin_sg3[-1, 0] + sin_sg1)
                 uf0 = u * dyc * 0.5 * (sin_sg4[0, -1] + sin_sg2)
-                divg_d = (-vf0 + uf1 - uf0) * rarea_c
+                divg_d = vf1 - vf0 + uf1 - uf0
+                divg_d = rarea_c * (divg_d - vf1)
 
             with horizontal(region[i_end + 1, j_end + 1], region[i_start, j_end + 1]):
                 vf1 = (
@@ -164,8 +165,8 @@ def divergence_corner(
                     u[-1, 0, 0] * dyc[-1, 0] * 0.5 * (sin_sg4[-1, -1] + sin_sg2[-1, 0])
                 )
                 uf0 = u * dyc * 0.5 * (sin_sg4[0, -1] + sin_sg2)
-                divg_d = (vf1 + uf1 - uf0) * rarea_c
-
+                divg_d = vf1 - vf0 + uf1 - uf0
+                divg_d = rarea_c * (divg_d + vf0)
             # ---------
 
 
@@ -373,7 +374,8 @@ def transportdelp_update_vorticity_and_kineticenergy(
             with horizontal(region[i_end + 1, :], region[i_start, :]):
                 ke = ke * sin_sg1 + v * cos_sg1 if ua > 0.0 else ke
 
-        ke = 0.5 * dt2 * (ua * ke + va * vort)
+        dt4 = 0.5 * dt2
+        ke = dt4 * (ua * ke + va * vort)
 
 
 def circulation_cgrid(
@@ -395,18 +397,18 @@ def circulation_cgrid(
     from __externals__ import i_end, i_start, j_end, j_start
 
     with computation(PARALLEL), interval(...):
-        fx = dxc * uc
-        fy = dyc * vc
-        # fx1 and fy1 are the shifted versions of fx and fy and are defined
-        # because temporaries are not allowed to be accessed with offsets in regions.
-        fx1 = dxc[0, -1] * uc[0, -1, 0]
-        fy1 = dyc[-1, 0] * vc[-1, 0, 0]
-
-        vort_c = fx1 - fx - fy1 + fy
+        fx = uc * dxc
+        fy = vc * dyc
+
+        vort_c = fx[J - 1] - fx - fy[I - 1] + fy
+
+        # Remove the extra term at the corners
+        # WEST
         with horizontal(region[i_start, j_start], region[i_start, j_end + 1]):
-            vort_c = fx1 - fx + fy
+            vort_c = vort_c + (vc[I - 1] * dyc[I - 1])
+        # EAST
         with horizontal(region[i_end + 1, j_start], region[i_end + 1, j_end + 1]):
-            vort_c = fx1 - fx - fy1
+            vort_c = vort_c - fy
 
 
 def absolute_vorticity(vort: FloatField, fC: FloatFieldIJ, rarea_c: FloatFieldIJ):
@@ -492,7 +494,7 @@ def update_y_velocity(
         velocity_c = velocity_c - tmp_flux * flux + rdyc * (ke[0, -1, 0] - ke)
 
 
-class CGridShallowWaterDynamics:
+class CGridShallowWaterDynamics(NDSLRuntime):
     """
     Fortran name is c_sw
     """
@@ -506,7 +508,8 @@ def __init__(
         grid_type: int,
         nord: int,
     ):
-        orchestrate(obj=self, config=stencil_factory.config.dace_config)
+        super().__init__(stencil_factory)
+
         self.grid_data = grid_data
         self._dord4 = True
         self._fC = self.grid_data.fC
@@ -545,20 +548,13 @@ def __init__(
             dord4=self._dord4,
         )
 
-        def make_quantity() -> Quantity:
-            return quantity_factory.zeros(
-                [I_DIM, J_DIM, K_DIM],
-                units="unknown",
-                dtype=Float,
-            )
-
         # TODO: double-check the dimensions on these, they may be incorrect
         # as they are only documentation and not used by the code
-        self._tmp_ke = make_quantity()
-        self._tmp_vort = make_quantity()
-        self._tmp_fx = make_quantity()
-        self._tmp_fx1 = make_quantity()
-        self._tmp_fx2 = make_quantity()
+        self._tmp_ke = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_vort = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_fx = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_fx1 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_fx2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
 
         if nord > 0:
             self._divergence_corner = stencil_factory.from_dims_halo(
diff --git a/pyfv3/stencils/compute_total_energy.py b/pyfv3/stencils/compute_total_energy.py
new file mode 100644
index 00000000..1ac76a2c
--- /dev/null
+++ b/pyfv3/stencils/compute_total_energy.py
@@ -0,0 +1,164 @@
+from gt4py.cartesian.gtscript import (  # isort: skip
+    __INLINED,
+    BACKWARD,
+    FORWARD,
+    K,
+    computation,
+    interval,
+)
+
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
+from ndsl.constants import GRAV, I_DIM, J_DIM, K_INTERFACE_DIM
+from ndsl.dsl.typing import FloatField, FloatFieldIJ
+from ndsl.grid import GridData
+from pyfv3._config import DynamicalCoreConfig
+from pyfv3.stencils.moist_cv import moist_cv_nwat0_fn, moist_cv_nwat6_fn
+from pyfv3.tracers import FVTracers
+
+
+def _compute_total_energy__stencil(
+    hs: FloatFieldIJ,
+    delp: FloatField,
+    delz: FloatField,
+    qc: FloatField,
+    pt: FloatField,
+    u: FloatField,
+    v: FloatField,
+    w: FloatField,
+    tracers: FVTracers,
+    rsin2: FloatFieldIJ,
+    cosa_s: FloatFieldIJ,
+    te_2d: FloatFieldIJ,
+):
+    """
+    Dev Note: this is _very_ close to moist_cv.moist_te. The only numerical difference
+    is that the te/te_2d computation has an extra (1.+qc(i,j,k))*(1.-qd(i)) factor.
+
+    Args:
+        hs(in):
+        delp(in):
+        delz(in):
+        pt(in):
+        qc(in):
+        u(in):
+        v(in):
+        w(in):
+        tracers(in):
+        rsin2(in):
+        cosa_s(in):
+        te_2d(out):
+    """
+
+    from __externals__ import i_graupel, i_ice, i_liquid, i_rain, i_snow, i_vapor, nwat
+
+    with computation(BACKWARD), interval(-1, None):
+        te_2d = 0.0
+        phis = hs
+    with computation(BACKWARD), interval(0, -1):
+        phis = phis[K + 1] - GRAV * delz
+    with computation(FORWARD), interval(0, -1):
+        if __INLINED(nwat == 0):
+            cvm, qd = moist_cv_nwat0_fn()
+        elif __INLINED(nwat == 6):
+            cvm, qd = moist_cv_nwat6_fn(
+                tracers.A[i_vapor],
+                tracers.A[i_liquid],
+                tracers.A[i_rain],
+                tracers.A[i_snow],
+                tracers.A[i_ice],
+                tracers.A[i_graupel],
+            )
+
+        te_2d = te_2d + delp * (
+            cvm * pt * (1.0 + qc) * (1.0 - qd)
+            + 0.5
+            * (
+                phis
+                + phis[0, 0, 1]
+                + w**2.0
+                + 0.5
+                * rsin2
+                * (
+                    u**2.0
+                    + u[0, 1, 0] ** 2.0
+                    + v**2.0
+                    + v[1, 0, 0] ** 2.0
+                    - (u + u[0, 1, 0]) * (v + v[1, 0, 0]) * cosa_s
+                )
+            )
+        )
+
+
+class ComputeTotalEnergy(NDSLRuntime):
+    """Compute total energy performs the FV3-consistent
+    computation of the global total energy.
+
+    It includes the potential, internal (latent and sensible heat) and kinetic terms."""
+
+    def __init__(
+        self,
+        config: DynamicalCoreConfig,
+        stencil_factory: StencilFactory,
+        quantity_factory: QuantityFactory,
+        grid_data: GridData,
+    ) -> None:
+        super().__init__(stencil_factory)
+
+        if config.hydrostatic:
+            raise NotImplementedError(
+                "Dynamics (Compute Total Energy): hydrostatic option is not implemented."
+            )
+
+        if not config.moist_phys:
+            raise NotImplementedError(
+                "Dynamics (Compute Total Energy): moist_phys=False option is not implemented."
+            )
+
+        if config.nwat not in [0, 6]:
+            raise NotImplementedError(
+                f"Compute total energy not implemented for {config.nwat} water species."
+            )
+
+        self._compute_total_energy = stencil_factory.from_dims_halo(
+            func=_compute_total_energy__stencil,
+            compute_dims=[I_DIM, J_DIM, K_INTERFACE_DIM],
+            externals={
+                "nwat": config.nwat,
+                "i_vapor": FVTracers.index("vapor"),
+                "i_liquid": FVTracers.index("liquid") if config.nwat == 6 else -1,
+                "i_rain": FVTracers.index("rain") if config.nwat == 6 else -1,
+                "i_ice": FVTracers.index("ice") if config.nwat == 6 else -1,
+                "i_snow": FVTracers.index("snow") if config.nwat == 6 else -1,
+                "i_graupel": FVTracers.index("graupel") if config.nwat == 6 else -1,
+            },
+        )
+        self._rsin2 = grid_data.rsin2
+        self._cosa_s = grid_data.cosa_s
+
+    def __call__(
+        self,
+        hs: FloatFieldIJ,
+        delp: FloatField,
+        delz: FloatField,
+        qc: FloatField,
+        pt: FloatField,
+        u: FloatField,
+        v: FloatField,
+        w: FloatField,
+        tracers: FVTracers,
+        te_2d: FloatFieldIJ,
+    ) -> None:
+        self._compute_total_energy(
+            hs=hs,
+            delp=delp,
+            delz=delz,
+            qc=qc,
+            pt=pt,
+            u=u,
+            v=v,
+            w=w,
+            tracers=tracers,
+            rsin2=self._rsin2,
+            cosa_s=self._cosa_s,
+            te_2d=te_2d,
+        )
diff --git a/pyfv3/stencils/copy_corners.py b/pyfv3/stencils/copy_corners.py
index 266627df..639ffc6b 100644
--- a/pyfv3/stencils/copy_corners.py
+++ b/pyfv3/stencils/copy_corners.py
@@ -1,8 +1,36 @@
-from ndsl import StencilFactory, orchestrate
-from ndsl.dsl.typing import FloatField
+from functools import singledispatch
 
+import dace
+import numpy as np
 
+from ndsl import NDSLRuntime, Quantity, StencilFactory
+from ndsl.dsl.typing import FloatField, FloatFieldIJ
+from ndsl.optional_imports import cupy as cp
+
+
+@singledispatch
 def corner_copy_x(field_to_copy):
+    raise NotImplementedError(f"No CopyCorners for type {type(field_to_copy)}")
+
+
+if cp is not None:
+
+    @corner_copy_x.register(cp.ndarray)
+    def _corner_copy_x_cupy(field_to_copy: cp.ndarray):
+        _blind_copy_corners_x(field_to_copy)
+
+
+@corner_copy_x.register(np.ndarray)
+def _corner_copy_x_numpy(field_to_copy: np.ndarray):
+    _blind_copy_corners_x(field_to_copy)
+
+
+@corner_copy_x.register(Quantity)
+def _corner_copy_x_quantity(field_to_copy: Quantity):
+    _blind_copy_corners_x(field_to_copy.data)
+
+
+def _blind_copy_corners_x(field_to_copy):
     """Equivalent to the copy_corners_x functions in fortran.
 
     This is written to operate on plain ndarrarys and not use the GT4Py framework.
@@ -65,7 +93,29 @@ def corner_copy_x(field_to_copy):
     field_to_copy[-2, -4] = field_to_copy[-4, -7]
 
 
+@singledispatch
 def corner_copy_y(field_to_copy):
+    raise NotImplementedError(f"No CopyCorners for type {type(field_to_copy)}")
+
+
+if cp is not None:
+
+    @corner_copy_y.register(cp.ndarray)
+    def _corner_copy_y_cupy(field_to_copy: cp.ndarray):
+        _blind_copy_corners_y(field_to_copy)
+
+
+@corner_copy_y.register(np.ndarray)
+def _corner_copy_y_numpy(field_to_copy: np.ndarray):
+    _blind_copy_corners_y(field_to_copy)
+
+
+@corner_copy_y.register(Quantity)
+def _corner_copy_y_quantity(field_to_copy: Quantity):
+    _blind_copy_corners_y(field_to_copy.data)
+
+
+def _blind_copy_corners_y(field_to_copy):
     """Equivalent to the copy_corners_y functions in fortran.
 
     This is written to operate on plain ndarrarys and not use the GT4Py framework.
@@ -128,42 +178,68 @@ def corner_copy_y(field_to_copy):
     field_to_copy[-4, -2] = field_to_copy[-7, -4]
 
 
-class CopyCornersX:
+class CopyCornersX(NDSLRuntime):
     """
     Helper-class to copy corners corresponding to the fortran function copy_corners_x
     """
 
     def __init__(self, stencil_factory: StencilFactory) -> None:
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
+        super().__init__(stencil_factory)
 
         if stencil_factory.grid_indexing.n_halo != 3:
             raise NotImplementedError(
                 "Corner-Copy only implemented for exactly 3 Halo-Points"
             )
 
+        self._is_orch = stencil_factory.backend.is_orchestrated()
+
+    def _internal_corners_copy_3D(self, field: FloatFieldIJ):
+        _blind_copy_corners_x(field) if self._is_orch else corner_copy_x(field)
+
+    def _internal_corners_copy(self, field: FloatField, k: int):
+        if self._is_orch:
+            _blind_copy_corners_x(field[:, :, k])
+        else:
+            corner_copy_x(field[:, :, k])
+
     def __call__(self, field: FloatField):
-        corner_copy_x(field)
+        self._internal_corners_copy_3D(field)
 
+    def nord(self, field: FloatField, nord: Quantity):
+        for k in dace.map[0 : nord.shape[0]]:
+            if nord[k] > 0:
+                self._internal_corners_copy(field, k)
 
-class CopyCornersY:
+
+class CopyCornersY(NDSLRuntime):
     """
     Helper-class to copy corners corresponding to the fortran function
     copy_corners_y
     """
 
     def __init__(self, stencil_factory: StencilFactory) -> None:
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
+        super().__init__(stencil_factory)
 
         if stencil_factory.grid_indexing.n_halo != 3:
             raise NotImplementedError(
                 "Corner-Copy only implemented for exactly 3 Halo-Points"
             )
 
+        self._is_orch = stencil_factory.backend.is_orchestrated()
+
+    def _internal_corners_copy_3D(self, field: FloatFieldIJ):
+        _blind_copy_corners_y(field) if self._is_orch else corner_copy_y(field)
+
+    def _internal_corners_copy(self, field: FloatField, k: int):
+        if self._is_orch:
+            _blind_copy_corners_y(field[:, :, k])
+        else:
+            corner_copy_y(field[:, :, k])
+
     def __call__(self, field: FloatField):
-        corner_copy_y(field)
+        self._internal_corners_copy_3D(field)
+
+    def nord(self, field: FloatField, nord: Quantity):
+        for k in dace.map[0 : nord.shape[0]]:
+            if nord[k] > 0:
+                self._internal_corners_copy(field, k)
diff --git a/pyfv3/stencils/corners.py b/pyfv3/stencils/corners.py
new file mode 100644
index 00000000..68290e6d
--- /dev/null
+++ b/pyfv3/stencils/corners.py
@@ -0,0 +1,161 @@
+from ndsl import NDSLRuntime, StencilFactory
+from ndsl.dsl.typing import FloatField
+
+
+def corner_copy_x(field_to_copy):
+    """Equivalent to the copy_corners_x functions in fortran.
+
+    This is written to operate on plain ndarrays and not use the GT4Py framework.
+    This choice was made because we've seen a lot of performance left on the table using
+    orchestration without explicitly describing the operations but rather have full 3d-
+    sweeps with conditionals.
+    Since DaCe can handle (simple) operations on ndarrays directly this gives us a more
+    explicit entrypoint to the language and more optimization-potential.
+
+    Args:
+        field_to_copy (ndarray): field to apply the corner copy on.
+            This is explicitly not type-hinted for orchestration
+    """
+    field_to_copy[0, 0] = field_to_copy[0, 5]
+    field_to_copy[0, 1] = field_to_copy[1, 5]
+    field_to_copy[0, 2] = field_to_copy[2, 5]
+
+    field_to_copy[1, 0] = field_to_copy[0, 4]
+    field_to_copy[1, 1] = field_to_copy[1, 4]
+    field_to_copy[1, 2] = field_to_copy[2, 4]
+
+    field_to_copy[2, 0] = field_to_copy[0, 3]
+    field_to_copy[2, 1] = field_to_copy[1, 3]
+    field_to_copy[2, 2] = field_to_copy[2, 3]
+
+    field_to_copy[0, -4] = field_to_copy[2, -7]
+    field_to_copy[0, -3] = field_to_copy[1, -7]
+    field_to_copy[0, -2] = field_to_copy[0, -7]
+
+    field_to_copy[1, -4] = field_to_copy[2, -6]
+    field_to_copy[1, -3] = field_to_copy[1, -6]
+    field_to_copy[1, -2] = field_to_copy[0, -6]
+
+    field_to_copy[2, -4] = field_to_copy[2, -5]
+    field_to_copy[2, -3] = field_to_copy[1, -5]
+    field_to_copy[2, -2] = field_to_copy[0, -5]
+
+    field_to_copy[-4, 0] = field_to_copy[-2, 3]
+    field_to_copy[-4, 1] = field_to_copy[-3, 3]
+    field_to_copy[-4, 2] = field_to_copy[-4, 3]
+
+    field_to_copy[-3, 0] = field_to_copy[-2, 4]
+    field_to_copy[-3, 1] = field_to_copy[-3, 4]
+    field_to_copy[-3, 2] = field_to_copy[-4, 4]
+
+    field_to_copy[-2, 0] = field_to_copy[-2, 5]
+    field_to_copy[-2, 1] = field_to_copy[-3, 5]
+    field_to_copy[-2, 2] = field_to_copy[-4, 5]
+
+    field_to_copy[-4, -2] = field_to_copy[-2, -5]
+    field_to_copy[-4, -3] = field_to_copy[-3, -5]
+    field_to_copy[-4, -4] = field_to_copy[-4, -5]
+
+    field_to_copy[-3, -2] = field_to_copy[-2, -6]
+    field_to_copy[-3, -3] = field_to_copy[-3, -6]
+    field_to_copy[-3, -4] = field_to_copy[-4, -6]
+
+    field_to_copy[-2, -2] = field_to_copy[-2, -7]
+    field_to_copy[-2, -3] = field_to_copy[-3, -7]
+    field_to_copy[-2, -4] = field_to_copy[-4, -7]
+
+
+def corner_copy_y(field_to_copy):
+    """Equivalent to the copy_corners_y functions in fortran.
+
+    This is written to operate on plain ndarrays and not use the GT4Py framework.
+    This choice was made because we've seen a lot of performance left on the table using
+    orchestration without explicitly describing the operations but rather have full 3d-
+    sweeps with conditionals.
+    Since DaCe can handle (simple) operations on ndarrays directly this gives us a more
+    explicit entrypoint to the language and more optimization-potential.
+
+    Args:
+        field_to_copy (ndarray): field to apply the corner copy on.
+            This is explicitly not type-hinted for orchestration
+    """
+    field_to_copy[0, 0] = field_to_copy[5, 0]
+    field_to_copy[1, 0] = field_to_copy[5, 1]
+    field_to_copy[2, 0] = field_to_copy[5, 2]
+
+    field_to_copy[0, 1] = field_to_copy[4, 0]
+    field_to_copy[1, 1] = field_to_copy[4, 1]
+    field_to_copy[2, 1] = field_to_copy[4, 2]
+
+    field_to_copy[0, 2] = field_to_copy[3, 0]
+    field_to_copy[1, 2] = field_to_copy[3, 1]
+    field_to_copy[2, 2] = field_to_copy[3, 2]
+
+    field_to_copy[-4, 0] = field_to_copy[-7, 2]
+    field_to_copy[-3, 0] = field_to_copy[-7, 1]
+    field_to_copy[-2, 0] = field_to_copy[-7, 0]
+
+    field_to_copy[-4, 1] = field_to_copy[-6, 2]
+    field_to_copy[-3, 1] = field_to_copy[-6, 1]
+    field_to_copy[-2, 1] = field_to_copy[-6, 0]
+
+    field_to_copy[-4, 2] = field_to_copy[-5, 2]
+    field_to_copy[-3, 2] = field_to_copy[-5, 1]
+    field_to_copy[-2, 2] = field_to_copy[-5, 0]
+
+    field_to_copy[0, -2] = field_to_copy[5, -2]
+    field_to_copy[0, -3] = field_to_copy[4, -2]
+    field_to_copy[0, -4] = field_to_copy[3, -2]
+
+    field_to_copy[1, -2] = field_to_copy[5, -3]
+    field_to_copy[1, -3] = field_to_copy[4, -3]
+    field_to_copy[1, -4] = field_to_copy[3, -3]
+
+    field_to_copy[2, -2] = field_to_copy[5, -4]
+    field_to_copy[2, -3] = field_to_copy[4, -4]
+    field_to_copy[2, -4] = field_to_copy[3, -4]
+
+    field_to_copy[-2, -4] = field_to_copy[-5, -2]
+    field_to_copy[-2, -3] = field_to_copy[-6, -2]
+    field_to_copy[-2, -2] = field_to_copy[-7, -2]
+
+    field_to_copy[-3, -4] = field_to_copy[-5, -3]
+    field_to_copy[-3, -3] = field_to_copy[-6, -3]
+    field_to_copy[-3, -2] = field_to_copy[-7, -3]
+
+    field_to_copy[-4, -4] = field_to_copy[-5, -4]
+    field_to_copy[-4, -3] = field_to_copy[-6, -4]
+    field_to_copy[-4, -2] = field_to_copy[-7, -4]
+
+
+class CopyCornersX(NDSLRuntime):
+    """
+    Helper-class to copy corners corresponding to the Fortran function copy_corners_x
+    """
+
+    def __init__(self, stencil_factory: StencilFactory) -> None:
+        super().__init__(stencil_factory)
+        if stencil_factory.grid_indexing.n_halo != 3:
+            raise NotImplementedError(
+                "Corner-Copy only implemented for exactly 3 Halo-Points"
+            )
+
+    def __call__(self, field: FloatField):
+        corner_copy_x(field)
+
+
+class CopyCornersY(NDSLRuntime):
+    """
+    Helper-class to copy corners corresponding to the Fortran function
+    copy_corners_y
+    """
+
+    def __init__(self, stencil_factory: StencilFactory) -> None:
+        super().__init__(stencil_factory)
+        if stencil_factory.grid_indexing.n_halo != 3:
+            raise NotImplementedError(
+                "Corner-Copy only implemented for exactly 3 Halo-Points"
+            )
+
+    def __call__(self, field: FloatField):
+        corner_copy_y(field)
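
Reviewer note on the unrolled corner copy above: the first nine assignments (the block writing `[0..2, 0..2]`) follow a single index pattern, `field[i, j] = field[5 - j, i]`. The sketch below checks that equivalence with plain NumPy. The function names and the 12x12x2 test array are hypothetical and chosen only for illustration; the real code is presumably kept unrolled for orchestration, as its docstring suggests.

```python
import numpy as np


def copy_low_corner_unrolled(field: np.ndarray) -> None:
    """First nine assignments, copied verbatim from the diff."""
    field[0, 0] = field[5, 0]
    field[1, 0] = field[5, 1]
    field[2, 0] = field[5, 2]

    field[0, 1] = field[4, 0]
    field[1, 1] = field[4, 1]
    field[2, 1] = field[4, 2]

    field[0, 2] = field[3, 0]
    field[1, 2] = field[3, 1]
    field[2, 2] = field[3, 2]


def copy_low_corner_loop(field: np.ndarray) -> None:
    """Loop form of the same nine assignments (assumes 3 halo points)."""
    for j in range(3):
        for i in range(3):
            # Reads come from rows 3..5, writes go to rows 0..2,
            # so the copy order does not matter.
            field[i, j] = field[5 - j, i]


# Hypothetical test field: 12 x 12 horizontal points, 2 vertical levels.
reference = np.arange(12 * 12 * 2, dtype=np.float64).reshape(12, 12, 2)
unrolled = reference.copy()
looped = reference.copy()
copy_low_corner_unrolled(unrolled)
copy_low_corner_loop(looped)
same_result = np.array_equal(unrolled, looped)
```

The other three corner blocks follow analogous patterns with negative indices; they are not reproduced here.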
diff --git a/pyfv3/stencils/d2a2c_vect.py b/pyfv3/stencils/d2a2c_vect.py
index d0bc0ee7..5da48979 100644
--- a/pyfv3/stencils/d2a2c_vect.py
+++ b/pyfv3/stencils/d2a2c_vect.py
@@ -1,17 +1,17 @@
-from ndsl import QuantityFactory, StencilFactory, orchestrate
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM
 from ndsl.dsl.gt4py import PARALLEL, computation
 from ndsl.dsl.gt4py import function as gtfunction
 from ndsl.dsl.gt4py import horizontal, interval, region
-from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
+from ndsl.dsl.typing import NDSL_GLOBAL_PRECISION, Float, FloatField, FloatFieldIJ
 from ndsl.grid import GridData
 from ndsl.stencils import corners
 from pyfv3.stencils.a2b_ord4 import a1, a2, lagrange_x_func, lagrange_y_func
 
 
-c1 = -2.0 / 14.0
-c2 = 11.0 / 14.0
-c3 = 5.0 / 14.0
+c1 = Float(-2.0) / Float(14.0)
+c2 = Float(11.0) / Float(14.0)
+c3 = Float(5.0) / Float(14.0)
 OFFSET = 2
 
 
@@ -140,7 +140,7 @@ def east_west_edges(
             uc = utc * sin_sg3[-1, 0] if utc > 0 else utc * sin_sg1
 
         with horizontal(region[i_end + 2, local_js - 1 : local_je + 2]):
-            uc = vol_conserv_cubic_interp_func_x_rev(utmp)
+            uc = vol_conserv_cubic_interp_func_x_rev_2(utmp)
 
         with horizontal(region[i_end, local_js - 1 : local_je + 2]):
             utc = contravariant(uc, v, cosa_u, rsin_u)
@@ -300,6 +300,13 @@ def vol_conserv_cubic_interp_func_x_rev(u):
     return c1 * u[1, 0, 0] + c2 * u + c3 * u[-1, 0, 0]
 
 
+@gtfunction
+def vol_conserv_cubic_interp_func_x_rev_2(u):
+    """Same terms as vol_conserv_cubic_interp_func_x_rev, summed in
+    reverse order to match the Fortran evaluation order"""
+    return c3 * u[-1, 0, 0] + c2 * u + c1 * u[1, 0, 0]
+
+
 @gtfunction
 def vol_conserv_cubic_interp_func_y(v):
     return c1 * v[0, -2, 0] + c2 * v[0, -1, 0] + c3 * v
@@ -361,21 +368,23 @@ def vc_y_edge1(
 def edge_interpolate4_x(ua, dxa):
     t1 = dxa[-2, 0] + dxa[-1, 0]
     t2 = dxa[0, 0] + dxa[1, 0]
-    n1 = (t1 + dxa[-1, 0]) * ua[-1, 0, 0] - dxa[-1, 0] * ua[-2, 0, 0]
-    n2 = (t1 + dxa[0, 0]) * ua[0, 0, 0] - dxa[0, 0] * ua[1, 0, 0]
-    return 0.5 * (n1 / t1 + n2 / t2)
+    return 0.5 * (
+        ((t1 + dxa[-1, 0]) * ua[-1, 0, 0] - dxa[-1, 0] * ua[-2, 0, 0]) / t1
+        + ((t1 + dxa[0, 0]) * ua[0, 0, 0] - dxa[0, 0] * ua[1, 0, 0]) / t2
+    )
 
 
 @gtfunction
 def edge_interpolate4_y(va, dya):
     t1 = dya[0, -2] + dya[0, -1]
     t2 = dya[0, 0] + dya[0, 1]
-    n1 = (t1 + dya[0, -1]) * va[0, -1, 0] - dya[0, -1] * va[0, -2, 0]
-    n2 = (t1 + dya[0, 0]) * va[0, 0, 0] - dya[0, 0] * va[0, 1, 0]
-    return 0.5 * (n1 / t1 + n2 / t2)
+    return 0.5 * (
+        ((t1 + dya[0, -1]) * va[0, -1, 0] - dya[0, -1] * va[0, -2, 0]) / t1
+        + ((t1 + dya[0, 0]) * va[0, 0, 0] - dya[0, 0] * va[0, 1, 0]) / t2
+    )
 
 
-class DGrid2AGrid2CGridVectors:
+class DGrid2AGrid2CGridVectors(NDSLRuntime):
     """
     Fortran name d2a2c_vect
     """
@@ -389,11 +398,11 @@ def __init__(
         grid_type: int,
         dord4: bool,
     ):
+        super().__init__(stencil_factory)
+
         if grid_type not in [0, 4]:
             raise NotImplementedError(f"unimplemented grid_type {grid_type}")
 
-        orchestrate(obj=self, config=stencil_factory.config.dace_config)
-
         grid_indexing = stencil_factory.grid_indexing
         self._cosa_s = grid_data.cosa_s
         self._cosa_u = grid_data.cosa_u
@@ -409,7 +418,7 @@ def __init__(
         self._sin_sg4 = grid_data.sin_sg4
         self._grid_type = grid_type
 
-        self._big_number = 1e30  # 1e8 if 32 bit
+        self._big_number = Float(1e30) if NDSL_GLOBAL_PRECISION == 64 else Float(1e8)
         nx = grid_indexing.iec + 1  # grid.npx + 2
         ny = grid_indexing.jec + 1  # grid.npy + 2
         i1 = grid_indexing.isc - 1
@@ -448,15 +457,15 @@ def __init__(
             jfirst = grid_indexing.jsc - 1
             jlast = grid_indexing.jec + 2
 
-        self._utmp = quantity_factory.zeros(
+        self._utmp = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_DIM],
             units="m/s",
-            dtype=Float,
         )
-        self._vtmp = quantity_factory.zeros(
+        self._vtmp = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_DIM],
             units="m/s",
-            dtype=Float,
         )
 
         if (grid_type < 3) and (not nested):
@@ -502,9 +511,6 @@ def __init__(
             domain=(ie2 - is2 + 1, je2 - js2 + 1, grid_indexing.domain[2]),
         )
 
-        origin = grid_indexing.origin_full()
-        domain = grid_indexing.domain_full()
-        ax_offsets = grid_indexing.axis_offsets(origin, domain)
         if npt == 0:
             d2a2c_avg_offset = -1
         else:
@@ -640,15 +646,6 @@ def __call__(self, uc, vc, u, v, ua, va, utc, vtc):
             va,
         )
 
-        self._ut_main(
-            self._utmp,
-            uc,
-            v,
-            self._cosa_u,
-            self._rsin_u,
-            utc,
-        )
-
         if self._grid_type < 3:
             self._east_west_edges(
                 u,
@@ -664,6 +661,15 @@ def __call__(self, uc, vc, u, v, ua, va, utc, vtc):
                 self._dxa,
             )
 
+        self._ut_main(
+            self._utmp,
+            uc,
+            v,
+            self._cosa_u,
+            self._rsin_u,
+            utc,
+        )
+
         # Ydir:
         self._fill_corners_y(
             self._utmp,
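
Reviewer note on `vol_conserv_cubic_interp_func_x_rev_2`: the new function has the same three terms as `vol_conserv_cubic_interp_func_x_rev`, only summed in the opposite order. A likely reason this matters for matching Fortran is that floating-point addition is not associative, so the order of a three-term sum can change the rounded result. The sketch below shows the effect with hand-picked float32 values. The values are hypothetical and exaggerated for clarity; they are not taken from the model.

```python
import numpy as np

# Hypothetical float32 terms, chosen so the rounding difference is obvious.
a = np.float32(1.0e8)
b = np.float32(-1.0e8)
c = np.float32(1.0)

# Left-to-right evaluation, as Python evaluates `a + b + c`.
forward = (a + b) + c

# Same three terms in reverse order, as in `c + b + a`.
# c + b rounds back to -1.0e8 because float32 spacing near 1e8 is 8.
backward = (c + b) + a

print(forward, backward)
```

In double precision these particular values would give the same answer in both orders, so differences of this kind tend to surface only in 32-bit runs.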
diff --git a/pyfv3/stencils/d_sw.py b/pyfv3/stencils/d_sw.py
index e9eba6a7..ac114427 100644
--- a/pyfv3/stencils/d_sw.py
+++ b/pyfv3/stencils/d_sw.py
@@ -1,15 +1,14 @@
 from collections.abc import Mapping
 
-from ndsl import Quantity, QuantityFactory, StencilFactory, orchestrate
+from ndsl import NDSLRuntime, Quantity, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, I_INTERFACE_DIM, J_DIM, J_INTERFACE_DIM, K_DIM
-from ndsl.dsl.gt4py import PARALLEL, computation
+from ndsl.dsl.gt4py import PARALLEL, I, J, computation
 from ndsl.dsl.gt4py import function as gtfunction
 from ndsl.dsl.gt4py import horizontal, interval, region
-from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ, FloatFieldK
+from ndsl.dsl.typing import Float, FloatField, FloatField64, FloatFieldIJ, FloatFieldK
 from ndsl.grid import DampingCoefficients, GridData
 from pyfv3._config import DGridShallowWaterLagrangianDynamicsConfig
 from pyfv3.stencils import delnflux
-from pyfv3.stencils.d2a2c_vect import contravariant
 from pyfv3.stencils.delnflux import DelnFluxNoSG
 from pyfv3.stencils.divergence_damping import DivergenceDamping
 from pyfv3.stencils.fvtp2d import FiniteVolumeTransport
@@ -21,14 +20,14 @@
 
 from gt4py.cartesian.gtscript import __INLINED  # isort:skip
 
-dcon_threshold = 1e-5
+dcon_threshold = Float(1e-5)
 
 
 def flux_capacitor(
-    cx: FloatField,
-    cy: FloatField,
-    xflux: FloatField,
-    yflux: FloatField,
+    cx: FloatField64,
+    cy: FloatField64,
+    xflux: FloatField64,
+    yflux: FloatField64,
     crx_adv: FloatField,
     cry_adv: FloatField,
     fx: FloatField,
@@ -87,6 +86,8 @@ def heat_diss(
         damp_w (in):
         ke_bg (in):
     """
+    from __externals__ import do_stochastic_ke_backscatter
+
     with computation(PARALLEL), interval(...):
         heat_source = 0.0
         diss_est = 0.0
@@ -94,7 +95,8 @@ def heat_diss(
             dd8 = ke_bg * abs(dt)
             dw = (fx2 - fx2[1, 0, 0] + fy2 - fy2[0, 1, 0]) * rarea
             heat_source = dd8 - dw * (w + 0.5 * dw)
-            diss_est = heat_source
+            if __INLINED(do_stochastic_ke_backscatter):
+                diss_est = heat_source
 
 
 @gtfunction
@@ -195,6 +197,16 @@ def apply_pt_delp_fluxes_stencil_defn(
         pt, delp = apply_pt_delp_fluxes(gx, gy, rarea, fx, fy, pt, delp)
 
 
+def delp_increment_accumulation(
+    dpx: FloatField64,
+    fx: FloatField,
+    fy: FloatField,
+    rarea: FloatFieldIJ,
+):
+    with computation(PARALLEL), interval(...):
+        dpx = dpx + ((fx - fx[1, 0, 0]) + (fy - fy[0, 1, 0])) * rarea
+
+
 def compute_kinetic_energy(
     vc: FloatField,
     uc: FloatField,
@@ -236,20 +248,22 @@ def compute_kinetic_energy(
     from __externals__ import grid_type
 
     with computation(PARALLEL), interval(...):
+        dt4 = 0.25 * dt
+        dt5 = 0.5 * dt
         if __INLINED(grid_type < 3):
             ub_contra, vb_contra = interpolate_uc_vc_to_cell_corners(
-                uc, vc, cosa, rsina, uc_contra, vc_contra
+                uc, vc, cosa, rsina, uc_contra, vc_contra, dt4, dt5
             )
         else:
-            ub_contra = 0.5 * (uc[0, -1, 0] + uc)
-            vb_contra = 0.5 * (vc[-1, 0, 0] + vc)
+            ub_contra = dt5 * (uc[0, -1, 0] + uc)
+            vb_contra = dt5 * (vc[-1, 0, 0] + vc)
         advected_v = advect_v_along_y(v, vb_contra, rdy=rdy, dy=dy, dya=dya, dt=dt)
         advected_u = advect_u_along_x(u, ub_contra, rdx=rdx, dx=dx, dxa=dxa, dt=dt)
         # makes sure the kinetic energy part of the governing equation is computed
         # the same way as the vorticity flux part (in terms of time splitting)
         # to avoid a Hollingsworth-Kallberg instability
-        dt_kinetic_energy_on_cell_corners = (
-            0.5 * dt * (ub_contra * advected_u + vb_contra * advected_v)
+        dt_kinetic_energy_on_cell_corners = 0.5 * (
+            ub_contra * advected_u + vb_contra * advected_v
         )
         dt_kinetic_energy_on_cell_corners = all_corners_ke(
             dt_kinetic_energy_on_cell_corners, u, v, uc_contra, vc_contra, dt
@@ -321,11 +335,9 @@ def compute_vorticity(
         # cell-mean vorticity is equal to the circulation around the gridcell
         # divided by the area of the gridcell. It isn't exactly true that
         # area = dx * dy, so the form below is necessary to get an exact result.
-        rdy_tmp = rarea * dx
-        rdx_tmp = rarea * dy
-        vorticity = (u - u[0, 1, 0] * dx[0, 1] / dx) * rdy_tmp + (
-            v[1, 0, 0] * dy[1, 0] / dy - v
-        ) * rdx_tmp
+        ut = v * dy
+        vt = u * dx
+        vorticity = rarea * (vt - vt[J + 1] - ut + ut[I + 1])
 
 
 def adjust_w_and_qcon(
@@ -368,9 +380,7 @@ def vort_differencing(
     from __externals__ import local_ie, local_is, local_je, local_js
 
     with computation(PARALLEL), interval(...):
-        # TODO: this should likely be dcon[k] rather than dcon[0] so that this
-        # can be turned on and off per-layer
-        if dcon[0] > dcon_threshold:
+        if dcon > dcon_threshold:
             # Creating a gtscript function for the ub/vb computation
             # results in an "NotImplementedError" error for Jenkins
             # Inlining the ub/vb computation in this stencil resolves the Jenkins error
@@ -532,7 +542,6 @@ def heat_source_from_vorticity_damping(
             to explicitly damp and convert into heat.
     """
     from __externals__ import (  # noqa (see below)
-        d_con,
         do_stochastic_ke_backscatter,
         local_ie,
         local_is,
@@ -626,24 +635,15 @@ def set_low_kvals(col: Mapping[str, Quantity], k):
 
 # For the column namelist at a specific k-level
 # set the vorticity parameters if do_vort_damp is true
-def vorticity_damping_option_FV3GFS(column, k, do_vort_damp):
+def vorticity_damping_option(column, k, do_vort_damp):
     if do_vort_damp:
         column["nord_v"].view[k] = 0
         column["damp_vt"].view[k] = 0.5 * column["d2_divg"].view[k]
 
 
-def vorticity_damping_option_GEOS(column, k, do_vort_damp):
-    # GEOS does not set damp_vt
-    if do_vort_damp:
-        column["nord_v"].view[k] = 0
-
-
 def lowest_kvals(column, k, do_vort_damp):
     set_low_kvals(column, k)
-    if IS_GEOS:
-        vorticity_damping_option_GEOS(column, k, do_vort_damp)
-    else:
-        vorticity_damping_option_FV3GFS(column, k, do_vort_damp)
+    vorticity_damping_option(column, k, do_vort_damp)
 
 
 def get_column_namelist(
@@ -717,16 +717,16 @@ def get_column_namelist(
 
     # Check that the format of nord_col is N 0's then non-zero values
     # all the way to the top.
-    # Check upper values are all the same.
-    non_zero_k = -1
-    non_zero_v = -1
-    for k, v in enumerate(col["nord_v"].view[:]):
+    # Non-zero values are all the same.
+    first_non_zero_index = -1
+    first_non_zero_value = -1
+    for i, v in enumerate(col["nord_v"].view[:]):
         if v != 0:
-            non_zero_k = k
-            non_zero_v = v
+            first_non_zero_index = i
+            first_non_zero_value = v
             break
-    for v in range(non_zero_k, col["nord_v"].view.extent[0]):
-        if col["nord_v"].view[v] != non_zero_v:
+    for v in range(first_non_zero_index, col["nord_v"].view.extent[0]):
+        if col["nord_v"].view[v] != first_non_zero_value:
             raise RuntimeError(
                 f"D_SW.column is not homogeneous in values: {col['nord_v'].view[:]}"
             )
@@ -736,45 +736,41 @@ def get_column_namelist(
 
 @gtfunction
 def interpolate_uc_vc_to_cell_corners(
-    uc_cov, vc_cov, cosa, rsina, uc_contra, vc_contra
+    uc_cov, vc_cov, cosa, rsina, uc_contra, vc_contra, dt4, dt5
 ):
     """
     Convert covariant C-grid winds to contravariant B-grid (cell-corner) winds.
     """
     from __externals__ import i_end, i_start, j_end, j_start
 
-    # In the original Fortran, this routine was given dt4 (0.25 * dt)
-    # and dt5 (0.5 * dt), and its outputs were wind times timestep. This has
-    # been refactored so the timestep is later explicitly multiplied, when
-    # the wind is integrated forward in time.
-    # TODO: ask Lucas why we interpolate then convert to contravariant in tile center,
-    # but convert to contravariant and then interpolate on tile edges.
-    ub_cov = 0.5 * (uc_cov[0, -1, 0] + uc_cov)
-    vb_cov = 0.5 * (vc_cov[-1, 0, 0] + vc_cov)
-    ub_contra = contravariant(ub_cov, vb_cov, cosa, rsina)
-    vb_contra = contravariant(vb_cov, ub_cov, cosa, rsina)
-    # ASSUME : if __INLINED(namelist.grid_type < 3):
+    # Order matters because corners take the values of the last edge computation
+    # Center domain
+    ub = dt5 * (uc_cov[J - 1] + uc_cov - (vc_cov[I - 1] + vc_cov) * cosa) * rsina
+    vb = dt5 * (vc_cov[I - 1] + vc_cov - (uc_cov[J - 1] + uc_cov) * cosa) * rsina
+    # UB - Order matters because corners take the values of the last edge computation
+    # North/South edge
     with horizontal(region[:, j_start], region[:, j_end + 1]):
-        ub_contra = 0.25 * (
-            -uc_contra[0, -2, 0]
-            + 3.0 * (uc_contra[0, -1, 0] + uc_contra)
-            - uc_contra[0, 1, 0]
+        ub = dt4 * (
+            -uc_contra[J - 2] + 3.0 * (uc_contra[J - 1] + uc_contra) - uc_contra[J + 1]
         )
+    # East/West
     with horizontal(region[i_start, :], region[i_end + 1, :]):
-        ub_contra = 0.5 * (uc_contra[0, -1, 0] + uc_contra)
+        ub = dt5 * (uc_contra[J - 1] + uc_contra)
+
+    # VB - Order matters because corners take the values of the last edge computation
+    # East/West edge
     with horizontal(region[i_start, :], region[i_end + 1, :]):
-        vb_contra = 0.25 * (
-            -vc_contra[-2, 0, 0]
-            + 3.0 * (vc_contra[-1, 0, 0] + vc_contra)
-            - vc_contra[1, 0, 0]
+        vb = dt4 * (
+            -vc_contra[I - 2] + 3.0 * (vc_contra[I - 1] + vc_contra) - vc_contra[I + 1]
         )
+    # North/South
     with horizontal(region[:, j_start], region[:, j_end + 1]):
-        vb_contra = 0.5 * (vc_contra[-1, 0, 0] + vc_contra)
+        vb = dt5 * (vc_contra[I - 1] + vc_contra)
 
-    return ub_contra, vb_contra
+    return ub, vb
 
 
-class DGridShallowWaterLagrangianDynamics:
+class DGridShallowWaterLagrangianDynamics(NDSLRuntime):
     """
     Fortran name is the d_sw subroutine
     """
@@ -790,7 +786,8 @@ def __init__(
         stretched_grid: bool,
         config: DGridShallowWaterLagrangianDynamicsConfig,
     ):
-        orchestrate(obj=self, config=stencil_factory.config.dace_config)
+        super().__init__(stencil_factory)
+
         self.grid_data = grid_data
         self._f0 = self.grid_data.fC_agrid
         self._d_con = config.d_con
@@ -836,34 +833,36 @@ def __init__(
                 "D-Grid Shallow Water Lagrangian Dynamics (D_SW): Hydrostatic is not implemented"
             )
 
-        def make_quantity():
-            return quantity_factory.zeros(
-                [I_DIM, J_DIM, K_DIM],
-                units="unknown",
-                dtype=Float,
-            )
-
-        self._tmp_heat_s = make_quantity()
-        self._tmp_diss_e = make_quantity()
-        self._vort_x_delta = make_quantity()
-        self._vort_y_delta = make_quantity()
-        self._dt_kinetic_energy_on_cell_corners = make_quantity()
-        self._abs_vorticity_agrid = make_quantity()
-        self._damped_rel_vorticity_agrid = make_quantity()
-        self._uc_contra = make_quantity()
-        self._vc_contra = make_quantity()
-        self._tmp_ut = make_quantity()
-        self._tmp_vt = make_quantity()
-        self._tmp_fx = make_quantity()
-        self._tmp_fy = make_quantity()
-        self._tmp_gx = make_quantity()
-        self._tmp_gy = make_quantity()
-        self._tmp_dw = make_quantity()
-        self._tmp_wk = make_quantity()
-        self._vorticity_agrid = make_quantity()
-        self._vorticity_bgrid_damped = make_quantity()
-        self._tmp_fx2 = make_quantity()
-        self._tmp_fy2 = make_quantity()
+        # locals
+        self._tmp_heat_s = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_diss_e = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._vort_x_delta = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._vort_y_delta = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._dt_kinetic_energy_on_cell_corners = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_DIM]
+        )
+        self._abs_vorticity_agrid = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_DIM]
+        )
+        self._damped_rel_vorticity_agrid = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_DIM]
+        )
+        self._uc_contra = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._vc_contra = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_ut = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_vt = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_fx = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_fy = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_gx = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_gy = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_dw = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_wk = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._vorticity_agrid = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._vorticity_bgrid_damped = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_DIM]
+        )
+        self._tmp_fx2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._tmp_fy2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
         self._column_namelist = column_namelist
 
         self.delnflux_nosg_w = DelnFluxNoSG(
@@ -986,11 +985,14 @@ def make_quantity():
         self._heat_diss_stencil = stencil_factory.from_dims_halo(
             func=heat_diss,
             compute_dims=[I_DIM, J_DIM, K_DIM],
+            externals={
+                "do_stochastic_ke_backscatter": config.do_skeb,
+            },
         )
         self._heat_source_from_vorticity_damping_stencil = (
             stencil_factory.from_dims_halo(
                 func=heat_source_from_vorticity_damping,
-                compute_dims=[I_INTERFACE_DIM, J_INTERFACE_DIM, K_DIM],
+                compute_dims=[I_DIM, J_DIM, K_DIM],
                 externals={
                     "do_stochastic_ke_backscatter": config.do_skeb,
                     "d_con": config.d_con,
@@ -1024,33 +1026,38 @@ def make_quantity():
             da_min=damping_coefficients.da_min_c,
             nord=self._column_namelist["nord_w"],
         )
+        self._accumulate_delp = stencil_factory.from_dims_halo(
+            func=delp_increment_accumulation,
+            compute_dims=[I_DIM, J_DIM, K_DIM],
+        )
 
     def __call__(
         self,
-        delpc,
-        delp,
-        pt,
-        u,
-        v,
-        w,
-        uc,
-        vc,
-        ua,
-        va,
-        divgd,
-        mfx,
-        mfy,
-        cx,
-        cy,
-        crx,
-        cry,
-        xfx,
-        yfx,
-        q_con,
-        zh,
-        heat_source,
-        diss_est,
-        dt,
+        delpc: FloatField,
+        delp: FloatField,
+        pt: FloatField,
+        u: FloatField,
+        v: FloatField,
+        w: FloatField,
+        uc: FloatField,
+        vc: FloatField,
+        ua: FloatField,
+        va: FloatField,
+        divgd: FloatField,
+        mfx: FloatField64,
+        mfy: FloatField64,
+        cx: FloatField64,
+        cy: FloatField64,
+        dpx: FloatField64,
+        crx: FloatField,
+        cry: FloatField,
+        xfx: FloatField,
+        yfx: FloatField,
+        q_con: FloatField,
+        zh: FloatField,
+        heat_source: FloatField,
+        diss_est: FloatField,
+        dt: Float,
     ):
         """
         D-Grid shallow water routine, peforms a full-timestep advance
@@ -1078,6 +1085,7 @@ def __call__(
             mfy (inout): accumulated y mass flux
             cx (inout): accumulated Courant number in the x direction
             cy (inout): accumulated Courant number in the y direction
+            dpx (inout): accumulated delp export for Dry Mass Roundoff Control
             crx (out): local courant number in the x direction
             cry (out): local courant number in the y direction
             xfx (out): flux of area in x-direction, in units of m^2
@@ -1216,6 +1224,13 @@ def __call__(
         self._adjust_w_and_qcon_stencil(
             w, delp, self._tmp_dw, q_con, self._column_namelist["damp_w"]
         )
+
+        self._accumulate_delp(
+            dpx=dpx,
+            fx=self._tmp_fx,
+            fy=self._tmp_fy,
+            rarea=self.grid_data.rarea,
+        )
         # at this point, pt, delp, w and q_con have been stepped forward in time
         # the rest of this function updates the winds
         self._compute_kinetic_energy(
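
Reviewer note on the `nord_v` check in `get_column_namelist`: the rule being enforced is "zero or more leading zeros, then one repeated non-zero value up to the end of the column". The sketch below restates that rule on a plain Python list. The function name is hypothetical, and treating an all-zero column as valid is an assumption of this sketch that is not taken from the diff.

```python
def is_zeros_then_constant(values):
    """Return True if `values` is a run of zeros followed by one repeated value."""
    # Index of the first non-zero entry, or None if the column is all zeros.
    first = next((i for i, v in enumerate(values) if v != 0), None)
    if first is None:
        # Assumption of this sketch: an all-zero column is accepted.
        return True
    # Every entry from the first non-zero one onward must equal it.
    return all(v == values[first] for v in values[first:])


valid_column = is_zeros_then_constant([0, 0, 2, 2, 2])
zero_after_start = is_zeros_then_constant([0, 2, 0, 2])
mixed_values = is_zeros_then_constant([0, 2, 3])
```

Here `valid_column` is accepted, while the other two columns are rejected, which corresponds to the `RuntimeError` branch in the diff.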
diff --git a/pyfv3/stencils/del2cubed.py b/pyfv3/stencils/del2cubed.py
index f06476f1..f4f61852 100644
--- a/pyfv3/stencils/del2cubed.py
+++ b/pyfv3/stencils/del2cubed.py
@@ -1,10 +1,11 @@
 import dace
+import numpy as np
 
-from ndsl import QuantityFactory, StencilFactory, orchestrate
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, I_INTERFACE_DIM, J_DIM, J_INTERFACE_DIM, K_DIM
 from ndsl.dsl.gt4py import PARALLEL, computation, horizontal, interval, region
 from ndsl.dsl.stencil import get_stencils_with_varied_bounds
-from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ, cast_to_index3d
+from ndsl.dsl.typing import FloatField, FloatFieldIJ, cast_to_index3d
 from ndsl.grid import DampingCoefficients
 from ndsl.stencils.basic_operations import copy
 from pyfv3.stencils.copy_corners import CopyCornersX, CopyCornersY
@@ -69,13 +70,13 @@ def corner_fill(q_in: FloatField, q_out: FloatField):
 # Q update stencil
 # ------------------
 def update_q(
-    q: FloatField, rarea: FloatFieldIJ, fx: FloatField, fy: FloatField, cd: Float
+    q: FloatField, rarea: FloatFieldIJ, fx: FloatField, fy: FloatField, cd: np.float64
 ):
     with computation(PARALLEL), interval(...):
         q += cd * rarea * (fx - fx[1, 0, 0] + fy - fy[0, 1, 0])
 
 
-class HyperdiffusionDamping:
+class HyperdiffusionDamping(NDSLRuntime):
     """
     Fortran name is del2_cubed
     """
@@ -90,9 +91,10 @@ def __init__(
     ):
         """
         Args:
-            grid: pyFV3 grid object
+            grid: pyfv3 grid object
         """
-        orchestrate(obj=self, config=stencil_factory.config.dace_config)
+        super().__init__(stencil_factory)
+
         grid_indexing = stencil_factory.grid_indexing
         self._del6_u = damping_coefficients.del6_u
         self._del6_v = damping_coefficients.del6_v
@@ -100,21 +102,9 @@ def __init__(
 
         # the units of these temporaries are relative to the input units,
         # so they are undefined
-        self._fx = quantity_factory.zeros(
-            dims=[I_INTERFACE_DIM, J_DIM, K_DIM],
-            units="undefined",
-            dtype=Float,
-        )
-        self._fy = quantity_factory.zeros(
-            dims=[I_DIM, J_INTERFACE_DIM, K_DIM],
-            units="undefined",
-            dtype=Float,
-        )
-        self._q = quantity_factory.zeros(
-            dims=[I_DIM, J_DIM, K_DIM],
-            units="undefined",
-            dtype=Float,
-        )
+        self._fx = self.make_local(quantity_factory, [I_INTERFACE_DIM, J_DIM, K_DIM])
+        self._fy = self.make_local(quantity_factory, [I_DIM, J_INTERFACE_DIM, K_DIM])
+        self._q = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
 
         self._corner_fill = stencil_factory.from_dims_halo(
             func=corner_fill,
@@ -166,7 +156,7 @@ def __init__(
             update_q, origins, domains, stencil_factory=stencil_factory
         )
 
-    def __call__(self, qdel: FloatField, cd: Float):
+    def __call__(self, qdel: FloatField, cd: np.float64):
         """
         Perform hyperdiffusion damping/filtering.
 
diff --git a/pyfv3/stencils/delnflux.py b/pyfv3/stencils/delnflux.py
index f3c3fb6a..ad0b3300 100644
--- a/pyfv3/stencils/delnflux.py
+++ b/pyfv3/stencils/delnflux.py
@@ -1,8 +1,9 @@
 from typing import Optional
 
 import dace
+import numpy as np
 
-from ndsl import Quantity, QuantityFactory, StencilFactory, orchestrate
+from ndsl import NDSLRuntime, Quantity, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, I_INTERFACE_DIM, J_DIM, J_INTERFACE_DIM, K_DIM
 from ndsl.dsl.gt4py import PARALLEL, computation
 from ndsl.dsl.gt4py import function as gtfunction
@@ -10,15 +11,20 @@
 from ndsl.dsl.stencil import get_stencils_with_varied_bounds
 from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ, FloatFieldK
 from ndsl.grid import DampingCoefficients
-from pyfv3.stencils.copy_corners import corner_copy_x, corner_copy_y
+from pyfv3.stencils.copy_corners import CopyCornersX, CopyCornersY
 
 
 def calc_damp(damp_c: Quantity, da_min: Float, nord: Quantity) -> Quantity:
-    if damp_c.dims != nord.dims or damp_c.data.shape != nord.data.shape:
+    if damp_c.dims != nord.dims or damp_c.shape != nord.shape:
         raise NotImplementedError(
             "current implementation requires damp_c and nord to have identical data shape and dims"
         )
-    data = (damp_c.data * da_min) ** (nord.data + 1)
+    # `da_min` is a 64-bit float, so the array is cast to float64 first to avoid
+    # numpy downcasting the array * scalar product to the array's dtype.
+    # The result is then cast back to `Float`, the precision used by the model.
+    data = np.power(
+        (damp_c[:].astype(np.float64) * da_min), (nord[:] + 1), dtype=np.float64
+    ).astype(Float)
     return Quantity(
         data=data,
         dims=damp_c.dims,
@@ -99,7 +105,7 @@ def fx_calculation(q: FloatField, del6_v: FloatField):
 
 @gtfunction
 def fx_calculation_neg(q: FloatField, del6_v: FloatField):
-    return -del6_v * (q[-1, 0, 0] - q)
+    return del6_v * (q - q[-1, 0, 0])
 
 
 @gtfunction
@@ -109,7 +115,7 @@ def fy_calculation(q: FloatField, del6_u: FloatField):
 
 @gtfunction
 def fy_calculation_neg(q: FloatField, del6_u: FloatField):
-    return -del6_u * (q[0, -1, 0] - q)
+    return del6_u * (q - q[0, -1, 0])
 
 
 def d2_highorder_stencil(
@@ -180,23 +186,11 @@ def diffusive_damp(
     damp: FloatFieldK,
 ):
     with computation(PARALLEL), interval(...):
-        fx = fx + 0.5 * damp * (mass[-1, 0, 0] + mass) * fx2
-        fy = fy + 0.5 * damp * (mass[0, -1, 0] + mass) * fy2
+        fx = fx + (0.5 * damp) * (mass[-1, 0, 0] + mass) * fx2
+        fy = fy + (0.5 * damp) * (mass[0, -1, 0] + mass) * fy2
 
 
-def copy_corners_y_nord(field_to_copy, nord):
-    for k in dace.map[0 : nord.data.shape[0]]:
-        if nord.data[k] > 0:
-            corner_copy_y(field_to_copy[:, :, k])
-
-
-def copy_corners_x_nord(field_to_copy, nord):
-    for k in dace.map[0 : nord.data.shape[0]]:
-        if nord.data[k] > 0:
-            corner_copy_x(field_to_copy[:, :, k])
-
-
-class DelnFlux:
+class DelnFlux(NDSLRuntime):
     """
     Fortran name is deln_flux
     The test class is DelnFlux
@@ -221,10 +215,7 @@ def __init__(
 
         nord and damp_c define the damping coefficient used in DelnFluxNoSG
         """
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
+        super().__init__(stencil_factory)
         self._no_compute = False
         if (damp_c.view[:] <= 1e-4).all():
             self._no_compute = True
@@ -236,21 +227,9 @@ def __init__(
         nk = grid_indexing.domain[2]
         self._origin = grid_indexing.origin_full()
 
-        self._fx2 = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="undefined",
-            dtype=Float,
-        )
-        self._fy2 = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="undefined",
-            dtype=Float,
-        )
-        self._d2 = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="undefined",
-            dtype=Float,
-        )
+        self._fx2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._fy2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._d2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
 
         self._add_diffusive_stencil = stencil_factory.from_dims_halo(
             func=add_diffusive_component,
@@ -265,7 +244,11 @@ def __init__(
         )
 
         self.delnflux_nosg = DelnFluxNoSG(
-            stencil_factory, damping_coefficients, rarea, nord_col, nk=nk
+            stencil_factory,
+            damping_coefficients,
+            rarea,
+            nord_col,
+            nk=nk,
         )
 
     def __call__(
@@ -279,11 +262,11 @@ def __call__(
         """
         Del-n damping for fluxes, where n = 2 * nord + 2
         Args:
-            q: Field for which to calculate damped fluxes (in)
-            fx: x-flux on A-grid (inout)
-            fy: y-flux on A-grid (inout)
-            d2: A damped copy of the q field (in)
-            mass: Mass to weight the diffusive flux by (in)
+            q (in): Field for which to calculate damped fluxes
+            fx (inout): x-flux on A-grid
+            fy (inout): y-flux on A-grid
+            d2 (in): A damped copy of the q field
+            mass (in): Mass to weight the diffusive flux by
         """
         if self._no_compute:
             return fx, fy
@@ -313,7 +296,7 @@ def __call__(
         return fx, fy
 
 
-class DelnFluxNoSG:
+class DelnFluxNoSG(NDSLRuntime):
     """
     This contains the mechanics of del6_vt and some of deln_flux from
     the Fortran code, since they are very similar routines. The test class
@@ -338,15 +321,12 @@ def __init__(
         nord = 1:   del-4
         nord = 2:   del-6
         """
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
+        super().__init__(stencil_factory)
         grid_indexing = stencil_factory.grid_indexing
         self._del6_u = damping_coefficients.del6_u
         self._del6_v = damping_coefficients.del6_v
         self._rarea = rarea
-        nord.data[:] = nord.data[:].round().astype(int)
+        nord[:] = nord[:].round().astype(int)
         self._nmax = int(max(nord.view[:]))
         if self._nmax > 3:
             raise ValueError("nord must be less than 3")
@@ -438,6 +418,9 @@ def __init__(
             domain=(f1_nx - 1, f1_ny + 1, nk),
         )
 
+        self.copy_corners_x = CopyCornersX(stencil_factory)
+        self.copy_corners_y = CopyCornersY(stencil_factory)
+
     def __call__(self, q, fx2, fy2, damp_c, d2, mass=None):
         """
         Computes flux fields which would apply del-n damping to q,
@@ -456,17 +439,36 @@ def __call__(self, q, fx2, fy2, damp_c, d2, mass=None):
         """
 
         if mass is None:
-            self._d2_damp(q=q, d2=d2, damp=damp_c, nord=self._nord)
+            self._d2_damp(
+                q=q,
+                d2=d2,
+                damp=damp_c,
+                nord=self._nord,
+            )
         else:
-            self._copy_stencil_interval(q_in=q, q_out=d2, nord=self._nord)
+            self._copy_stencil_interval(
+                q_in=q,
+                q_out=d2,
+                nord=self._nord,
+            )
 
-        copy_corners_x_nord(d2.data, self._nord)
+        self.copy_corners_x.nord(d2.data, self._nord)
 
-        self._fx_calc_stencil(q=d2, del6_v=self._del6_v, fx=fx2, nord=self._nord)
+        self._fx_calc_stencil(
+            q=d2,
+            del6_v=self._del6_v,
+            fx=fx2,
+            nord=self._nord,
+        )
 
-        copy_corners_y_nord(d2.data, self._nord)
+        self.copy_corners_y.nord(d2.data, self._nord)
 
-        self._fy_calc_stencil(q=d2, del6_u=self._del6_u, fy=fy2, nord=self._nord)
+        self._fy_calc_stencil(
+            q=d2,
+            del6_u=self._del6_u,
+            fy=fy2,
+            nord=self._nord,
+        )
 
         # Force unroll of the loop because list of object do not parse
         # when unrolled
@@ -481,14 +483,22 @@ def __call__(self, q, fx2, fy2, damp_c, d2, mass=None):
                 current_nord=n,
             )
 
-            copy_corners_x_nord(d2.data, self._nord)
+            self.copy_corners_x.nord(d2.data, self._nord)
 
             self._column_conditional_fx_calculation[n](
-                q=d2, del6_v=self._del6_v, fx=fx2, nord=self._nord, current_nord=n
+                q=d2,
+                del6_v=self._del6_v,
+                fx=fx2,
+                nord=self._nord,
+                current_nord=n,
             )
 
-            copy_corners_y_nord(d2.data, self._nord)
+            self.copy_corners_y.nord(d2.data, self._nord)
 
             self._column_conditional_fy_calculation[n](
-                q=d2, del6_u=self._del6_u, fy=fy2, nord=self._nord, current_nord=n
+                q=d2,
+                del6_u=self._del6_u,
+                fy=fy2,
+                nord=self._nord,
+                current_nord=n,
             )
diff --git a/pyfv3/stencils/divergence_damping.py b/pyfv3/stencils/divergence_damping.py
index 4b1800ce..c92a372b 100644
--- a/pyfv3/stencils/divergence_damping.py
+++ b/pyfv3/stencils/divergence_damping.py
@@ -3,10 +3,10 @@
 
 import ndsl.stencils.basic_operations as basic
 import ndsl.stencils.corners as corners
-from ndsl import Quantity, QuantityFactory, StencilFactory
+from ndsl import NDSLRuntime, Quantity, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, I_INTERFACE_DIM, J_DIM, J_INTERFACE_DIM, K_DIM
-from ndsl.dsl.dace.orchestration import dace_inhibitor, orchestrate
-from ndsl.dsl.gt4py import PARALLEL, computation
+from ndsl.dsl.dace.orchestration import dace_inhibitor
+from ndsl.dsl.gt4py import PARALLEL, computation, float32
 from ndsl.dsl.gt4py import function as gtfunction
 from ndsl.dsl.gt4py import horizontal, interval, region, sqrt
 from ndsl.dsl.stencil import get_stencils_with_varied_bounds
@@ -21,8 +21,13 @@
 
 @gtfunction
 def damp_tmp(q, da_min_c, d2_bg, dddmp):
-    mintmp = min(0.2, dddmp * abs(q))
-    damp = da_min_c * max(d2_bg, mintmp)
+    damp: float32 = da_min_c * max(d2_bg, min(0.2, dddmp * abs(q)))
+    return damp
+
+
+@gtfunction
+def damp_tmp2(q, da_min_c, d2_bg, dddmp):
+    damp: float32 = da_min_c * max(d2_bg, min(0.2, dddmp * q))
     return damp
 
 
@@ -49,6 +54,21 @@ def compute_u_contra_dyc(
         sin_sg2 (in):
         sin_sg4 (in):
         u_contra_dyc (out): contravariant u-wind on d-grid
+
+    Porting Notes
+    * The compute_u_contra_dyc and compute_v_contra_dxc functions have
+      the dyc and dxc values incorporated earlier in the calculation rather than later,
+      and this enables the u_contra_dyc and v_contra_dxc values
+      to match with the Fortran.
+      As a result, the delpc computation matches the Fortran value of delpc.
+
+      Ex : Previous implementation of compute_u_contra_dyc
+      =================================================================
+           u_contra = contravariant(u, vc_from_va, cosa_v, sina_v)
+           with horizontal(region[:, j_start], region[:, j_end + 1]):
+               u_contra = u * sin_sg4[0, -1] if vc > 0 else u * sin_sg2
+           u_contra_dyc = u_contra * dyc
+      =================================================================
     """
     from __externals__ import j_end, j_start
 
@@ -56,10 +76,10 @@ def compute_u_contra_dyc(
         # TODO: why does vc_from_va sometimes have different sign than vc?
         vc_from_va = 0.5 * (va[0, -1, 0] + va)
         # TODO: why do we use vc_from_va and not just vc?
-        u_contra = contravariant(u, vc_from_va, cosa_v, sina_v)
+        u_contra_dyc = contravariant(u, vc_from_va, cosa_v, dyc)
+        u_contra_dyc = u_contra_dyc * sina_v
         with horizontal(region[:, j_start], region[:, j_end + 1]):
-            u_contra = u * sin_sg4[0, -1] if vc > 0 else u * sin_sg2
-        u_contra_dyc = u_contra * dyc
+            u_contra_dyc = u * dyc * sin_sg4[0, -1] if vc > 0 else u * dyc * sin_sg2
 
 
 def compute_v_contra_dxc(
@@ -84,6 +104,20 @@ def compute_v_contra_dxc(
         uc (in):
         sin_sg3 (in):
         sin_sg1 (in):
+
+    Porting Notes
+    * The compute_u_contra_dyc and compute_v_contra_dxc functions
+      have the dyc and dxc values incorporated earlier in the calculation
+      rather than later, and this enables the u_contra_dyc and v_contra_dxc
+      values to match with the Fortran.  As a result, the delpc computation
+      matches the Fortran value of delpc.
+
+      Ex : Previous implementation of compute_v_contra_dxc
+        =================================================================
+        v_contra = contravariant(v, uc_from_ua, cosa_u, sina_u)
+        with horizontal(region[i_start, :], region[i_end + 1, :]):
+            v_contra = v * sin_sg3[-1, 0] if uc > 0 else v * sin_sg1
+        v_contra_dxc = v_contra * dxc
     """
     from __externals__ import i_end, i_start
 
@@ -91,10 +125,10 @@ def compute_v_contra_dxc(
         # TODO: why does uc_from_ua sometimes have different sign than uc?
         uc_from_ua = 0.5 * (ua[-1, 0, 0] + ua)
         # TODO: why do we use uc_from_ua and not just uc?
-        v_contra = contravariant(v, uc_from_ua, cosa_u, sina_u)
+        v_contra_dxc = contravariant(v, uc_from_ua, cosa_u, dxc)
+        v_contra_dxc = v_contra_dxc * sina_u
         with horizontal(region[i_start, :], region[i_end + 1, :]):
-            v_contra = v * sin_sg3[-1, 0] if uc > 0 else v * sin_sg1
-        v_contra_dxc = v_contra * dxc
+            v_contra_dxc = v * dxc * sin_sg3[-1, 0] if uc > 0 else v * dxc * sin_sg1
 
 
 def delpc_computation(
@@ -139,7 +173,7 @@ def damping(
     vort: FloatField,
     ke: FloatField,
     d2_bg: FloatFieldK,
-    da_min_c: Float,
+    da_min_c: np.float64,
     dddmp: Float,
     dt: Float,
 ):
@@ -163,7 +197,7 @@ def damping_nord_highorder_stencil(
     delpc: FloatField,
     divg_d: FloatField,
     d2_bg: FloatFieldK,
-    da_min_c: Float,
+    da_min_c: np.float64,
     dddmp: Float,
     dd8: Float,
 ):
@@ -179,7 +213,7 @@ def damping_nord_highorder_stencil(
     """
     # TODO: propagate variable renaming into this routine
     with computation(PARALLEL), interval(...):
-        damp = damp_tmp(vort, da_min_c, d2_bg, dddmp)
+        damp = damp_tmp2(vort, da_min_c, d2_bg, dddmp)
         vort = damp * delpc + dd8 * divg_d
         ke = ke + vort
 
@@ -247,7 +281,7 @@ def smagorinsky_diffusion_approx(delpc: FloatField, vort: FloatField, absdt: Flo
         absdt (in): abs(dt)
     """
     with computation(PARALLEL), interval(...):
-        vort = absdt * (delpc**2.0 + vort**2.0) ** 0.5
+        vort = absdt * sqrt(delpc**2 + vort**2)
 
 
 def smag_corner(
@@ -294,7 +328,7 @@ def smag_corner(
         smag_c = dt * sqrt(shear**2 + smag_c_t**2)
 
 
-class DivergenceDamping:
+class DivergenceDamping(NDSLRuntime):
     """
     A large section in Fortran's d_sw that applies divergence damping
     """
@@ -314,10 +348,8 @@ def __init__(
         nord_col: Quantity,
         d2_bg: FloatFieldK,
     ):
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
+        super().__init__(stencil_factory)
+
         self.grid_indexing = stencil_factory.grid_indexing
         if nested:
             raise NotImplementedError("Divergence Damping: nested not implemented.")
@@ -463,7 +495,6 @@ def __init__(
             func=corners.fill_corners_dgrid_defn,
             compute_dims=[I_INTERFACE_DIM, J_INTERFACE_DIM, K_DIM],
             compute_halos=(self.grid_indexing.n_halo, self.grid_indexing.n_halo),
-            skip_passes=("UnreachableStmtPruning",),
         )
 
         self._redo_divg_d_stencils = get_stencils_with_varied_bounds(
@@ -537,12 +568,12 @@ def __init__(
     # odd and adds a lot of boilerplate throughout the model code.
 
     @dace_inhibitor
-    def _get_da_min_c(self) -> Float:
-        return Float(self._damping_coefficients.da_min_c)
+    def _get_da_min_c(self) -> np.float64:
+        return self._damping_coefficients.da_min_c
 
     @dace_inhibitor
-    def _get_da_min(self) -> Float:
-        return Float(self._damping_coefficients.da_min)
+    def _get_da_min(self) -> np.float64:
+        return self._damping_coefficients.da_min
 
     def __call__(
         self,
@@ -630,13 +661,12 @@ def __call__(
                 self.v_contra_dxc,
             )
 
-            da_min_c: Float = self._get_da_min_c()
             self._damping(
                 delpc,
                 damped_rel_vort_bgrid,
                 ke,
                 self._d2_bg_column,
-                da_min_c,
+                self._damping_coefficients.da_min_c,
                 self._dddmp,
                 dt,
             )
@@ -695,14 +725,17 @@ def __call__(
                     abs(dt),
                 )
 
-        da_min: Float = self._get_da_min()
         if self._stretched_grid:
             # reference https://github.com/NOAA-GFDL/GFDL_atmos_cubed_sphere/blob/main/model/sw_core.F90#L1422 # noqa: E501
-            dd8 = da_min * np.power(self._d4_bg, (self._nonzero_nord + 1), dtype=Float)
+            dd8 = Float(
+                self._damping_coefficients.da_min
+                * np.power(self._d4_bg, (self._nonzero_nord + 1))
+            )
         else:
             dd8 = np.power(
-                (da_min_c * self._d4_bg), (self._nonzero_nord + 1), dtype=Float
-            )
+                (self._damping_coefficients.da_min_c * self._d4_bg),
+                (self._nonzero_nord + 1),
+            ).astype(Float)
 
         self._damping_nord_highorder_stencil(
             damped_rel_vort_bgrid,
@@ -710,7 +743,7 @@ def __call__(
             delpc,
             divg_d,
             self._d2_bg_column,
-            da_min_c,
+            self._damping_coefficients.da_min_c,
             self._dddmp,
             dd8,
         )
diff --git a/pyfv3/stencils/dyn_core.py b/pyfv3/stencils/dyn_core.py
index ca0ea4ad..44970381 100644
--- a/pyfv3/stencils/dyn_core.py
+++ b/pyfv3/stencils/dyn_core.py
@@ -1,7 +1,7 @@
 from collections.abc import Mapping
 
+import dace
 import numpy as np
-from dace.frontend.python.interface import nounroll as dace_nounroll
 
 import ndsl.constants as constants
 import ndsl.stencils.basic_operations as basic
@@ -14,13 +14,12 @@
 import pyfv3.stencils.updatedzd as updatedzd
 from ndsl import (
     GridIndexing,
+    NDSLRuntime,
     Quantity,
     QuantityFactory,
     StencilFactory,
     WrappedHaloUpdater,
-    orchestrate,
 )
-from ndsl.checkpointer import NullCheckpointer
 from ndsl.constants import (
     I_DIM,
     I_INTERFACE_DIM,
@@ -29,7 +28,6 @@
     K_DIM,
     K_INTERFACE_DIM,
 )
-from ndsl.dsl.dace.orchestration import dace_inhibitor
 from ndsl.dsl.gt4py import (
     BACKWARD,
     FORWARD,
@@ -39,9 +37,9 @@
     interval,
     region,
 )
-from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
+from ndsl.dsl.typing import Float, FloatField, FloatField64, FloatFieldIJ
 from ndsl.grid import DampingCoefficients, GridData
-from ndsl.typing import Checkpointer, Communicator
+from ndsl.typing import Communicator
 from pyfv3._config import AcousticDynamicsConfig
 from pyfv3.dycore_state import DycoreState
 from pyfv3.stencils.c_sw import CGridShallowWaterDynamics
@@ -60,10 +58,10 @@
 
 
 def zero_data(
-    mfxd: FloatField,
-    mfyd: FloatField,
-    cxd: FloatField,
-    cyd: FloatField,
+    mfxd: FloatField64,
+    mfyd: FloatField64,
+    cxd: FloatField64,
+    cyd: FloatField64,
     heat_source: FloatField,
     diss_estd: FloatField,
     first_timestep: bool,
@@ -128,11 +126,9 @@ def compute_geopotential(zh: FloatField, gz: FloatField):
         gz = zh * constants.GRAV
 
 
-def p_grad_c_stencil(
+def p_grad_c_stencil_x(
     rdxc: FloatFieldIJ,
-    rdyc: FloatFieldIJ,
     uc: FloatField,
-    vc: FloatField,
     delpc: FloatField,
     pkc: FloatField,
     gz: FloatField,
@@ -149,11 +145,8 @@ def p_grad_c_stencil(
 
     Args:
         rdxc (in):
-        rdyc (in):
         uc (inout): x-velocity on the C-grid, has been updated due to advection
             but not yet due to pressure gradient force
-        vc (inout): y-velocity on the C-grid, has been updated due to advection
-            but not yet due to pressure gradient force
         delpc (in): vertical delta in pressure
         pkc (in): pressure if non-hydrostatic,
             (edge pressure)**(moist kappa) if hydrostatic
@@ -176,6 +169,26 @@ def p_grad_c_stencil(
             + (gz[-1, 0, 0] - gz[0, 0, 1]) * (pkc[-1, 0, 1] - pkc)
         )
 
+
+def p_grad_c_stencil_y(
+    rdyc: FloatFieldIJ,
+    vc: FloatField,
+    delpc: FloatField,
+    pkc: FloatField,
+    gz: FloatField,
+    dt2: Float,
+):
+    """
+    See p_grad_c_stencil_x
+    """
+    from __externals__ import hydrostatic
+
+    with computation(PARALLEL), interval(...):
+        if __INLINED(hydrostatic):
+            wk = pkc[0, 0, 1] - pkc
+        else:
+            wk = delpc
+        # wk is pressure gradient
         vc = vc + dt2 * rdyc / (wk[0, -1, 0] + wk) * (
             (gz[0, -1, 1] - gz) * (pkc[0, 0, 1] - pkc[0, -1, 0])
             + (gz[0, -1, 0] - gz[0, 0, 1]) * (pkc[0, -1, 1] - pkc)
@@ -204,7 +217,15 @@ def dyncore_temporaries(
     quantity_factory: QuantityFactory,
 ) -> Mapping[str, Quantity]:
     temporaries: dict[str, Quantity] = {}
-    for name in ["ut", "vt", "gz", "zh", "pem", "pkc", "pk3", "heat_source", "cappa"]:
+    for name in [
+        "ut",
+        "vt",
+        "pem",
+        "pk3",
+        "heat_source",
+        "cappa",
+        "dpx",
+    ]:
         # TODO: the dimensions of ut and vt may not be correct,
         #       because they are not used. double-check and correct as needed.
         temporaries[name] = quantity_factory.zeros(
@@ -243,7 +264,7 @@ def dyncore_temporaries(
     return temporaries
 
 
-class AcousticDynamics:
+class AcousticDynamics(NDSLRuntime):
     """
     Fortran name is dyn_core
     Performs the Lagrangian acoustic dynamics described by Lin 2004
@@ -389,9 +410,7 @@ def __init__(
         stretched_grid,
         config: AcousticDynamicsConfig,
         phis: FloatFieldIJ,
-        wsd: FloatFieldIJ,
         state,  # [DaCe] hack to get around quantity as parameters for halo updates
-        checkpointer: Checkpointer | None = None,
     ):
         """
         Args:
@@ -406,55 +425,28 @@ def __init__(
             config: configuration settings
             pfull: atmospheric Eulerian grid reference pressure (Pa)
             phis: surface geopotential height
-            checkpointer: if given, used to perform operations on model data
-                at specific points in model execution, such as testing against
-                reference data
         """
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            dace_compiletime_args=["state"],
-        )
+        super().__init__(stencil_factory)
 
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            method_to_orchestrate="_checkpoint_csw",
-            dace_compiletime_args=["state", "tag"],
-        )
-
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            method_to_orchestrate="_checkpoint_dsw_in",
-            dace_compiletime_args=["state", "tag"],
-        )
-
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            method_to_orchestrate="_checkpoint_dsw_out",
-            dace_compiletime_args=["state", "tag"],
-        )
-
-        self.call_checkpointer = checkpointer is not None
-        if checkpointer is None:
-            self.checkpointer: Checkpointer = NullCheckpointer()
-        else:
-            self.checkpointer = checkpointer
         grid_indexing = stencil_factory.grid_indexing
         self.config = config
+        self.hydrostatic = config.hydrostatic
         if config.d_ext != 0:
             raise RuntimeError("Acoustics (dyn_core): d_ext != 0 is not implemented")
         if config.beta != 0:
-            raise RuntimeError("Acoustics (dyn_core): beta != 0 is not implemented")
+            raise RuntimeError(
+                "Acoustics (dyn_core): beta != 0 is not implemented (split_p_grad, etc.)"
+            )
+        if config.beta < -0.1:
+            raise RuntimeError(
+                "Acoustics (dyn_core): beta < -0.1 is not implemented (one_grad_p, etc.)"
+            )
         if config.use_logp:
             raise RuntimeError("Acoustics (dyn_core): use_logp=True is not implemented")
         self._da_min = damping_coefficients.da_min
         self.grid_data = grid_data
         self._ptop = grid_data.ptop
         self._pfull = grid_data.p
-        self._wsd = wsd
         self._nk_heat_dissipation = get_nk_heat_dissipation(
             config.d_grid_shallow_water,
             npz=grid_indexing.domain[2],
@@ -470,25 +462,9 @@ def __init__(
         )
         self._akap = Float(constants.KAPPA)
 
-        temporaries = dyncore_temporaries(quantity_factory)
-        self._heat_source = temporaries["heat_source"]
-        self._divgd = temporaries["divgd"]
-        self._gz = temporaries["gz"]
-        self._pkc = temporaries["pkc"]
-        self._zh = temporaries["zh"]
-        self.cappa = temporaries["cappa"]
-        self._ut = temporaries["ut"]
-        self._vt = temporaries["vt"]
-        self._pem = temporaries["pem"]
-        self._pk3 = temporaries["pk3"]
-        self._crx = temporaries["crx"]
-        self._cry = temporaries["cry"]
-        self._xfx = temporaries["xfx"]
-        self._yfx = temporaries["yfx"]
-        self._ws3 = temporaries["ws3"]
-
-        if not config.hydrostatic:
-            self._pk3.data[:] = HUGE_R
+        # Locals
+        self._make_locals(quantity_factory)
+        self._make_persistent_temporaries(quantity_factory)
 
         column_namelist = d_sw.get_column_namelist(
             config.d_grid_shallow_water, quantity_factory=quantity_factory
@@ -502,9 +478,9 @@ def __init__(
                 units="m",
                 dtype=Float,
             )
-            self._zs.data[:] = self._zs.np.asarray(
-                phis.data / constants.GRAV, dtype=self._zs.data.dtype
-            )
+            # Fortran reads in _all_ data - including potentially
+            # uninitialized (HUGE_R) edges and corner values!
+            self._zs[:] = phis[:] * constants.RGRAV
 
             self.update_height_on_d_grid = updatedzd.UpdateHeightOnDGrid(
                 stencil_factory,
@@ -514,6 +490,7 @@ def __init__(
                 grid_type=grid_type,
                 hord_tm=config.hord_tm,
                 column_namelist=column_namelist,
+                dz_min=Float(config.dz_min),
             )
             self.vertical_solver = NonhydrostaticVerticalSolver(
                 stencil_factory,
@@ -566,10 +543,16 @@ def __init__(
             )
         )
 
-        self._p_grad_c = stencil_factory.from_origin_domain(
-            p_grad_c_stencil,
+        self._p_grad_c_x = stencil_factory.from_origin_domain(
+            p_grad_c_stencil_x,
             origin=grid_indexing.origin_compute(),
-            domain=grid_indexing.domain_compute(add=(1, 1, 0)),
+            domain=grid_indexing.domain_compute(add=(1, 0, 0)),
+            externals={"hydrostatic": config.hydrostatic},
+        )
+        self._p_grad_c_y = stencil_factory.from_origin_domain(
+            p_grad_c_stencil_y,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(0, 1, 0)),
             externals={"hydrostatic": config.hydrostatic},
         )
 
@@ -580,13 +563,14 @@ def __init__(
                 area=grid_data.area,
                 dp_ref=grid_data.dp_ref,
                 grid_type=config.grid_type,
+                dz_min=Float(config.dz_min),
             )
         )
 
         self._zero_data = stencil_factory.from_origin_domain(
             zero_data,
             origin=grid_indexing.origin_full(),
-            domain=grid_indexing.domain_full(),
+            domain=grid_indexing.domain_full(add=(1, 1, 0)),
         )
         ax_offsets_pe = grid_indexing.axis_offsets(
             grid_indexing.origin_full(),
@@ -640,94 +624,62 @@ def __init__(
             quantity_factory,
             state,
             cappa=self.cappa,
-            gz=self._gz,
-            zh=self._zh,
-            divgd=self._divgd,
-            heat_source=self._heat_source,
-            pkc=self._pkc,
+            gz=self.gz,
+            zh=self.zh,
+            divgd=self.divgd,
+            heat_source=self.heat_source,
+            pkc=self.pkc,
         )
 
-    # See divergence_damping.py, _get_da_min for explanation of this function
-    @dace_inhibitor
-    def _get_da_min(self) -> float:
-        return self._da_min
-
-    def _checkpoint_csw(self, state: DycoreState, tag: str):
-        if self.call_checkpointer:
-            self.checkpointer(
-                f"C_SW-{tag}",
-                delpd=state.delp,
-                ptd=state.pt,
-                ud=state.u,
-                vd=state.v,
-                wd=state.w,
-                ucd=state.uc,
-                vcd=state.vc,
-                uad=state.ua,
-                vad=state.va,
-                utd=self._ut,
-                vtd=self._vt,
-                divgdd=self._divgd,
-            )
+    def _make_persistent_temporaries(
+        self,
+        quantity_factory: QuantityFactory,
+    ):
+        """Define memory that should be Local but, due to use cases not yet covered
+        by orchestration (halo exchange, etc.), is kept persistent."""
 
-    def _checkpoint_dsw_in(self, state: DycoreState):
-        if self.call_checkpointer:
-            self.checkpointer(
-                "D_SW-In",
-                ucd=state.uc,
-                vcd=state.vc,
-                wd=state.w,
-                # delpc is a temporary and not a variable in D_SW savepoint
-                delpcd=self._vt,
-                delpd=state.delp,
-                ud=state.u,
-                vd=state.v,
-                ptd=state.pt,
-                uad=state.ua,
-                vad=state.va,
-                zhd=self._zh,
-                divgdd=self._divgd,
-                xfxd=self._xfx,
-                yfxd=self._yfx,
-                mfxd=state.mfxd,
-                mfyd=state.mfyd,
-            )
+        self.heat_source = quantity_factory.zeros([I_DIM, J_DIM, K_DIM], "")
+        self.cappa = quantity_factory.zeros([I_DIM, J_DIM, K_DIM], "")
 
-    def _checkpoint_dsw_out(self, state: DycoreState):
-        if self.call_checkpointer:
-            self.checkpointer(
-                "D_SW-Out",
-                ucd=state.uc,
-                vcd=state.vc,
-                wd=state.w,
-                delpcd=self._vt,
-                delpd=state.delp,
-                ud=state.u,
-                vd=state.v,
-                ptd=state.pt,
-                uad=state.ua,
-                vad=state.va,
-                divgdd=self._divgd,
-                xfxd=self._xfx,
-                yfxd=self._yfx,
-                mfxd=state.mfxd,
-                mfyd=state.mfyd,
-            )
+        self.gz = quantity_factory.zeros([I_DIM, J_DIM, K_INTERFACE_DIM], "")
+        self.pkc = quantity_factory.zeros([I_DIM, J_DIM, K_INTERFACE_DIM], "")
+        self.zh = quantity_factory.zeros([I_DIM, J_DIM, K_INTERFACE_DIM], "")
 
-    # TODO: fix me - we shouldn't need a function here, Dace is fudging the types
-    # See https://github.com/GEOS-ESM/pace/issues/9
-    @dace_inhibitor
-    def dt_acoustic_substep(self, timestep: Float) -> Float:
-        return timestep / self.config.n_split
+        self.divgd = quantity_factory.zeros(
+            [I_INTERFACE_DIM, J_INTERFACE_DIM, K_DIM], ""
+        )
 
-    # TODO: Same as above
-    @dace_inhibitor
-    def dt2(self, dt_acoustic_substep: Float) -> Float:
-        return 0.5 * dt_acoustic_substep
+    def _make_locals(
+        self,
+        quantity_factory: QuantityFactory,
+    ):
+        """Make Locals accessible on `self`"""
+
+        # TODO: the dimensions of ut and vt may not be correct,
+        #       because they are not used. double-check and correct as needed.
+        self._ut = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._vt = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._pem = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._pk3 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._dpx = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+
+        self._ws3 = self.make_local(quantity_factory, [I_DIM, J_DIM])
+
+        self._crx = self.make_local(quantity_factory, [I_INTERFACE_DIM, J_DIM, K_DIM])
+        self._xfx = self.make_local(quantity_factory, [I_INTERFACE_DIM, J_DIM, K_DIM])
+
+        self._cry = self.make_local(quantity_factory, [I_DIM, J_INTERFACE_DIM, K_DIM])
+        self._yfx = self.make_local(quantity_factory, [I_DIM, J_INTERFACE_DIM, K_DIM])
 
     def __call__(
         self,
-        state: DycoreState,
+        state: dace.compiletime,  # TODO: remove when DycoreState becomes an ndsl.State
+        mfxd,
+        mfyd,
+        cxd,
+        cyd,
+        dpx,
+        wsd,
         timestep: Float,  # time to step forward by in seconds
         n_map=1,  # [DaCe] replaces state.n_map
     ):
@@ -736,8 +688,8 @@ def __call__(
         # akap, ptop, n_map, comm):
         end_step = n_map == self.config.k_split
         # dt = state.mdt / self.config.n_split
-        dt_acoustic_substep: Float = self.dt_acoustic_substep(timestep)
-        dt2: Float = self.dt2(dt_acoustic_substep)
+        dt_acoustic_substep = Float(timestep / self.config.n_split)
+        dt2 = Float(0.5) * dt_acoustic_substep
         n_split = self.config.n_split
         # NOTE: In Fortran model the halo update starts happens in fv_dynamics, not here
         self._halo_updaters.q_con__cappa.start()
@@ -746,20 +698,24 @@ def __call__(
         self._halo_updaters.q_con__cappa.wait()
 
         self._zero_data(
-            state.mfxd,
-            state.mfyd,
-            state.cxd,
-            state.cyd,
-            self._heat_source,
+            mfxd,
+            mfyd,
+            cxd,
+            cyd,
+            self.heat_source,
             state.diss_estd,
             n_map == 1,
         )
 
+        if not self.hydrostatic:
+            self._pk3[:] = HUGE_R
+        self.gz[:] = HUGE_R
+
         # "acoustic" loop
         # called this because its timestep is usually limited by horizontal sound-wave
         # processes. Note this is often not the limiting factor near the poles, where
         # the speed of the polar night jets can exceed two-thirds of the speed of sound.
-        for it in dace_nounroll(range(n_split)):
+        for it in range(n_split):
             # the Lagrangian dynamics have two parts. First we advance the C-grid winds
             # by half a time step (c_sw). Then the C-grid winds are used to define
             # advective fluxes to advance the D-grid prognostic fields a full time step
@@ -779,7 +735,7 @@ def __call__(
                     self._gz_from_surface_height_and_thickness(
                         self._zs,
                         state.delz,
-                        self._gz,
+                        self.gz,
                     )
                     self._halo_updaters.gz.start()
             if it == 0:
@@ -798,7 +754,6 @@ def __call__(
                 self._halo_updaters.w.wait()
 
             # compute the c-grid winds at t + 1/2 timestep
-            self._checkpoint_csw(state, tag="In")
             self.cgrid_shallow_water_lagrangian_dynamics(
                 state.delp,
                 state.pt,
@@ -811,11 +766,10 @@ def __call__(
                 state.va,
                 self._ut,
                 self._vt,
-                self._divgd,
+                self.divgd,
                 state.omga,
                 dt2,
             )
-            self._checkpoint_csw(state, tag="Out")
 
             # TODO: Computing the pressure gradient outside of C_SW was originally done
             # so that we could transpose into a vertical-first memory ordering for the
@@ -828,17 +782,22 @@ def __call__(
                 if it == 0:
                     self._halo_updaters.gz.wait()
                     self._copy_stencil(
-                        self._gz,
-                        self._zh,
+                        self.gz,
+                        self.zh,
                     )
                 else:
                     self._copy_stencil(
-                        self._zh,
-                        self._gz,
+                        self.zh,
+                        self.gz,
                     )
             if not self.config.hydrostatic:
                 self.update_geopotential_height_on_c_grid(
-                    self._zs, self._ut, self._vt, self._gz, self._ws3, dt2
+                    zs=self._zs,
+                    ut=self._ut,
+                    vt=self._vt,
+                    gz=self.gz,
+                    ws=self._ws3,
+                    dt=dt2,
                 )
                 # TODO (floriand): Due to DaCe VRAM pooling creating a memory
                 # leak with the usage pattern of those two fields
@@ -848,63 +807,69 @@ def __call__(
                 # DaCe has already a fix on their side and it awaits release
                 # issue
                 self.vertical_solver_cgrid(
-                    dt2,
-                    self.cappa,
-                    self._ptop,
-                    state.phis,
-                    self._ws3,
-                    self.cgrid_shallow_water_lagrangian_dynamics.ptc,
-                    state.q_con,
-                    self.cgrid_shallow_water_lagrangian_dynamics.delpc,
-                    self._gz,
-                    self._pkc,
-                    state.omga,
+                    dt2=dt2,
+                    cappa=self.cappa,
+                    ptop=self._ptop,
+                    hs=state.phis,
+                    ws=self._ws3,
+                    ptc=self.cgrid_shallow_water_lagrangian_dynamics.ptc,
+                    q_con=state.q_con,
+                    delpc=self.cgrid_shallow_water_lagrangian_dynamics.delpc,
+                    gz=self.gz,
+                    pef=self.pkc,
+                    w3=state.omga,
                 )
 
-            self._p_grad_c(
-                self.grid_data.rdxc,
-                self.grid_data.rdyc,
-                state.uc,
-                state.vc,
-                self.cgrid_shallow_water_lagrangian_dynamics.delpc,
-                self._pkc,
-                self._gz,
-                dt2,
+            self._p_grad_c_x(
+                rdxc=self.grid_data.rdxc,
+                uc=state.uc,
+                delpc=self.cgrid_shallow_water_lagrangian_dynamics.delpc,
+                pkc=self.pkc,
+                gz=self.gz,
+                dt2=dt2,
+            )
+            self._p_grad_c_y(
+                rdyc=self.grid_data.rdyc,
+                vc=state.vc,
+                delpc=self.cgrid_shallow_water_lagrangian_dynamics.delpc,
+                pkc=self.pkc,
+                gz=self.gz,
+                dt2=dt2,
             )
+
             self._halo_updaters.uc__vc.start()
             if self.config.nord > 0:
                 self._halo_updaters.divgd.wait()
             self._halo_updaters.uc__vc.wait()
             # use the computed c-grid winds to evolve the d-grid winds forward
             # by 1 timestep
-            self._checkpoint_dsw_in(state)
             self.dgrid_shallow_water_lagrangian_dynamics(
-                self._vt,
-                state.delp,
-                state.pt,
-                state.u,
-                state.v,
-                state.w,
-                state.uc,
-                state.vc,
-                state.ua,
-                state.va,
-                self._divgd,
-                state.mfxd,
-                state.mfyd,
-                state.cxd,
-                state.cyd,
-                self._crx,
-                self._cry,
-                self._xfx,
-                self._yfx,
-                state.q_con,
-                self._zh,
-                self._heat_source,
-                state.diss_estd,
-                dt_acoustic_substep,
+                delpc=self._vt,
+                delp=state.delp,
+                pt=state.pt,
+                u=state.u,
+                v=state.v,
+                w=state.w,
+                uc=state.uc,
+                vc=state.vc,
+                ua=state.ua,
+                va=state.va,
+                divgd=self.divgd,
+                mfx=mfxd,
+                mfy=mfyd,
+                cx=cxd,
+                cy=cyd,
+                dpx=dpx,
+                crx=self._crx,
+                cry=self._cry,
+                xfx=self._xfx,
+                yfx=self._yfx,
+                q_con=state.q_con,
+                zh=self.zh,
+                heat_source=self.heat_source,
+                diss_est=state.diss_estd,
+                dt=dt_acoustic_substep,
             )
-            self._checkpoint_dsw_out(state)
             # note that uc and vc are not needed at all past this point.
             # they will be re-computed from scratch on the next acoustic timestep.
 
@@ -920,32 +885,32 @@ def __call__(
                 # without explicit arg names, numpy does not run
                 self.update_height_on_d_grid(
                     surface_height=self._zs,
-                    height=self._zh,
+                    height=self.zh,
                     courant_number_x=self._crx,
                     courant_number_y=self._cry,
                     x_area_flux=self._xfx,
                     y_area_flux=self._yfx,
-                    ws=self._wsd,
+                    ws=wsd,
                     dt=dt_acoustic_substep,
                 )
                 self.vertical_solver(
-                    remap_step,
-                    dt_acoustic_substep,
-                    self.cappa,
-                    self._ptop,
-                    self._zs,
-                    self._wsd,
-                    state.delz,
-                    state.q_con,
-                    state.delp,
-                    state.pt,
-                    self._zh,
-                    state.pe,
-                    self._pkc,
-                    self._pk3,
-                    state.pk,
-                    state.peln,
-                    state.w,
+                    last_call=remap_step,
+                    dt=dt_acoustic_substep,
+                    cappa=self.cappa,
+                    ptop=self._ptop,
+                    zs=self._zs,
+                    ws=wsd,
+                    delz=state.delz,
+                    q_con=state.q_con,
+                    delp=state.delp,
+                    pt=state.pt,
+                    zh=self.zh,
+                    p=state.pe,
+                    ppe=self.pkc,
+                    pk3=self._pk3,
+                    pk=state.pk,
+                    log_p_interface=state.peln,
+                    w=state.w,
                 )
 
                 self._halo_updaters.zh.start()
@@ -962,21 +927,21 @@ def __call__(
             if not self.config.hydrostatic:
                 self._halo_updaters.zh.wait()
                 self._compute_geopotential_stencil(
-                    self._zh,
-                    self._gz,
+                    self.zh,
+                    self.gz,
                 )
                 self._halo_updaters.pkc.wait()
 
                 self.nonhydrostatic_pressure_gradient(
-                    state.u,
-                    state.v,
-                    self._pkc,
-                    self._gz,
-                    self._pk3,
-                    state.delp,
-                    dt_acoustic_substep,
-                    self._ptop,
-                    self._akap,
+                    u=state.u,
+                    v=state.v,
+                    pp=self.pkc,
+                    gz=self.gz,
+                    pk3=self._pk3,
+                    delp=state.delp,
+                    dt=dt_acoustic_substep,
+                    ptop=self._ptop,
+                    akap=self._akap,
                 )
 
             if self.config.rf_fast:
@@ -1001,19 +966,19 @@ def __call__(
                 if self.config.grid_type < 4:
                     self._halo_updaters.interface_uc__vc.interface()
 
-        # we are here
-
         if self._do_del2cubed:
             self._halo_updaters.heat_source.update()
             # TODO: move dependence on da_min into init of hyperdiffusion class
-            da_min: Float = self._get_da_min()
-            cd = constants.CNST_0P20 * da_min
+            cd = constants.CNST_0P20 * self._da_min
             # we want to diffuse the heat source from damping before we apply it,
             # so that we don't reinforce the same grid-scale patterns we're trying
             # to damp
-            self._hyperdiffusion(self._heat_source, cd)
+            self._hyperdiffusion(self.heat_source, cd)
             if not self.config.hydrostatic:
-                delt_time_factor = abs(dt_acoustic_substep * self.config.delt_max)
+                delt_time_factor = np.abs(
+                    dt_acoustic_substep * Float(self.config.delt_max),
+                    dtype=Float,
+                )
                 # TODO: it looks like state.pkz is being used as a temporary here,
                 # and overwritten at the start of remapping. See if we can make it
                 # an internal temporary of this stencil.
@@ -1021,7 +986,7 @@ def __call__(
                     state.delp,
                     state.delz,
                     self.cappa,
-                    self._heat_source,
+                    self.heat_source,
                     state.pt,
                     delt_time_factor,
                 )
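
Review note: most call sites in `dyn_core.py` above move from positional to keyword arguments (for example `pef=self.pkc` and `w3=state.omga`), so the call site now shows which field feeds which parameter. The sketch below is a minimal pure-Python illustration of why that helps. `update_height_positional` and `update_height` are hypothetical stand-ins, not NDSL or pyFV3 APIs, and the keyword-only signature is an assumption made for illustration only.

```python
def update_height_positional(height, ws, dt):
    """Hypothetical stand-in for a stencil call that accepts positional args."""
    return height + ws * dt


def update_height(*, height, ws, dt):
    """Same computation, but every argument must be passed by name."""
    return height + ws * dt


# Swapping two same-typed positional arguments runs without error
# and silently returns a different value.
swapped = update_height_positional(2.0, 100.0, 0.5)

# Keyword calls are order-independent and document the mapping at the call site.
ok = update_height(dt=0.5, ws=2.0, height=100.0)

# A keyword-only signature rejects positional use outright.
try:
    update_height(100.0, 2.0, 0.5)
    raised = False
except TypeError:
    raised = True
```

With many same-typed field arguments, as in the `dgrid_shallow_water_lagrangian_dynamics` call, a swapped pair would otherwise go unnoticed until validation.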
diff --git a/pyfv3/stencils/fillz.py b/pyfv3/stencils/fillz.py
index 4fcaf881..1947d13d 100644
--- a/pyfv3/stencils/fillz.py
+++ b/pyfv3/stencils/fillz.py
@@ -1,14 +1,13 @@
-import typing
-
-import dace
+from typing_extensions import no_type_check
 
 from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM
 from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation, interval, max, min
 from ndsl.dsl.typing import FloatField, FloatFieldIJ, Int, IntFieldIJ
+from pyfv3.tracers import FVTracers
 
 
-@typing.no_type_check
+@no_type_check
 def fix_tracer(
     q: FloatField,
     dp: FloatField,
@@ -122,25 +121,25 @@ def __init__(
         # Setting initial value of upper_fix to zero is only needed for validation.
         # The values in the compute domain are set to zero in the stencil.
         self._zfix = self.make_local(quantity_factory, [I_DIM, J_DIM], dtype=Int)
-        self._zfix.data[:] = 0
+        self._zfix[:] = 0
         self._sum0 = self.make_local(quantity_factory, [I_DIM, J_DIM])
-        self._sum0.data[:] = 0
+        self._sum0[:] = 0
         self._sum1 = self.make_local(quantity_factory, [I_DIM, J_DIM])
-        self._sum1.data[:] = 0
+        self._sum1[:] = 0
 
     def __call__(
         self,
         dp2: FloatField,
-        tracers: dace.compiletime,  # dict[str, Quantity],
+        tracers: FVTracers,
     ):
         """
         Args:
             dp2 (in): pressure thickness of atmospheric layer
             tracers (inout): tracers to fix negative masses in
         """
-        for tracer_name in tracers.keys():
+        for i_tracer in range(0, self._nq):
             self._fix_tracer_stencil(
-                tracers[tracer_name],
+                tracers[:, :, :, i_tracer],
                 dp2,
                 self._zfix,
                 self._sum0,
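
Review note: the `fillz.py` change above replaces iteration over a name-keyed dict with iteration over a tracer index on the trailing axis (`tracers[:, :, :, i_tracer]`). The sketch below mimics that access pattern in pure Python. `TracerBundle`, its `mapping`, and `clip_negative` are hypothetical stand-ins; in particular, `clip_negative` is only a placeholder and does not reproduce the `fix_tracer` stencil.

```python
class TracerBundle:
    """Hypothetical container: tracers addressed by index on the last axis."""

    # Assumed name-to-index table, for illustration only.
    mapping = {"vapor": 0, "liquid": 1, "rain": 2}

    def __init__(self, n_levels):
        self.nq = len(self.mapping)
        # data[k][i_tracer]: one row per vertical level, tracer index last.
        self.data = [[0.0] * self.nq for _ in range(n_levels)]

    @classmethod
    def index(cls, name):
        return cls.mapping[name]

    def column(self, i_tracer):
        return [row[i_tracer] for row in self.data]

    def set_column(self, i_tracer, values):
        for row, value in zip(self.data, values):
            row[i_tracer] = value


def clip_negative(column):
    """Placeholder fix: clamp negative values to zero (not the real algorithm)."""
    return [max(value, 0.0) for value in column]


bundle = TracerBundle(n_levels=3)
bundle.data[0][TracerBundle.index("vapor")] = 1.5
bundle.data[1][TracerBundle.index("rain")] = -0.25

# Index-based loop: no tracer names are needed inside the loop body.
for i_tracer in range(bundle.nq):
    bundle.set_column(i_tracer, clip_negative(bundle.column(i_tracer)))
```

Names are only resolved once, through `index`, which is the same role `FVTracers.index("vapor")` plays elsewhere in this diff.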
diff --git a/pyfv3/stencils/fv_dynamics.py b/pyfv3/stencils/fv_dynamics.py
index 79b9e87b..7285c274 100644
--- a/pyfv3/stencils/fv_dynamics.py
+++ b/pyfv3/stencils/fv_dynamics.py
@@ -1,34 +1,179 @@
-from collections.abc import Mapping
 from datetime import timedelta
 
-from dace.frontend.python.interface import nounroll as dace_no_unroll
-
-import ndsl.dsl.gt4py_utils as utils
 import pyfv3.stencils.moist_cv as moist_cv
-from ndsl import Quantity, QuantityFactory, StencilFactory, WrappedHaloUpdater
-from ndsl.checkpointer import NullCheckpointer
+from ndsl import (
+    NDSLRuntime,
+    Quantity,
+    QuantityFactory,
+    StencilFactory,
+    WrappedHaloUpdater,
+)
 from ndsl.comm.mpi import MPI
-from ndsl.constants import I_DIM, J_DIM, K_DIM, K_INTERFACE_DIM, KAPPA, NQ, ZVIR
+from ndsl.constants import (
+    I_DIM,
+    I_INTERFACE_DIM,
+    J_DIM,
+    J_INTERFACE_DIM,
+    K_DIM,
+    K_INTERFACE_DIM,
+    KAPPA,
+    NQ,
+    ZVIR,
+)
 from ndsl.dsl.dace.orchestration import dace_inhibitor, orchestrate
-from ndsl.dsl.gt4py import PARALLEL, computation, interval
-from ndsl.dsl.typing import Float, FloatField
+from ndsl.dsl.gt4py import FORWARD, PARALLEL, computation, interval
+from ndsl.dsl.typing import (
+    NDSL_64BIT_FLOAT_TYPE,
+    Float,
+    FloatField,
+    FloatField64,
+    FloatFieldIJ64,
+    get_precision,
+)
 from ndsl.grid import DampingCoefficients, GridData
 from ndsl.logging import ndsl_log
 from ndsl.performance import Timer
-from ndsl.stencils.basic_operations import copy
+from ndsl.stencils.basic_operations import copy, set_value
 from ndsl.stencils.c2l_ord import CubedToLatLon
-from ndsl.typing import Checkpointer, Communicator
+from ndsl.typing import Communicator
 from pyfv3._config import DynamicalCoreConfig
 from pyfv3.dycore_state import DycoreState
 from pyfv3.stencils import fvtp2d, tracer_2d_1l
+from pyfv3.stencils.compute_total_energy import ComputeTotalEnergy
 from pyfv3.stencils.del2cubed import HyperdiffusionDamping
 from pyfv3.stencils.dyn_core import AcousticDynamics
 from pyfv3.stencils.neg_adj3 import AdjustNegativeTracerMixingRatio
 from pyfv3.stencils.remapping import LagrangianToEulerian
+from pyfv3.stencils.remapping_GEOS import LagrangianToEulerian_GEOS
+from pyfv3.tracers import FVTracers, FVTracersAxisName
+from pyfv3.version import IS_GEOS
+
+
+class DryMassRoundOff(NDSLRuntime):
+    def __init__(
+        self,
+        comm: Communicator,
+        quantity_factory: QuantityFactory,
+        stencil_factory: StencilFactory,
+        state: DycoreState,
+        hydrostatic: bool,
+    ) -> None:
+        super().__init__(stencil_factory)
+
+        self._psx_2d = self.make_local(
+            quantity_factory,
+            [I_DIM, J_DIM],
+            dtype=NDSL_64BIT_FLOAT_TYPE,
+            allow_mismatch_float_precision=True,
+        )
+        # This is a quantity because it is used _outside_ of
+        # DryMassRoundOff. It should be an output
+        self.dpx = quantity_factory.zeros(
+            [I_DIM, J_DIM, K_DIM],
+            "unknown",
+            dtype=NDSL_64BIT_FLOAT_TYPE,
+            allow_mismatch_float_precision=True,
+        )
+        self._dpx0_2d = self.make_local(
+            quantity_factory,
+            [I_DIM, J_DIM],
+            dtype=NDSL_64BIT_FLOAT_TYPE,
+            allow_mismatch_float_precision=True,
+        )
+
+        self._reset = stencil_factory.from_origin_domain(
+            DryMassRoundOff._reset_stencil,
+            origin=stencil_factory.grid_indexing.origin_compute(),
+            domain=stencil_factory.grid_indexing.domain_compute(),
+        )
+        self._apply_psx_to_pe = stencil_factory.from_origin_domain(
+            DryMassRoundOff._apply_psx_to_pe_stencil,
+            origin=stencil_factory.grid_indexing.origin_compute(),
+            domain=stencil_factory.grid_indexing.domain_compute(),
+        )
+        self._apply_dpx_to_psx = stencil_factory.from_origin_domain(
+            DryMassRoundOff._apply_dpx_to_psx_stencil,
+            origin=stencil_factory.grid_indexing.origin_compute(),
+            domain=stencil_factory.grid_indexing.domain_compute(),
+        )
+
+        halo_spec = quantity_factory.get_quantity_halo_spec(
+            dims=[I_DIM, J_DIM, K_INTERFACE_DIM],
+            n_halo=stencil_factory.grid_indexing.n_halo,
+            dtype=Float,
+        )
+        self._pe_halo_updater = WrappedHaloUpdater(
+            comm.get_scalar_halo_updater([halo_spec]),
+            state,
+            ["pe"],
+        )
+
+        self._hydrostatic = hydrostatic
+
+    @staticmethod
+    def _reset_stencil(
+        dpx: FloatField64,
+        psx_2d: FloatFieldIJ64,
+        pe: FloatField,
+    ):
+        with computation(PARALLEL), interval(...):
+            dpx = 0.0
+        with computation(FORWARD), interval(-1, None):
+            psx_2d = pe[0, 0, 1]
+
+    @staticmethod
+    def _apply_dpx_to_psx_stencil(
+        dpx: FloatField64,
+        dpx0_2d: FloatFieldIJ64,
+        psx_2d: FloatFieldIJ64,
+    ):
+        with computation(FORWARD), interval(0, 1):
+            dpx0_2d = dpx
+
+        with computation(FORWARD), interval(1, None):
+            dpx0_2d += dpx
+
+        with computation(FORWARD), interval(0, 1):
+            psx_2d += psx_2d + dpx0_2d
+
+    @staticmethod
+    def _apply_psx_to_pe_stencil(
+        psx_2d: FloatFieldIJ64,
+        pe: FloatField,
+    ):
+        with computation(FORWARD), interval(-1, None):
+            pe[0, 0, 1] = psx_2d
+
+    def reset(self, pe: FloatField):
+        self._reset(dpx=self.dpx, psx_2d=self._psx_2d, pe=pe)
+
+    def apply(self, pe: FloatField):
+        self._apply_dpx_to_psx(self.dpx, self._dpx0_2d, self._psx_2d)
+        self._pe_halo_updater.update()
+        self._apply_psx_to_pe(self._psx_2d, pe)
+
+
+def _increment_stencil(
+    value: FloatField,
+    increment: FloatField,
+):
+    with computation(PARALLEL), interval(...):
+        value += increment
+
+
+def _copy_cast_defn(
+    q_in_64: FloatField64,
+    q_out: FloatField,
+):
+    with computation(PARALLEL), interval(...):
+        q_out = q_in_64
 
 
 def pt_to_potential_density_pt(
-    pkz: FloatField, dp_initial: FloatField, q_con: FloatField, pt: FloatField
+    pkz: FloatField,
+    dp_initial: FloatField,
+    q_con: FloatField,
+    pt: FloatField,
 ):
     """
     Args:
@@ -43,7 +188,12 @@ def pt_to_potential_density_pt(
         pt = pt * (1.0 + dp_initial) * (1.0 - q_con) / pkz
 
 
-def omega_from_w(delp: FloatField, delz: FloatField, w: FloatField, omega: FloatField):
+def omega_from_w(
+    delp: FloatField,
+    delz: FloatField,
+    w: FloatField,
+    omega: FloatField,
+):
     """
     Args:
         delp (in): vertical layer thickness in Pa
@@ -55,25 +205,6 @@ def omega_from_w(delp: FloatField, delz: FloatField, w: FloatField, omega: Float
         omega = delp / delz * w
 
 
-def fvdyn_temporaries(quantity_factory: QuantityFactory) -> Mapping[str, Quantity]:
-    tmps = {}
-    for name in ["te_2d", "te0_2d", "wsd"]:
-        quantity = quantity_factory.zeros(
-            dims=[I_DIM, J_DIM],
-            units="unknown",
-            dtype=Float,
-        )
-        tmps[name] = quantity
-    for name in ["dp1", "cvm"]:
-        quantity = quantity_factory.zeros(
-            dims=[I_DIM, J_DIM, K_DIM],
-            units="unknown",
-            dtype=Float,
-        )
-        tmps[name] = quantity
-    return tmps
-
-
 @dace_inhibitor
 def log_on_rank_0(message: str) -> None:
     """Print when rank is 0 - outside of DaCe critical path"""
@@ -81,7 +212,7 @@ def log_on_rank_0(message: str) -> None:
         ndsl_log.info(message)
 
 
-class DynamicalCore:
+class DynamicalCore(NDSLRuntime):
     """
     Corresponds to fv_dynamics in original Fortran sources.
     """
@@ -97,8 +228,7 @@ def __init__(
         phis: Quantity,
         state: DycoreState,
         timestep: timedelta,
-        checkpointer: Checkpointer | None = None,
-    ) -> None:
+    ):
         """
         Args:
             comm: object for cubed sphere or tile inter-process communication
@@ -109,11 +239,12 @@ def __init__(
                 the namelist in the Fortran model
             phis: surface geopotential height
             state: model state
+            exclude_tracer: list of named tracers to be excluded from the advection
+                and remapping schemes
             timestep: model timestep
-            checkpointer: if given, used to perform operations on model data
-                at specific points in model execution, such as testing against
-                reference data
         """
+        super().__init__(stencil_factory)
+
         orchestrate(
             obj=self,
             config=stencil_factory.config.dace_config,
@@ -135,49 +266,12 @@ def __init__(
             dace_compiletime_args=["state", "timer"],
         )
 
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            method_to_orchestrate="_checkpoint_fvdynamics",
-            dace_compiletime_args=["state", "tag"],
-        )
-
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            method_to_orchestrate="_checkpoint_remapping_in",
-            dace_compiletime_args=[
-                "state",
-            ],
-        )
-
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            method_to_orchestrate="_checkpoint_remapping_out",
-            dace_compiletime_args=["state"],
-        )
-
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            method_to_orchestrate="_checkpoint_tracer_advection_in",
-            dace_compiletime_args=["state"],
-        )
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            method_to_orchestrate="_checkpoint_tracer_advection_out",
-            dace_compiletime_args=["state"],
-        )
         if timestep == timedelta(seconds=0):
             raise RuntimeError(
                 "Bad dynamical core configuration: the atmospheric timestep is 0 seconds!"
             )
         # nested and stretched_grid are options in the Fortran code which we
         # have not implemented, so they are hard-coded here.
-        self.call_checkpointer = checkpointer is not None
-        self.checkpointer = NullCheckpointer() if checkpointer is None else checkpointer
         nested = False
         stretched_grid = False
         grid_indexing = stencil_factory.grid_indexing
@@ -185,18 +279,50 @@ def __init__(
             raise NotImplementedError(
                 "Dynamical core (fv_dynamics): fvsetup is only implemented for moist_phys=true."
             )
-        if config.nwat != 6:
+        if config.nwat not in [0, 6]:
             raise NotImplementedError(
                 "Dynamical core (fv_dynamics):"
                 f" nwat=={config.nwat} is not implemented."
-                " Only nwat=6 has been implemented."
+                " Only nwat=0 or 6 has been implemented."
             )
+
+        if config.nwat == 6:
+            # The implemented dynamics options require at least these tracers to be
+            # present. This list is more granular than the single `nwat` integer,
+            # but it covers the same information.
+            required_tracers = [
+                "vapor",
+                "liquid",
+                "rain",
+                "snow",
+                "ice",
+                "graupel",
+                "cloud",
+            ]
+            if not all(n in FVTracers.mapping.keys() for n in required_tracers):
+                raise NotImplementedError(
+                    "Dynamical core (fv_dynamics):"
+                    " missing required tracers. Dynamics requires:\n"
+                    f" {required_tracers}\n"
+                    "but only the following were given:\n"
+                    f" {FVTracers.mapping.keys()}"
+                )
+
+        self._comm = comm
         self.comm_rank = comm.rank
         self.grid_data = grid_data
         self.grid_indexing = grid_indexing
         self._da_min = damping_coefficients.da_min
         self.config = config
 
+        self.dry_mass_control = DryMassRoundOff(
+            comm=comm,
+            quantity_factory=quantity_factory,
+            stencil_factory=stencil_factory,
+            state=state,
+            hydrostatic=self.config.hydrostatic,
+        )
+
         tracer_transport = fvtp2d.FiniteVolumeTransport(
             stencil_factory=stencil_factory,
             quantity_factory=quantity_factory,
@@ -206,16 +332,19 @@ def __init__(
             hord=config.hord_tr,
         )
 
-        self.tracers = {}
-        for name in utils.tracer_variables[0:NQ]:
-            self.tracers[name] = state.__dict__[name]
+        if FVTracersAxisName not in quantity_factory.sizer.data_dimensions:
+            raise RuntimeError(
+                "FV Dynamics requires FVTracers to be registered - see `pyfv3.tracers`"
+            )
+
+        # Locals
+        # self._te0_2d = self.make_local(quantity_factory, [I_DIM, J_DIM])
+        self._wsd = self.make_local(quantity_factory, [I_DIM, J_DIM])
+        self._dp_initial = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._cvm = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
 
-        temporaries = fvdyn_temporaries(quantity_factory)
-        self._te_2d = temporaries["te_2d"]
-        self._te0_2d = temporaries["te0_2d"]
-        self._wsd = temporaries["wsd"]
-        self._dp_initial = temporaries["dp1"]
-        self._cvm = temporaries["cvm"]
+        # TODO: this is a true Local, but making it one breaks `pt` in orchestration
+        self._te0_2d = quantity_factory.zeros([I_DIM, J_DIM], "")
 
         # Build advection stencils
         self.tracer_advection = tracer_2d_1l.TracerAdvection(
@@ -224,7 +353,7 @@ def __init__(
             tracer_transport,
             self.grid_data,
             comm,
-            self.tracers,
+            state.tracers,
         )
         self._ak = grid_data.ak
         self._bk = grid_data.bk
@@ -236,6 +365,14 @@ def __init__(
             externals={
                 "nwat": self.config.nwat,
                 "moist_phys": self.config.moist_phys,
+                "i_vapor": FVTracers.index("vapor"),
+                "i_liquid": FVTracers.index("liquid") if self.config.nwat == 6 else -1,
+                "i_rain": FVTracers.index("rain") if self.config.nwat == 6 else -1,
+                "i_ice": FVTracers.index("ice") if self.config.nwat == 6 else -1,
+                "i_snow": FVTracers.index("snow") if self.config.nwat == 6 else -1,
+                "i_graupel": (
+                    FVTracers.index("graupel") if self.config.nwat == 6 else -1
+                ),
             },
             origin=grid_indexing.origin_compute(),
             domain=grid_indexing.domain_compute(),
@@ -255,6 +392,11 @@ def __init__(
             origin=grid_indexing.origin_full(),
             domain=grid_indexing.domain_full(),
         )
+        self._copy_domain = stencil_factory.from_origin_domain(
+            copy,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(),
+        )
         self.acoustic_dynamics = AcousticDynamics(
             comm=comm,
             stencil_factory=stencil_factory,
@@ -266,9 +408,7 @@ def __init__(
             stretched_grid=stretched_grid,
             config=self.config.acoustic_dynamics,
             phis=self._phis,
-            wsd=self._wsd,
             state=state,
-            checkpointer=checkpointer,
         )
         self._hyperdiffusion = HyperdiffusionDamping(
             stencil_factory,
@@ -299,15 +439,35 @@ def __init__(
             hydrostatic=self.config.hydrostatic,
         )
 
-        self._lagrangian_to_eulerian_obj = LagrangianToEulerian(
+        self._compute_total_energy = ComputeTotalEnergy(
+            config=config,
             stencil_factory=stencil_factory,
             quantity_factory=quantity_factory,
-            config=config.remapping,
-            area_64=grid_data.area_64,
-            nq=NQ,
-            pfull=self._pfull,
+            grid_data=grid_data,
         )
 
+        if IS_GEOS:
+            self._lagrangian_to_eulerian_GEOS = LagrangianToEulerian_GEOS(
+                stencil_factory=stencil_factory,
+                quantity_factory=quantity_factory,
+                config=config.remapping,
+                comm=comm,
+                grid_data=grid_data,
+                pfull=self._pfull,
+                adiabatic=config.adiabatic,
+                nwat=self.config.nwat,
+            )
+
+        else:
+            self._lagrangian_to_eulerian_obj = LagrangianToEulerian(
+                stencil_factory=stencil_factory,
+                quantity_factory=quantity_factory,
+                config=config.remapping,
+                area_64=grid_data.area_64,
+                pfull=self._pfull,
+                nwat=self.config.nwat,
+            )
+
         full_xyz_spec = quantity_factory.get_quantity_halo_spec(
             dims=[I_DIM, J_DIM, K_DIM],
             n_halo=grid_indexing.n_halo,
@@ -321,101 +481,73 @@ def __init__(
         self._conserve_total_energy = config.consv_te
         self._timestep = timestep.total_seconds()
 
-    # See divergence_damping.py, _get_da_min for explanation of this function
-    @dace_inhibitor
-    def _get_da_min(self) -> float:
-        return self._da_min
-
-    def _checkpoint_fvdynamics(self, state: DycoreState, tag: str) -> None:
-        if self.call_checkpointer:
-            self.checkpointer(
-                f"FVDynamics-{tag}",
-                u=state.u,
-                v=state.v,
-                w=state.w,
-                delz=state.delz,
-                # ua is not checked as its halo values differ from Fortran,
-                # this can be re-enabled if no longer comparing to Fortran, if the
-                # Fortran is updated to match the Python, or if the checkpointer
-                # can check only the compute domain values
-                # ua=state.ua,
-                va=state.va,
-                uc=state.uc,
-                vc=state.vc,
-                qvapor=state.qvapor,
+        # At 32-bit precision we still need 64-bit mfx, mfy, cx and cy buffers
+        self._f32_correction = get_precision() == 32
+        if self._f32_correction:
+            self._mfx_f64 = quantity_factory.zeros(
+                dims=[I_INTERFACE_DIM, J_DIM, K_DIM],
+                units="unknown",
+                dtype=NDSL_64BIT_FLOAT_TYPE,
+                allow_mismatch_float_precision=True,
             )
-
-    def _checkpoint_remapping_in(self, state: DycoreState) -> None:
-        if self.call_checkpointer:
-            self.checkpointer(
-                "Remapping-In",
-                pt=state.pt,
-                delp=state.delp,
-                delz=state.delz,
-                peln=state.peln.transpose(
-                    [I_DIM, K_INTERFACE_DIM, J_DIM]
-                ),  # [x, z, y] fortran data
-                u=state.u,
-                v=state.v,
-                w=state.w,
-                ua=state.ua,
-                va=state.va,
-                cappa=self._cappa,
-                pk=state.pk,
-                pe=state.pe.transpose(
-                    [I_DIM, K_INTERFACE_DIM, J_DIM]
-                ),  # [x, z, y] fortran data
-                phis=state.phis,
-                te_2d=self._te0_2d,
-                ps=state.ps,
-                wsd=self._wsd,
-                omga=state.omga,
-                dp1=self._dp_initial,
+            self._mfy_f64 = quantity_factory.zeros(
+                dims=[I_DIM, J_INTERFACE_DIM, K_DIM],
+                units="unknown",
+                dtype=NDSL_64BIT_FLOAT_TYPE,
+                allow_mismatch_float_precision=True,
             )
-
-    def _checkpoint_remapping_out(self, state: DycoreState) -> None:
-        if self.call_checkpointer:
-            self.checkpointer(
-                "Remapping-Out",
-                pt=state.pt,
-                delp=state.delp,
-                delz=state.delz,
-                peln=state.peln.transpose(
-                    [I_DIM, K_INTERFACE_DIM, J_DIM]
-                ),  # [x, z, y] fortran data
-                u=state.u,
-                v=state.v,
-                w=state.w,
-                cappa=self._cappa,
-                pkz=state.pkz,
-                pk=state.pk,
-                pe=state.pe.transpose(
-                    [I_DIM, K_INTERFACE_DIM, J_DIM]
-                ),  # [x, z, y] fortran data
-                dp1=self._dp_initial,
+            self._cx_f64 = quantity_factory.zeros(
+                dims=[I_INTERFACE_DIM, J_DIM, K_DIM],
+                units="unknown",
+                dtype=NDSL_64BIT_FLOAT_TYPE,
+                allow_mismatch_float_precision=True,
             )
-
-    def _checkpoint_tracer_advection_in(self, state: DycoreState) -> None:
-        if self.call_checkpointer:
-            self.checkpointer(
-                "Tracer2D1L-In",
-                dp1=self._dp_initial,
-                mfxd=state.mfxd,
-                mfyd=state.mfyd,
-                cxd=state.cxd,
-                cyd=state.cyd,
-            )
-
-    def _checkpoint_tracer_advection_out(self, state: DycoreState) -> None:
-        if self.call_checkpointer:
-            self.checkpointer(
-                "Tracer2D1L-Out",
-                dp1=self._dp_initial,
-                mfxd=state.mfxd,
-                mfyd=state.mfyd,
-                cxd=state.cxd,
-                cyd=state.cyd,
+            self._cy_f64 = quantity_factory.zeros(
+                dims=[I_DIM, J_INTERFACE_DIM, K_DIM],
+                units="unknown",
+                dtype=NDSL_64BIT_FLOAT_TYPE,
+                allow_mismatch_float_precision=True,
             )
+        self._mfx_local = quantity_factory.zeros(
+            dims=[I_INTERFACE_DIM, J_DIM, K_DIM],
+            units="unknown",
+            dtype=Float,
+        )
+        self._mfy_local = quantity_factory.zeros(
+            dims=[I_DIM, J_INTERFACE_DIM, K_DIM],
+            units="unknown",
+            dtype=Float,
+        )
+        self._cx_local = quantity_factory.zeros(
+            dims=[I_INTERFACE_DIM, J_DIM, K_DIM],
+            units="unknown",
+            dtype=Float,
+        )
+        self._cy_local = quantity_factory.zeros(
+            dims=[I_DIM, J_INTERFACE_DIM, K_DIM],
+            units="unknown",
+            dtype=Float,
+        )
+        self._set_value_I_interface = stencil_factory.from_origin_domain(
+            func=set_value,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(1, 0, 0)),
+        )
+        self._set_value_J_interface = stencil_factory.from_origin_domain(
+            func=set_value,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(0, 1, 0)),
+        )
+        self._increment = stencil_factory.from_origin_domain(
+            func=_increment_stencil,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(1, 1, 0)),
+        )
+        self._copy_cast = stencil_factory.from_origin_domain(
+            func=_copy_cast_defn,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(1, 1, 0)),
+        )
 
     def step_dynamics(self, state: DycoreState, timer: Timer) -> None:
         """
@@ -425,9 +557,7 @@ def step_dynamics(self, state: DycoreState, timer: Timer) -> None:
             state: model prognostic state and inputs
             timer: keep time of model sections
         """
-        self._checkpoint_fvdynamics(state=state, tag="In")
         self._compute(state, timer)
-        self._checkpoint_fvdynamics(state=state, tag="Out")
 
     def compute_preamble(self, state: DycoreState) -> None:
         if self.config.hydrostatic:
@@ -436,13 +566,14 @@ def compute_preamble(self, state: DycoreState) -> None:
         if __debug__:
             log_on_rank_0("FV Setup")
 
+        # Reset fluxes
+        self._set_value_I_interface(state.mfxd, Float(0.0))
+        self._set_value_I_interface(state.cxd, Float(0.0))
+        self._set_value_J_interface(state.mfyd, Float(0.0))
+        self._set_value_J_interface(state.cyd, Float(0.0))
+
         self._fv_setup_stencil(
-            state.qvapor,
-            state.qliquid,
-            state.qrain,
-            state.qsnow,
-            state.qice,
-            state.qgraupel,
+            state.tracers,
             state.q_con,
             self._cvm,
             state.pkz,
@@ -453,40 +584,58 @@ def compute_preamble(self, state: DycoreState) -> None:
             self._dp_initial,
         )
 
-        if self._conserve_total_energy > 0:
-            raise NotImplementedError(
-                "Dynamical Core (fv_dynamics): compute total energy is not implemented"
+        # Compute total energy
+        if self.config.consv_te > 0.0:
+            self._compute_total_energy(
+                hs=state.phis,
+                delp=state.delp,
+                delz=state.delz,
+                qc=self._dp_initial,
+                pt=state.pt,
+                u=state.u,
+                v=state.v,
+                w=state.w,
+                tracers=state.tracers,
+                te_2d=self._te0_2d,
             )
 
-        if (not self.config.rf_fast) and self.config.tau != 0:
+        # Rayleigh fast
+        if (
+            not self.config.hydrostatic
+            and not self.config.acoustic_dynamics.rf_fast
+            and self.config.acoustic_dynamics.tau > 0
+        ):
             raise NotImplementedError(
-                "Dynamical Core (fv_dynamics): Rayleigh_Super,"
-                " called when rf_fast=False and tau !=0, is not implemented"
+                "Dynamical Core (fv_dynamics): Rayleigh Friction is not implemented."
             )
 
-        if self.config.adiabatic and self.config.kord_tm > 0:
+        # Adjust pt
+        if self.config.adiabatic:
             raise NotImplementedError(
-                "Dynamical Core (fv_dynamics): Adiabatic with positive kord_tm is not implemented."
+                "Dynamical Core (fv_dynamics): Adiabatic pt adjust is not implemented."
             )
+        else:
+            if self.config.hydrostatic:
+                raise NotImplementedError(
+                    "Dynamical Core (fv_dynamics): Hydrostatic pt adjust is not implemented."
+                )
+            else:
+                self._pt_to_potential_density_pt(
+                    state.pkz,
+                    self._dp_initial,
+                    state.q_con,
+                    state.pt,
+                )
 
-        if __debug__:
-            log_on_rank_0("Adjust pt")
-
-        self._pt_to_potential_density_pt(
-            state.pkz,
-            self._dp_initial,
-            state.q_con,
-            state.pt,
-        )
+        self.dry_mass_control.reset(pe=state.pe)
 
     def __call__(self, *args, **kwargs) -> None:
         self.step_dynamics(*args, **kwargs)
 
     def _compute(self, state: DycoreState, timer: Timer) -> None:
-        last_step = False
         self.compute_preamble(state)
 
-        for k_split in dace_no_unroll(range(self._k_split)):
+        for k_split in range(self._k_split):
             n_map = k_split + 1
             last_step = k_split == self._k_split - 1
             # TODO: why are we copying delp to dp1? what is dp1?
@@ -500,26 +649,36 @@ def _compute(self, state: DycoreState, timer: Timer) -> None:
 
             with timer.clock("DynCore"):
                 self.acoustic_dynamics(
-                    state,
+                    state=state,
+                    mfxd=self._mfx_f64 if self._f32_correction else self._mfx_local,
+                    mfyd=self._mfy_f64 if self._f32_correction else self._mfy_local,
+                    cxd=self._cx_f64 if self._f32_correction else self._cx_local,
+                    cyd=self._cy_f64 if self._f32_correction else self._cy_local,
+                    dpx=self.dry_mass_control.dpx,
+                    wsd=self._wsd,
                     timestep=self._timestep / self._k_split,
                     n_map=n_map,
                 )
-
+                if self._f32_correction:
+                    self._copy_cast(self._mfx_f64, self._mfx_local)
+                    self._copy_cast(self._mfy_f64, self._mfy_local)
+                    self._copy_cast(self._cx_f64, self._cx_local)
+                    self._copy_cast(self._cy_f64, self._cy_local)
+                if last_step and self.config.hydrostatic:
+                    self.dry_mass_control.apply(state.pe)
             if self.config.z_tracer:
                 if __debug__:
                     log_on_rank_0("TracerAdvection")
 
                 with timer.clock("TracerAdvection"):
-                    self._checkpoint_tracer_advection_in(state)
                     self.tracer_advection(
-                        self.tracers,
+                        state.tracers,
                         self._dp_initial,
-                        state.mfxd,
-                        state.mfyd,
-                        state.cxd,
-                        state.cyd,
+                        x_mass_flux=self._mfx_local,
+                        y_mass_flux=self._mfy_local,
+                        x_courant=self._cx_local,
+                        y_courant=self._cy_local,
                     )
-                    self._checkpoint_tracer_advection_out(state)
             else:
                 raise NotImplementedError("z_tracer=False is not implemented")
 
@@ -542,46 +701,82 @@ def _compute(self, state: DycoreState, timer: Timer) -> None:
                     log_on_rank_0("Remapping")
 
                 with timer.clock("Remapping"):
-                    self._checkpoint_remapping_in(state)
-
-                    # TODO: When NQ=9, we shouldn't need to pass qcld explicitly
-                    #       since it's in self.tracers. It should not be an issue since
-                    #       we don't have self.tracers & qcld computation at the same
-                    #       time
-                    #       When NQ=8, we do need qcld passed explicitely
-                    self._lagrangian_to_eulerian_obj(
-                        self.tracers,
-                        state.pt,
-                        state.delp,
-                        state.delz,
-                        state.peln,
-                        state.u,
-                        state.v,
-                        state.w,
-                        self._cappa,
-                        state.q_con,
-                        state.qcld,
-                        state.pkz,
-                        state.pk,
-                        state.pe,
-                        state.phis,
-                        state.ps,
-                        self._wsd,
-                        self._ak,
-                        self._bk,
-                        self._dp_initial,
-                        self._ptop,
-                        KAPPA,
-                        ZVIR,
-                        last_step,
-                        self._conserve_total_energy,
-                        self._timestep / self._k_split,
-                    )
-                    self._checkpoint_remapping_out(state)
+                    if IS_GEOS:
+                        self._lagrangian_to_eulerian_GEOS(
+                            tracers=state.tracers,
+                            pt=state.pt,
+                            delp=state.delp,
+                            delz=state.delz,
+                            peln=state.peln,
+                            u=state.u,
+                            v=state.v,
+                            w=state.w,
+                            mfx=self._mfx_local,
+                            mfy=self._mfy_local,
+                            cx=self._cx_local,
+                            cy=self._cy_local,
+                            cappa=self._cappa,
+                            q_con=state.q_con,
+                            pkz=state.pkz,
+                            pk=state.pk,
+                            pe=state.pe,
+                            hs=state.phis,
+                            te0_2d=self._te0_2d,
+                            ps=state.ps,
+                            wsd=self._wsd,
+                            ak=self._ak,
+                            bk=self._bk,
+                            dp1=self._dp_initial,
+                            ptop=self._ptop,
+                            akap=KAPPA,
+                            zvir=ZVIR,
+                            last_step=last_step,
+                            consv_te=self._conserve_total_energy,
+                            mdt=self._timestep / self._k_split,
+                        )
+                    else:
+                        # TODO: When NQ=9, we shouldn't need to pass qcld explicitly
+                        #       since it's in self.tracers. It should not be an issue
+                        #       since we don't have self.tracers & qcld computation
+                        #       at the same time
+                        #       When NQ=8, we do need qcld passed explicitly
+                        self._lagrangian_to_eulerian_obj(
+                            state.tracers,
+                            state.pt,
+                            state.delp,
+                            state.delz,
+                            state.peln,
+                            state.u,
+                            state.v,
+                            state.w,
+                            self._cappa,
+                            state.q_con,
+                            state.pkz,
+                            state.pk,
+                            state.pe,
+                            state.phis,
+                            state.ps,
+                            self._wsd,
+                            self._ak,
+                            self._bk,
+                            self._dp_initial,
+                            self._ptop,
+                            KAPPA,
+                            ZVIR,
+                            last_step,
+                            self._conserve_total_energy,
+                            self._timestep / self._k_split,
+                        )
                 # TODO: can we pull this block out of the loop intead of
                 # using an if-statement?
+
+                # Update state mass fluxes and Courant numbers
+                self._increment(state.mfxd, self._mfx_local)
+                self._increment(state.mfyd, self._mfy_local)
+                self._increment(state.cxd, self._cx_local)
+                self._increment(state.cyd, self._cy_local)
+
                 if last_step:
-                    da_min: Float = self._get_da_min()
                     if not self.config.hydrostatic:
                         if __debug__:
                             log_on_rank_0("Omega")
@@ -597,21 +792,22 @@ def _compute(self, state: DycoreState, timer: Timer) -> None:
                         if __debug__:
                             log_on_rank_0("Del2Cubed")
                         self._omega_halo_updater.update()
-                        self._hyperdiffusion(state.omga, 0.18 * da_min)
+                        self._hyperdiffusion(state.omga, Float(0.18) * self._da_min)
 
-        if __debug__:
-            log_on_rank_0("Neg Adj 3")
-        self._adjust_tracer_mixing_ratio(
-            state.qvapor,
-            state.qliquid,
-            state.qrain,
-            state.qsnow,
-            state.qice,
-            state.qgraupel,
-            state.qcld,
-            state.pt,
-            state.delp,
-        )
+        if self.config.nwat >= 6:
+            if __debug__:
+                log_on_rank_0("Neg Adj 3")
+            self._adjust_tracer_mixing_ratio(
+                state.tracers[:, :, :, FVTracers.index("vapor")],
+                state.tracers[:, :, :, FVTracers.index("liquid")],
+                state.tracers[:, :, :, FVTracers.index("rain")],
+                state.tracers[:, :, :, FVTracers.index("snow")],
+                state.tracers[:, :, :, FVTracers.index("ice")],
+                state.tracers[:, :, :, FVTracers.index("graupel")],
+                state.tracers[:, :, :, FVTracers.index("cloud")],
+                state.pt,
+                state.delp,
+            )
 
         if __debug__:
             log_on_rank_0("CubedToLatLon")
diff --git a/pyfv3/stencils/fv_subgridz.py b/pyfv3/stencils/fv_subgridz.py
index 611fdf77..5004d14c 100644
--- a/pyfv3/stencils/fv_subgridz.py
+++ b/pyfv3/stencils/fv_subgridz.py
@@ -2,7 +2,7 @@
 import collections
 
 import ndsl.dsl.gt4py_utils as utils
-from ndsl import Quantity, QuantityFactory, StencilFactory
+from ndsl import NDSLRuntime, Quantity, QuantityFactory, StencilFactory
 from ndsl.constants import (
     C_ICE,
     C_LIQ,
@@ -733,9 +733,12 @@ def finalize(
 )
 
 
-class DryConvectiveAdjustment:
+class DryConvectiveAdjustment(NDSLRuntime):
     """
-    Corresponds to fv_subgrid_z in Fortran's fv_sg module
+    Corresponds to fv_subgrid_z in Fortran's fv_sg module.
+
+    ⚠️ ⚠️ ⚠️ This code fell out of validation sometime in 2024 ⚠️ ⚠️ ⚠️
+    ⚠️ ⚠️ ⚠️ Translate test deactivated - need re-validation   ⚠️ ⚠️ ⚠️
     """
 
     arg_specs = (
@@ -779,6 +782,8 @@ def __init__(
         n_sponge: int,
         hydrostatic: bool,
     ):
+        super().__init__(stencil_factory)
+
         if hydrostatic:
             raise NotImplementedError(
                 "DryConvectiveAdjustment (fv_subgridz): Hydrostatic is not implemented"
diff --git a/pyfv3/stencils/fvtp2d.py b/pyfv3/stencils/fvtp2d.py
index 05616aad..1096c057 100644
--- a/pyfv3/stencils/fvtp2d.py
+++ b/pyfv3/stencils/fvtp2d.py
@@ -1,4 +1,4 @@
-from ndsl import QuantityFactory, StencilFactory, orchestrate
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM
 from ndsl.dsl.gt4py import PARALLEL, computation
 from ndsl.dsl.gt4py import function as gtfunction
@@ -115,7 +115,7 @@ def final_fluxes(
             )
 
 
-class FiniteVolumeTransport:
+class FiniteVolumeTransport(NDSLRuntime):
     """
     Equivalent of Fortran FV3 subroutine fv_tp_2d, done in 3 dimensions.
     Tested on serialized data with FvTp2d
@@ -133,10 +133,7 @@ def __init__(
         nord=None,
         damp_c=None,
     ):
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
+        super().__init__(stencil_factory)
 
         # use a shorter alias for grid_indexing here to avoid very verbose lines
         idx = stencil_factory.grid_indexing
@@ -149,12 +146,20 @@ def make_quantity():
                 dtype=Float,
             )
 
-        self._q_advected_y = make_quantity()
-        self._q_advected_x = make_quantity()
-        self._q_x_advected_mean = make_quantity()
-        self._q_y_advected_mean = make_quantity()
-        self._q_advected_x_y_advected_mean = make_quantity()
-        self._q_advected_y_x_advected_mean = make_quantity()
+        self._q_advected_y = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._q_advected_x = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._q_x_advected_mean = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_DIM]
+        )
+        self._q_y_advected_mean = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_DIM]
+        )
+        self._q_advected_x_y_advected_mean = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_DIM]
+        )
+        self._q_advected_y_x_advected_mean = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_DIM]
+        )
         self._nord = nord
         self._damp_c = damp_c
         ord_outer = hord
@@ -227,18 +232,6 @@ def make_quantity():
             domain=idx.domain_compute(add=(1, 1, 1)),
         )
 
-    def _transport_flux(self, x_unit_flux, y_unit_flux, q_x_flux, q_y_flux):
-        self.stencil_transport_flux(
-            self._q_advected_y_x_advected_mean,
-            self._q_x_advected_mean,
-            self._q_advected_x_y_advected_mean,
-            self._q_y_advected_mean,
-            x_unit_flux,
-            y_unit_flux,
-            q_x_flux,
-            q_y_flux,
-        )
-
     def __call__(
         self,
         q,
@@ -267,7 +260,7 @@ def __call__(
         by contrast are area weighted.
 
         Args:
-            q (in): scalar to be transported
+            q (inout): scalar to be transported (corners are copied into the halo)
             crx (in): Courant number in x-direction
             cry (in): Courant number in y-direction
             x_area_flux (in): flux of area in x-direction, in units of m^2
@@ -311,7 +304,7 @@ def __call__(
         # y_area_flux as an input (flux = area_flux * advected_mean), since a flux is
         # easier to understand than the current output. This would be like merging
         # yppm with q_i_stencil and xppm with q_j_stencil.
-        self._copy_corners_y(q.data)
+        self._copy_corners_y(q)
         self.y_piecewise_parabolic_inner(q, cry, self._q_y_advected_mean)
         # q_y_advected_mean is 1/Delta_area * curly-F, where curly-F is defined in
         # equation 4.3 of the FV3 documentation and Delta_area is the advected area
@@ -328,7 +321,7 @@ def __call__(
         )
         # q_advected_y_x_advected_mean is now rho^n + F(rho^y) in PL07 eq 16
 
-        self._copy_corners_x(q.data)
+        self._copy_corners_x(q)
         # similarly below for x<->y
         self.x_piecewise_parabolic_inner(q, crx, self._q_x_advected_mean)
         self.q_j_stencil(
@@ -344,16 +337,53 @@ def __call__(
 
         # TODO [DACE]: due to an aliasing issue (see above for original code)
         # we duplicate the code here
+
         if x_mass_flux is None:
             if y_mass_flux is None:
-                self._transport_flux(x_area_flux, y_area_flux, q_x_flux, q_y_flux)
+                self.stencil_transport_flux(
+                    self._q_advected_y_x_advected_mean,
+                    self._q_x_advected_mean,
+                    self._q_advected_x_y_advected_mean,
+                    self._q_y_advected_mean,
+                    x_area_flux,
+                    y_area_flux,
+                    q_x_flux,
+                    q_y_flux,
+                )
             else:
-                self._transport_flux(x_area_flux, y_mass_flux, q_x_flux, q_y_flux)
+                self.stencil_transport_flux(
+                    self._q_advected_y_x_advected_mean,
+                    self._q_x_advected_mean,
+                    self._q_advected_x_y_advected_mean,
+                    self._q_y_advected_mean,
+                    x_area_flux,
+                    y_mass_flux,
+                    q_x_flux,
+                    q_y_flux,
+                )
         else:
             if y_mass_flux is None:
-                self._transport_flux(x_mass_flux, y_area_flux, q_x_flux, q_y_flux)
+                self.stencil_transport_flux(
+                    self._q_advected_y_x_advected_mean,
+                    self._q_x_advected_mean,
+                    self._q_advected_x_y_advected_mean,
+                    self._q_y_advected_mean,
+                    x_mass_flux,
+                    y_area_flux,
+                    q_x_flux,
+                    q_y_flux,
+                )
             else:
-                self._transport_flux(x_mass_flux, y_mass_flux, q_x_flux, q_y_flux)
+                self.stencil_transport_flux(
+                    self._q_advected_y_x_advected_mean,
+                    self._q_x_advected_mean,
+                    self._q_advected_x_y_advected_mean,
+                    self._q_y_advected_mean,
+                    x_mass_flux,
+                    y_mass_flux,
+                    q_x_flux,
+                    q_y_flux,
+                )
 
         if self._do_delnflux:
             self.delnflux(q, q_x_flux, q_y_flux, mass=mass)
diff --git a/pyfv3/stencils/fxadv.py b/pyfv3/stencils/fxadv.py
index c9a5bda3..1163dcde 100644
--- a/pyfv3/stencils/fxadv.py
+++ b/pyfv3/stencils/fxadv.py
@@ -1,4 +1,4 @@
-from ndsl import StencilFactory, orchestrate
+from ndsl import NDSLRuntime, StencilFactory
 from ndsl.dsl.gt4py import PARALLEL, computation, horizontal, interval, region
 from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
 from ndsl.grid import GridData
@@ -479,27 +479,47 @@ def fxadv_fluxes_stencil(
         y_area_flux (out):
         uc_contra (in):
         vc_contra (in):
+
+    Porting Note
+    * The temporary `tmp` lets fxadv_fluxes_stencil follow the Fortran order of
+      computation more closely, so that x_area_flux and y_area_flux match the
+      respective Fortran values.
+
+      The previous stencil looked as follows:
+        ==========================================================
+        if uc_contra > 0:
+            crx = dt * uc_contra * rdxa[-1, 0]
+            x_area_flux = dy * dt * uc_contra * sin_sg3[-1, 0]
+        else:
+            crx = dt * uc_contra * rdxa
+            x_area_flux = dy * dt * uc_contra * sin_sg1
+        ==========================================================
     """
     from __externals__ import local_ie, local_is, local_je, local_js
 
     with computation(PARALLEL), interval(...):
         with horizontal(region[local_is : local_ie + 2, :]):
+            # Computing the temporary (tmp) first lets x_area_flux and y_area_flux
+            # match the precision of the respective Fortran calculation more
+            # closely, since Fortran also computes this temporary.
+            tmp = dt * uc_contra
             if uc_contra > 0:
-                crx = dt * uc_contra * rdxa[-1, 0]
-                x_area_flux = dy * dt * uc_contra * sin_sg3[-1, 0]
+                crx = tmp * rdxa[-1, 0]
+                x_area_flux = dy * tmp * sin_sg3[-1, 0]
             else:
-                crx = dt * uc_contra * rdxa
-                x_area_flux = dy * dt * uc_contra * sin_sg1
+                crx = tmp * rdxa
+                x_area_flux = dy * tmp * sin_sg1
         with horizontal(region[:, local_js : local_je + 2]):
+            tmp = dt * vc_contra
             if vc_contra > 0:
-                cry = dt * vc_contra * rdya[0, -1]
-                y_area_flux = dx * dt * vc_contra * sin_sg4[0, -1]
+                cry = tmp * rdya[0, -1]
+                y_area_flux = dx * tmp * sin_sg4[0, -1]
             else:
-                cry = dt * vc_contra * rdya
-                y_area_flux = dx * dt * vc_contra * sin_sg2
+                cry = tmp * rdya
+                y_area_flux = dx * tmp * sin_sg2
 
 
-class FiniteVolumeFluxPrep:
+class FiniteVolumeFluxPrep(NDSLRuntime):
     """
     A large section of code near the beginning of Fortran's d_sw subroutine
     Known in this repo as FxAdv,
@@ -511,10 +531,8 @@ def __init__(
         grid_data: GridData,
         grid_type: int,
     ):
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
+        super().__init__(stencil_factory)
+
         grid_indexing = stencil_factory.grid_indexing
         self._grid_type = grid_type
         self._tile_interior = not (
@@ -590,11 +608,6 @@ def __init__(
             origin=origin,
             domain=domain,
         )
-        # self._set_nans = get_set_nan_func(
-        #     grid_indexing,
-        #     dims=[I_DIM, J_DIM, K_DIM],
-        #     n_halo=((2, 2), (2, 2)),
-        # )
 
     def __call__(
         self,
diff --git a/pyfv3/stencils/map_single.py b/pyfv3/stencils/map_single.py
index ba8e139d..6d8423c9 100644
--- a/pyfv3/stencils/map_single.py
+++ b/pyfv3/stencils/map_single.py
@@ -1,10 +1,19 @@
-from collections.abc import Sequence
-from typing import Optional
+from typing import Optional, Sequence
 
 from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM
 from ndsl.dsl.gt4py import FORWARD, PARALLEL, computation, interval
-from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ, Int, IntFieldIJ
+from ndsl.dsl.typing import (  # noqa: F401
+    Bool,
+    BoolField,
+    BoolFieldIJ,
+    Float,
+    FloatField,
+    FloatFieldIJ,
+    Int,
+    IntField,
+    IntFieldIJ,
+)
 from ndsl.stencils.basic_operations import copy
 from pyfv3.stencils.remap_profile import RemapProfile
 
@@ -79,6 +88,233 @@ def lagrangian_contributions(
         lev = lev - 1
 
 
+class LagrangianContribution:
+    """Lagrangian contribution as it appears in FV3GFS/SHiELD"""
+
+    def __init__(self, stencil_factory: StencilFactory, dims: Sequence[str]) -> None:
+        self._lagrangian_contributions = stencil_factory.from_dims_halo(
+            lagrangian_contributions,
+            compute_dims=dims,
+        )
+
+    def __call__(
+        self,
+        q: FloatField,
+        pe1: FloatField,
+        pe2: FloatField,
+        q4_1: FloatField,
+        q4_2: FloatField,
+        q4_3: FloatField,
+        q4_4: FloatField,
+        dp1: FloatField,
+        lev: IntFieldIJ,
+    ):
+        self._lagrangian_contributions(
+            q,
+            pe1,
+            pe2,
+            q4_1,
+            q4_2,
+            q4_3,
+            q4_4,
+            dp1,
+            lev,
+        )
+
+
+def lagrangian_contributions_interp(
+    km: int,
+    not_exit_loop: BoolFieldIJ,
+    INDEX_LM1: IntField,
+    INDEX_LP0: IntField,
+    q: FloatField,
+    pe1: FloatField,
+    pe2: FloatField,
+    q4_1: FloatField,
+    q4_2: FloatField,
+    q4_3: FloatField,
+    q4_4: FloatField,
+    dp1: FloatField,
+    lev: IntFieldIJ,
+):
+    """
+    Args:
+        km (in):
+        not_exit_loop (in/temp):
+        INDEX_LM1 (in/temp):
+        INDEX_LP0 (in/temp):
+        q (in/out):
+        pe1 (in):
+        pe2 (in):
+        q4_1 (in):
+        q4_2 (in):
+        q4_3 (in):
+        q4_4 (in):
+        dp1 (in):
+        lev (inout):
+    """
+
+    # This computation creates an IntField that allows for "absolute" references
+    # in the k-dimension for q and pe1.
+
+    # INDEX_LM1 and INDEX_LP0 are initialized such that, when used as an offset
+    # into "q" (e.g. q[0, 0, INDEX_LM1]), the k level read from q is "k = 0".
+
+    # For example, during the stencil computation at k = 2, INDEX_LM1[i,j,2] = -2
+    with computation(FORWARD):
+        with interval(0, 1):
+            INDEX_LM1 = 0
+            INDEX_LP0 = 0
+        with interval(1, None):
+            INDEX_LM1 = INDEX_LM1[0, 0, -1] - 1
+            INDEX_LP0 = INDEX_LP0[0, 0, -1] - 1
+
+    # TODO: Can we make lev a 2D temporary?
+    with computation(FORWARD), interval(...):
+        LM1 = 1
+        LP0 = 1
+        not_exit_loop = True
+        while LP0 <= km and not_exit_loop:
+            if pe1[0, 0, INDEX_LP0] < pe2:
+                LP0 = LP0 + 1
+                INDEX_LP0 = INDEX_LP0 + 1
+            else:
+                not_exit_loop = False
+
+        LM1 = max(LP0 - 1, 1)
+        INDEX_LM1 = INDEX_LM1 + (LM1 - 1)
+        LP0 = min(LP0, km)
+
+        if LP0 == 1:
+            INDEX_LP0 = INDEX_LM1
+        elif LP0 <= km:
+            INDEX_LP0 = INDEX_LM1 + 1
+        else:
+            INDEX_LP0 = INDEX_LM1
+
+        if LM1 == 1 and LP0 == 1:
+            q_temp = q[0, 0, INDEX_LM1] + (
+                q[0, 0, INDEX_LM1 + 1] - q[0, 0, INDEX_LM1]
+            ) * (pe2 - pe1[0, 0, INDEX_LM1]) / (
+                pe1[0, 0, INDEX_LM1 + 1] - pe1[0, 0, INDEX_LM1]
+            )
+
+        elif LM1 == km and LP0 == km:
+            q_temp = q[0, 0, INDEX_LM1] + (
+                q[0, 0, INDEX_LM1] - q[0, 0, INDEX_LM1 - 1]
+            ) * (pe2 - pe1[0, 0, INDEX_LM1]) / (
+                pe1[0, 0, INDEX_LM1] - pe1[0, 0, INDEX_LM1 - 1]
+            )
+
+        elif LM1 == 1 or LP0 == km:
+            q_temp = q[0, 0, INDEX_LP0] + (q[0, 0, INDEX_LM1] - q[0, 0, INDEX_LP0]) * (
+                pe2 - pe1[0, 0, INDEX_LP0]
+            ) / (pe1[0, 0, INDEX_LM1] - pe1[0, 0, INDEX_LP0])
+
+        else:
+            while pe2 < pe1[0, 0, lev] or pe2 > pe1[0, 0, lev + 1]:
+                lev = lev + 1
+            pl = (pe2 - pe1[0, 0, lev]) / dp1[0, 0, lev]
+            if pe2[0, 0, 1] <= pe1[0, 0, lev + 1]:
+                pr = (pe2[0, 0, 1] - pe1[0, 0, lev]) / dp1[0, 0, lev]
+                q_temp = (
+                    q4_2[0, 0, lev]
+                    + 0.5
+                    * (q4_4[0, 0, lev] + q4_3[0, 0, lev] - q4_2[0, 0, lev])
+                    * (pr + pl)
+                    - q4_4[0, 0, lev] * 1.0 / 3.0 * (pr * (pr + pl) + pl * pl)
+                )
+            else:
+                qsum = (pe1[0, 0, lev + 1] - pe2) * (
+                    q4_2[0, 0, lev]
+                    + 0.5
+                    * (q4_4[0, 0, lev] + q4_3[0, 0, lev] - q4_2[0, 0, lev])
+                    * (1.0 + pl)
+                    - q4_4[0, 0, lev] * 1.0 / 3.0 * (1.0 + pl * (1.0 + pl))
+                )
+                lev = lev + 1
+                while pe1[0, 0, lev + 1] < pe2[0, 0, 1]:
+                    qsum += dp1[0, 0, lev] * q4_1[0, 0, lev]
+                    lev = lev + 1
+                dp = pe2[0, 0, 1] - pe1[0, 0, lev]
+                esl = dp / dp1[0, 0, lev]
+                qsum += dp * (
+                    q4_2[0, 0, lev]
+                    + 0.5
+                    * esl
+                    * (
+                        q4_3[0, 0, lev]
+                        - q4_2[0, 0, lev]
+                        + q4_4[0, 0, lev] * (1.0 - (2.0 / 3.0) * esl)
+                    )
+                )
+                q_temp = qsum / (pe2[0, 0, 1] - pe2)
+
+        lev = lev - 1
+
+        q = q_temp
+
+
+class LagrangianContributionInterpolated:
+    """Lagrangian contribution as it appears in GEOS, modified from original
+    FV3GFS version"""
+
+    def __init__(
+        self,
+        stencil_factory: StencilFactory,
+        quantity_factory: QuantityFactory,
+        dims: Sequence[str],
+    ) -> None:
+        self._lagrangian_contributions_interp = stencil_factory.from_dims_halo(
+            lagrangian_contributions_interp,
+            compute_dims=dims,
+        )
+
+        self._INDEX_LM1 = quantity_factory.zeros(
+            [I_DIM, J_DIM, K_DIM],
+            units="",
+            dtype=Int,
+        )
+
+        self._INDEX_LP0 = quantity_factory.zeros(
+            [I_DIM, J_DIM, K_DIM],
+            units="",
+            dtype=Int,
+        )
+        self._km = stencil_factory.grid_indexing.domain[2]
+        self._not_exit_loop = quantity_factory.zeros(
+            [I_DIM, J_DIM], units="", dtype=bool
+        )
+
+    def __call__(
+        self,
+        q: FloatField,
+        pe1: FloatField,
+        pe2: FloatField,
+        q4_1: FloatField,
+        q4_2: FloatField,
+        q4_3: FloatField,
+        q4_4: FloatField,
+        dp1: FloatField,
+        lev: IntFieldIJ,
+    ):
+        self._lagrangian_contributions_interp(
+            km=self._km,
+            not_exit_loop=self._not_exit_loop,
+            INDEX_LM1=self._INDEX_LM1,
+            INDEX_LP0=self._INDEX_LP0,
+            q=q,
+            pe1=pe1,
+            pe2=pe2,
+            q4_1=q4_1,
+            q4_2=q4_2,
+            q4_3=q4_3,
+            q4_4=q4_4,
+            dp1=dp1,
+            lev=lev,
+        )
+
+
 class MapSingle(NDSLRuntime):
     """
     Fortran name is map_single, test classes are Map1_PPM_2d, Map_Scalar_2d
@@ -91,17 +327,10 @@ def __init__(
         kord: int,
         mode: int,
         dims: Sequence[str],
+        interpolate_contribution: bool = False,
     ) -> None:
         super().__init__(stencil_factory)
 
-        def make_quantity():
-            return quantity_factory.zeros(
-                [I_DIM, J_DIM, K_DIM],
-                units="unknown",
-                dtype=Float,
-            )
-
-        # All locals will be initialized in code before being read
         self._dp1 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
         self._q4_1 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
         self._q4_2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
@@ -110,8 +339,9 @@ def make_quantity():
         self._lev = self.make_local(quantity_factory, [I_DIM, J_DIM], dtype=Int)
 
         # If the boundary condition is not given as an input, we use use a zero-reference
-        self._zero_qs = self.make_local(quantity_factory, [I_DIM, J_DIM])
-        self._zero_qs.data[:] = 0
+        # ⚠️ This _has_ to be a Quantity rather than a Local to be set to 0
+        self._zero_qs = quantity_factory.zeros([I_DIM, J_DIM], "")
+        self._zero_qs[:] = 0
 
         self._copy_stencil = stencil_factory.from_dims_halo(
             copy,
@@ -131,9 +361,10 @@ def make_quantity():
             dims=dims,
         )
 
-        self._lagrangian_contributions = stencil_factory.from_dims_halo(
-            lagrangian_contributions,
-            compute_dims=dims,
+        self._lagrangian_contributions = (
+            LagrangianContributionInterpolated(stencil_factory, quantity_factory, dims)
+            if interpolate_contribution
+            else LagrangianContribution(stencil_factory, dims)
         )
 
     def __call__(
@@ -166,7 +397,7 @@ def __call__(
                 self._q4_3,
                 self._q4_4,
                 self._dp1,
-                qmin,
+                Float(qmin),
             )
         else:
             self._remap_profile(
@@ -176,16 +407,17 @@ def __call__(
                 self._q4_3,
                 self._q4_4,
                 self._dp1,
-                qmin,
+                Float(qmin),
             )
+
         self._lagrangian_contributions(
-            q1,
-            pe1,
-            pe2,
-            self._q4_1,
-            self._q4_2,
-            self._q4_3,
-            self._q4_4,
-            self._dp1,
-            self._lev,
+            q=q1,
+            pe1=pe1,
+            pe2=pe2,
+            q4_1=self._q4_1,
+            q4_2=self._q4_2,
+            q4_3=self._q4_3,
+            q4_4=self._q4_4,
+            dp1=self._dp1,
+            lev=self._lev,
         )
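Review note on `lagrangian_contributions_interp`: the three edge branches (`LM1 == 1 and LP0 == 1`, `LM1 == km and LP0 == km`, `LM1 == 1 or LP0 == km`) are linear interpolation or extrapolation between two bracketing source levels. The sketch below is a simplified, 0-based NumPy illustration of that formula applied to a single column. The function name `interpolate_column` is hypothetical, and the interior branch (which integrates the PPM coefficients `q4_*`) is not reproduced.

```python
import numpy as np


def interpolate_column(q, pe1, pe2_value):
    """Linearly interpolate q (defined on levels pe1) to one target pressure.

    Simplified illustration only: targets outside the column are
    extrapolated from the nearest pair of levels.
    """
    q = np.asarray(q, dtype=float)
    pe1 = np.asarray(pe1, dtype=float)
    km = len(pe1)

    # Walk down the column until the first source level at or past the target.
    lp0 = 0
    while lp0 < km and pe1[lp0] < pe2_value:
        lp0 += 1

    # Clamp so that (lm1, lp0) is always a valid pair of distinct levels.
    lp0 = min(max(lp0, 1), km - 1)
    lm1 = lp0 - 1

    weight = (pe2_value - pe1[lm1]) / (pe1[lp0] - pe1[lm1])
    return q[lm1] + (q[lp0] - q[lm1]) * weight
```

With `q = [1, 2, 3]` on `pe1 = [0, 10, 20]`, a target of 15 falls halfway between the last two levels, while targets of -5 and 25 are extrapolated from the first and last pair respectively.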
diff --git a/pyfv3/stencils/mapn_tracer.py b/pyfv3/stencils/mapn_tracer.py
index bf2041ff..acaffa58 100644
--- a/pyfv3/stencils/mapn_tracer.py
+++ b/pyfv3/stencils/mapn_tracer.py
@@ -1,11 +1,9 @@
-import dace
-
-import ndsl.dsl.gt4py_utils as utils
 from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM
 from ndsl.dsl.typing import FloatField
 from pyfv3.stencils.fillz import FillNegativeTracerValues
 from pyfv3.stencils.map_single import MapSingle
+from pyfv3.tracers import FVTracers
 
 
 class MapNTracer(NDSLRuntime):
@@ -18,11 +16,10 @@ def __init__(
         stencil_factory: StencilFactory,
         quantity_factory: QuantityFactory,
         kord: int,
-        nq: int,
         fill: bool,
     ):
         super().__init__(stencil_factory)
-        self._nq = int(nq)
+        self._nq = FVTracers.size(0)
 
         self._map_single_parametrized_kord = MapSingle(
             stencil_factory,
@@ -43,21 +40,22 @@ def __init__(
         if fill:
             self._fill_negative_tracers = True
             self._fillz = FillNegativeTracerValues(
-                stencil_factory,
-                quantity_factory,
-                self._nq,
+                stencil_factory, quantity_factory, self._nq
             )
         else:
             self._fill_negative_tracers = False
 
-        self._index_graupel = utils.tracer_variables.index("qgraupel")
+        if self._nq > 6:
+            self._index_cloud = FVTracers.index("cloud")
+        else:
+            self._index_cloud = self._nq + 1
 
     def __call__(
         self,
         pe1: FloatField,
         pe2: FloatField,
         dp2: FloatField,
-        tracers: dace.compiletime,  # dict[str, Quantity]
+        tracers: FVTracers,
     ):
         """
         Remaps the tracer species onto the Eulerian grid
@@ -70,11 +68,11 @@ def __call__(
             dp2 (in): Difference in pressure between Eulerian levels
             tracers (inout): tracers to be remapped
         """
-        for i, q in enumerate(tracers.keys()):
-            if i != self._index_graupel:
-                self._map_single_parametrized_kord(tracers[q], pe1, pe2)
-
-        self._map_single_kord9(tracers["qgraupel"], pe1, pe2)
+        for i_tracer in range(0, self._nq):
+            if i_tracer == self._index_cloud:
+                self._map_single_kord9(tracers[:, :, :, i_tracer], pe1, pe2)
+            else:
+                self._map_single_parametrized_kord(tracers[:, :, :, i_tracer], pe1, pe2)
 
         if self._fill_negative_tracers:
             self._fillz(dp2, tracers)
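Review note on the new `MapNTracer.__call__` loop: tracers are now addressed by slicing a trailing data dimension (`tracers[:, :, :, i_tracer]`) instead of by dictionary key. The sketch below shows the same dispatch pattern on a plain NumPy array. It assumes the slice behaves like a NumPy view (so in-place writes reach the parent array), and `remap_all`, `double`, and `negate` are hypothetical stand-ins for the two `MapSingle` instances.

```python
import numpy as np


def remap_all(tracers, default_remap, special_remap, special_index):
    """Apply a remapping callable to every tracer slice, in place."""
    n_tracers = tracers.shape[3]
    for i_tracer in range(n_tracers):
        # Basic slicing returns a view, so writes propagate to `tracers`.
        field = tracers[:, :, :, i_tracer]
        if i_tracer == special_index:
            special_remap(field)
        else:
            default_remap(field)


def double(field):
    field *= 2.0


def negate(field):
    field *= -1.0


tracers = np.ones((2, 2, 3, 4))
remap_all(tracers, double, negate, special_index=1)

# An out-of-range index never matches, so every tracer takes the default path.
all_default = np.ones((2, 2, 3, 4))
remap_all(all_default, double, negate, special_index=5)
```

The second call mirrors the `self._index_cloud = self._nq + 1` fallback in the diff: an index past the end of the loop range silently disables the special case.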
diff --git a/pyfv3/stencils/moist_cv.py b/pyfv3/stencils/moist_cv.py
index 1d08c3af..4ba28050 100644
--- a/pyfv3/stencils/moist_cv.py
+++ b/pyfv3/stencils/moist_cv.py
@@ -1,8 +1,9 @@
 import ndsl.constants as constants
-from ndsl.dsl.gt4py import PARALLEL, computation, exp
+from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation, exp
 from ndsl.dsl.gt4py import function as gtfunction
 from ndsl.dsl.gt4py import interval, log
-from ndsl.dsl.typing import Float, FloatField
+from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
+from pyfv3.tracers import FVTracers
 
 
 from gt4py.cartesian.gtscript import __INLINED  # isort:skip
@@ -16,15 +17,23 @@ def set_cappa(qvapor, cvm, r_vir):
 
 @gtfunction
 def moist_cvm(qvapor, gz, ql, qs):
+    # CK : GEOS applies the "max" function to tracer values
     cvm = (
-        (1.0 - (qvapor + gz)) * constants.CV_AIR
-        + qvapor * constants.CV_VAP
+        (1.0 - (max(qvapor, 0.0) + gz)) * constants.CV_AIR
+        + max(qvapor, 0.0) * constants.CV_VAP
         + ql * constants.C_LIQ
         + qs * constants.C_ICE
     )
     return cvm
 
 
+@gtfunction
+def moist_cv_nwat0_fn():
+    gz = 0.0
+    cvm = constants.CV_AIR
+    return cvm, gz
+
+
 @gtfunction
 def moist_cv_nwat6_fn(
     qvapor: FloatField,
@@ -34,15 +43,16 @@ def moist_cv_nwat6_fn(
     qice: FloatField,
     qgraupel: FloatField,
 ):
-    ql = qliquid + qrain
-    qs = qice + qsnow + qgraupel
+    # CK : GEOS applies the "max" function to tracer values
+    ql = max(qliquid, 0.0) + max(qrain, 0.0)
+    qs = max(qice, 0.0) + max(qsnow, 0.0) + max(qgraupel, 0.0)
     gz = ql + qs
     cvm = moist_cvm(qvapor, gz, ql, qs)
     return cvm, gz
 
 
 @gtfunction
-def moist_pt_func(
+def moist_pt_func_nwat6(
     qvapor: FloatField,
     qliquid: FloatField,
     qrain: FloatField,
@@ -65,6 +75,23 @@ def moist_pt_func(
     return cvm, gz, q_con, cappa, pt
 
 
+@gtfunction
+def moist_pt_func_nwat0(
+    qvapor: FloatField,
+    q_con: FloatField,
+    pt: FloatField,
+    cappa: FloatField,
+    delp: FloatField,
+    delz: FloatField,
+    r_vir: Float,
+):
+    cvm, gz = moist_cv_nwat0_fn()
+    q_con = gz
+    cappa = set_cappa(qvapor, cvm, r_vir)
+    pt = pt * exp(cappa / (1.0 - cappa) * log(constants.RDG * delp / delz * pt))
+    return cvm, gz, q_con, cappa, pt
+
+
 @gtfunction
 def last_pt(
     pt: FloatField,
@@ -78,13 +105,7 @@ def last_pt(
 
 
 def moist_pt_last_step(
-    qvapor: FloatField,
-    qliquid: FloatField,
-    qrain: FloatField,
-    qsnow: FloatField,
-    qice: FloatField,
-    qgraupel: FloatField,
-    gz: FloatField,
+    tracers: FVTracers,
     pt: FloatField,
     pkz: FloatField,
     dtmp: Float,
@@ -98,23 +119,26 @@ def moist_pt_last_step(
         qsnow (in):
         qice (in):
         qgraupel (in):
-        gz (out):
         pt (inout):
         pkz (in):
         dtmp (in):
         r_vir (in):
     """
+    from __externals__ import i_graupel, i_ice, i_liquid, i_rain, i_snow, i_vapor, nwat
+
     with computation(PARALLEL), interval(...):
-        # if nwat == 2:
-        #    gz = qliquid if qliquid > 0. else 0.
-        #    qv = qvapor if qvapor > 0. else 0.
-        #    pt = last_pt(pt, dtmp, pkz, gz, qv, r_vir)
-        # elif nwat == 6:
-        gz = qliquid + qrain + qice + qsnow + qgraupel
-        pt = last_pt(pt, dtmp, pkz, gz, qvapor, r_vir)
-        # else:
-        #    cvm, gz = moist_cv_nwat6_fn(qvapor, qliquid, qrain, qsnow, qice, qgraupel)
-        #    pt = last_pt(pt, dtmp, pkz, gz, qvapor, zvir)
+        if __INLINED(nwat == 0):
+            _cvm, gz = moist_cv_nwat0_fn()
+        elif __INLINED(nwat == 6):
+            _cvm, gz = moist_cv_nwat6_fn(
+                tracers.A[i_vapor],
+                tracers.A[i_liquid],
+                tracers.A[i_rain],
+                tracers.A[i_snow],
+                tracers.A[i_ice],
+                tracers.A[i_graupel],
+            )
+        pt = last_pt(pt, dtmp, pkz, gz, tracers.A[i_vapor], r_vir)
 
 
 @gtfunction
@@ -124,15 +148,7 @@ def compute_pkz_func(delp, delz, pt, cappa):
 
 
 def moist_pkz(
-    qvapor: FloatField,
-    qliquid: FloatField,
-    qrain: FloatField,
-    qsnow: FloatField,
-    qice: FloatField,
-    qgraupel: FloatField,
-    q_con: FloatField,
-    gz: FloatField,
-    cvm: FloatField,
+    tracers: FVTracers,
     pkz: FloatField,
     pt: FloatField,
     cappa: FloatField,
@@ -148,9 +164,6 @@ def moist_pkz(
         qsnow (in):
         qice (in):
         qgraupel (in):
-        q_con (out):
-        gz (out):
-        cvm (out):
         pkz (out):
         pt (in):
         cappa (out):
@@ -158,23 +171,138 @@ def moist_pkz(
         delz (in):
         r_vir (in):
     """
+    from __externals__ import i_graupel, i_ice, i_liquid, i_rain, i_snow, i_vapor, nwat
+
     # TODO: What is happening with q_con and gz here?
     with computation(PARALLEL), interval(...):
-        cvm, gz = moist_cv_nwat6_fn(
-            qvapor, qliquid, qrain, qsnow, qice, qgraupel
-        )  # if (nwat == 6) else moist_cv_default_fn(constants.CV_AIR)
-        q_con[0, 0, 0] = gz
-        cappa = set_cappa(qvapor, cvm, r_vir)
+        if __INLINED(nwat == 0):
+            cvm, _gz = moist_cv_nwat0_fn()
+        elif __INLINED(nwat == 6):
+            cvm, _gz = moist_cv_nwat6_fn(
+                tracers.A[i_vapor],
+                tracers.A[i_liquid],
+                tracers.A[i_rain],
+                tracers.A[i_snow],
+                tracers.A[i_ice],
+                tracers.A[i_graupel],
+            )
+
+        cappa = set_cappa(tracers.A[i_vapor], cvm, r_vir)
         pkz = compute_pkz_func(delp, delz, pt, cappa)
 
 
+def moist_te(
+    tracers: FVTracers,
+    u: FloatField,
+    v: FloatField,
+    w: FloatField,
+    te: FloatFieldIJ,
+    pt: FloatField,
+    phis: FloatField,
+    delp: FloatField,
+    rsin2: FloatFieldIJ,
+    cosa_s: FloatFieldIJ,
+    hs: FloatFieldIJ,
+    delz: FloatField,
+    grav: Float,
+):
+    """
+    Args:
+        tracers (in):
+        u (in):
+        v (in):
+        w (in):
+        te (out):
+        pt (in):
+        phis (in):
+        delp (in):
+        rsin2 (in):
+        cosa_s (in):
+        hs (in):
+    """
+    from __externals__ import i_graupel, i_ice, i_liquid, i_rain, i_snow, i_vapor, nwat
+
+    with computation(FORWARD), interval(-1, None):
+        te = 0.0
+        phis = hs
+    with computation(BACKWARD), interval(0, -1):
+        phis = phis[0, 0, 1] - grav * delz
+    with computation(FORWARD), interval(0, -1):
+        if __INLINED(nwat == 0):
+            cvm, _gz = moist_cv_nwat0_fn()
+        elif __INLINED(nwat == 6):
+            cvm, _gz = moist_cv_nwat6_fn(
+                tracers.A[i_vapor],
+                tracers.A[i_liquid],
+                tracers.A[i_rain],
+                tracers.A[i_snow],
+                tracers.A[i_ice],
+                tracers.A[i_graupel],
+            )
+
+        te = te + delp * (
+            cvm * pt
+            + 0.5
+            * (
+                phis
+                + phis[0, 0, 1]
+                + w**2.0
+                + 0.5
+                * rsin2
+                * (
+                    u**2.0
+                    + u[0, 1, 0] ** 2.0
+                    + v**2.0
+                    + v[1, 0, 0] ** 2.0
+                    - (u + u[0, 1, 0]) * (v + v[1, 0, 0]) * cosa_s
+                )
+            )
+        )
+
+
+def te_zsum(
+    te_2d: FloatFieldIJ,
+    te0_2d: FloatFieldIJ,
+    delp: FloatField,
+    pkz: FloatField,
+    zsum1: FloatFieldIJ,
+):
+    with computation(FORWARD):
+        with interval(0, 1):
+            te_2d = te0_2d - te_2d
+            zsum1 = pkz * delp
+
+        with interval(1, None):
+            zsum1 = zsum1 + pkz * delp
+
+
+def cond_output(
+    q_con: FloatField,
+    tracers: FVTracers,
+):
+    from __externals__ import i_graupel, i_ice, i_liquid, i_rain, i_snow
+
+    with computation(PARALLEL), interval(...):
+        q_con = 0.0
+        if __INLINED(i_liquid > 0):
+            if tracers.A[i_liquid] > 0.0:
+                q_con = q_con + tracers.A[i_liquid]
+        if __INLINED(i_ice > 0):
+            if tracers.A[i_ice] > 0.0:
+                q_con = q_con + tracers.A[i_ice]
+        if __INLINED(i_rain > 0):
+            if tracers.A[i_rain] > 0.0:
+                q_con = q_con + tracers.A[i_rain]
+        if __INLINED(i_snow > 0):
+            if tracers.A[i_snow] > 0.0:
+                q_con = q_con + tracers.A[i_snow]
+        if __INLINED(i_graupel > 0):
+            if tracers.A[i_graupel] > 0.0:
+                q_con = q_con + tracers.A[i_graupel]
+
+
 def fv_setup(
-    qvapor: FloatField,
-    qliquid: FloatField,
-    qrain: FloatField,
-    qsnow: FloatField,
-    qice: FloatField,
-    qgraupel: FloatField,
+    tracers: FVTracers,
     q_con: FloatField,
     cvm: FloatField,
     pkz: FloatField,
@@ -207,13 +335,30 @@ def fv_setup(
 
     # TODO: what is being set up here, and how? update docstring
     with computation(PARALLEL), interval(...):
-        from __externals__ import moist_phys
+        from __externals__ import (
+            i_graupel,
+            i_ice,
+            i_liquid,
+            i_rain,
+            i_snow,
+            i_vapor,
+            moist_phys,
+            nwat,
+        )
 
         if __INLINED(moist_phys):
-            cvm, q_con = moist_cv_nwat6_fn(
-                qvapor, qliquid, qrain, qsnow, qice, qgraupel
-            )  # if (nwat == 6) else moist_cv_default_fn(constants.CV_AIR)
-            dp1 = constants.ZVIR * qvapor
+            if __INLINED(nwat == 0):
+                cvm, q_con = moist_cv_nwat0_fn()
+            elif __INLINED(nwat == 6):
+                cvm, q_con = moist_cv_nwat6_fn(
+                    tracers.A[i_vapor],
+                    tracers.A[i_liquid],
+                    tracers.A[i_rain],
+                    tracers.A[i_snow],
+                    tracers.A[i_ice],
+                    tracers.A[i_graupel],
+                )
+            dp1 = constants.ZVIR * tracers.A[i_vapor]
             cappa = constants.RDGAS / (constants.RDGAS + cvm / (1.0 + dp1))
             pkz = exp(
                 cappa
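Review note on the `moist_cv.py` changes: `moist_cv_nwat6_fn` and `moist_cvm` now clamp each tracer to zero before it enters the heat-capacity sum. The NumPy sketch below illustrates the effect of that clamping. The constant values are placeholders chosen for illustration, not the values in `ndsl.constants`, and `moist_cv_nwat6` is a hypothetical standalone rewrite rather than the stencil function itself.

```python
import numpy as np

# Placeholder values for illustration only; the real ones live in ndsl.constants.
CV_AIR = 717.0
CV_VAP = 1410.0
C_LIQ = 4185.0
C_ICE = 1972.0


def moist_cv_nwat6(qvapor, qliquid, qrain, qsnow, qice, qgraupel):
    """Moist heat capacity with negative mixing ratios treated as zero."""
    qv = np.maximum(qvapor, 0.0)
    ql = np.maximum(qliquid, 0.0) + np.maximum(qrain, 0.0)
    qs = np.maximum(qice, 0.0) + np.maximum(qsnow, 0.0) + np.maximum(qgraupel, 0.0)
    gz = ql + qs  # total condensate
    cvm = (1.0 - (qv + gz)) * CV_AIR + qv * CV_VAP + ql * C_LIQ + qs * C_ICE
    return cvm, gz


zeros = np.zeros(3)
negative = np.full(3, -1.0e-6)
cvm_dry, gz_dry = moist_cv_nwat6(zeros, zeros, zeros, zeros, zeros, zeros)
cvm_neg, gz_neg = moist_cv_nwat6(
    negative, negative, negative, negative, negative, negative
)
```

With the clamp in place, slightly negative tracers give the same result as a dry column, which matches the `nwat == 0` branch returning `CV_AIR` and zero condensate.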
diff --git a/pyfv3/stencils/neg_adj3.py b/pyfv3/stencils/neg_adj3.py
index 066ef953..1fa23a85 100644
--- a/pyfv3/stencils/neg_adj3.py
+++ b/pyfv3/stencils/neg_adj3.py
@@ -1,5 +1,5 @@
 import ndsl.constants as constants
-from ndsl import QuantityFactory, StencilFactory
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM
 from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation
 from ndsl.dsl.gt4py import function as gtfunction
@@ -134,18 +134,16 @@ def fix_neg_water(
         # no GFS_PHYS compiler flag -- additional saturation adjustment calculations!
 
 
-def fillq(q: FloatField, dp: FloatField, sum1: FloatFieldIJ, sum2: FloatFieldIJ):
+def fillq(q: FloatField, dp: FloatField):
     """
     Args:
-        q (inout):
+        q (inout): Tracers
         dp (in):
-        sum1 (out):
-        sum2 (out):
     """
-    with computation(FORWARD), interval(...):
+    with computation(FORWARD), interval(0, 1):
         # reset accumulating fields
-        sum1 = 0.0
-        sum2 = 0.0
+        sum1: FloatFieldIJ = 0.0
+        sum2: FloatFieldIJ = 0.0
     with computation(FORWARD), interval(...):
         if q > 0:
             sum1 = sum1 + q * dp
@@ -312,7 +310,7 @@ def fix_water_vapor_k_loop(i, j, kbot, qvapor, dp):
 """
 
 
-class AdjustNegativeTracerMixingRatio:
+class AdjustNegativeTracerMixingRatio(NDSLRuntime):
     """Adjust tracer mixing ratios to fix negative values
 
     Named neg_adj3 in fortran
@@ -338,17 +336,9 @@ def __init__(
         check_negative: bool,
         hydrostatic: bool,
     ):
+        super().__init__(stencil_factory)
+
         grid_indexing = stencil_factory.grid_indexing
-        self._sum1 = quantity_factory.zeros(
-            [I_DIM, J_DIM],
-            units="unknown",
-            dtype=Float,
-        )
-        self._sum2 = quantity_factory.zeros(
-            [I_DIM, J_DIM],
-            units="unknown",
-            dtype=Float,
-        )
         if check_negative:
             raise NotImplementedError(
                 "Unimplemented namelist value check_negative=True"
@@ -362,6 +352,16 @@ def __init__(
             self._d0_vap = constants.CV_VAP - constants.C_LIQ
         self._lv00 = constants.HLV - self._d0_vap * constants.TICE
 
+        self._sum1 = quantity_factory.zeros(
+            [I_DIM, J_DIM],
+            units="unknown",
+            dtype=Float,
+        )
+        self._sum2 = quantity_factory.zeros(
+            [I_DIM, J_DIM],
+            units="unknown",
+            dtype=Float,
+        )
         self._fix_neg_water = stencil_factory.from_origin_domain(
             func=fix_neg_water,
             origin=grid_indexing.origin_compute(),
diff --git a/pyfv3/stencils/nh_p_grad.py b/pyfv3/stencils/nh_p_grad.py
index 36f8b050..c3294f2f 100644
--- a/pyfv3/stencils/nh_p_grad.py
+++ b/pyfv3/stencils/nh_p_grad.py
@@ -1,6 +1,6 @@
 import numpy as np
 
-from ndsl import QuantityFactory, StencilFactory, orchestrate
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_INTERFACE_DIM
 from ndsl.dsl.gt4py import PARALLEL, computation, interval
 from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
@@ -112,7 +112,7 @@ def calc_v(
         ) * rdy
 
 
-class NonHydrostaticPressureGradient:
+class NonHydrostaticPressureGradient(NDSLRuntime):
     """
     Apply nonhydrostatic pressure gradient force in the horizontal.
 
@@ -131,10 +131,7 @@ def __init__(
         grid_type: int,
         use_logp: bool,
     ):
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
+        super().__init__(stencil_factory)
 
         grid_indexing = stencil_factory.grid_indexing
         self.orig = grid_indexing.origin_compute()
@@ -152,15 +149,11 @@ def __init__(
                 "Non Hydrostatic Pressure Gradient (nh_p_grad) with `use_logp` is not implemented."
             )
 
-        self._tmp_wk = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_INTERFACE_DIM],
-            units="unknown",
-            dtype=Float,
+        self._tmp_wk = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM]
         )
-        self._tmp_wk1 = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_INTERFACE_DIM],
-            units="unknown",
-            dtype=Float,
+        self._tmp_wk1 = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM]
         )
 
         self.a2b_k1 = AGrid2BGridFourthOrder(
@@ -171,7 +164,7 @@ def __init__(
             z_dim=K_INTERFACE_DIM,
             replace=True,
         )
-        self.a2b_kbuffer = AGrid2BGridFourthOrder(
+        self.a2b_kinterface = AGrid2BGridFourthOrder(
             stencil_factory,
             quantity_factory=quantity_factory,
             grid_data=grid_data,
@@ -244,7 +237,7 @@ def __call__(
         self.a2b_k1(pp, self._tmp_wk1)
         self.a2b_k1(pk3, self._tmp_wk1)
 
-        self.a2b_kbuffer(gz, self._tmp_wk1)
+        self.a2b_kinterface(gz, self._tmp_wk1)
         self.a2b_kstandard(delp, self._tmp_wk1)
 
         self._set_k0_and_calc_wk_stencil(pp, pk3, self._tmp_wk, top_value)
diff --git a/pyfv3/stencils/pk3_halo.py b/pyfv3/stencils/pk3_halo.py
index 889febd6..b2380457 100644
--- a/pyfv3/stencils/pk3_halo.py
+++ b/pyfv3/stencils/pk3_halo.py
@@ -1,14 +1,11 @@
-from ndsl import QuantityFactory, StencilFactory
-from ndsl.constants import I_DIM, J_DIM
-from ndsl.dsl.gt4py import FORWARD, computation, horizontal, interval, region
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
+from ndsl.dsl.gt4py import FORWARD, computation, exp, horizontal, interval, log, region
 from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
 
 
 # TODO merge with pe_halo? reuse partials?
-# NOTE: This is different from pyFV3.stencils.pe_halo.edge_pe
-def edge_pe_update(
-    pe: FloatFieldIJ, delp: FloatField, pk3: FloatField, ptop: Float, akap: Float
-):
+# NOTE: This is different from pyfv3.stencils.pe_halo.edge_pe
+def edge_pe_update(delp: FloatField, pk3: FloatField, ptop: Float, akap: Float):
     from __externals__ import local_ie, local_is, local_je, local_js
 
     with computation(FORWARD):
@@ -19,7 +16,8 @@ def edge_pe_update(
                 region[local_is - 2 : local_ie + 3, local_js - 2 : local_js],
                 region[local_is - 2 : local_ie + 3, local_je + 1 : local_je + 3],
             ):
-                pe = ptop
+                pe: FloatFieldIJ = ptop
+
         with interval(1, None):
             with horizontal(
                 region[local_is - 2 : local_is, local_js : local_je + 1],
@@ -27,11 +25,12 @@ def edge_pe_update(
                 region[local_is - 2 : local_ie + 3, local_js - 2 : local_js],
                 region[local_is - 2 : local_ie + 3, local_je + 1 : local_je + 3],
             ):
+
                 pe = pe + delp[0, 0, -1]
-                pk3 = pe**akap
+                pk3 = exp(akap * log(pe))
 
 
-class PK3Halo:
+class PK3Halo(NDSLRuntime):
     """
     Fortran name is pk3_halo
     """
@@ -41,6 +40,8 @@ def __init__(
         stencil_factory: StencilFactory,
         quantity_factory: QuantityFactory,
     ):
+        super().__init__(stencil_factory)
+
         grid_indexing = stencil_factory.grid_indexing
         origin = grid_indexing.origin_full()
         domain = grid_indexing.domain_full(add=(0, 0, 1))
@@ -53,11 +54,6 @@ def __init__(
             origin=origin,
             domain=domain,
         )
-        self._pe_tmp = quantity_factory.zeros(
-            [I_DIM, J_DIM],
-            units="unknown",
-            dtype=Float,
-        )
 
     def __call__(self, pk3: FloatField, delp: FloatField, ptop: Float, akap: Float):
         """Update pressure raised to the kappa (pk3) in halo region.
@@ -68,4 +64,4 @@ def __call__(self, pk3: FloatField, delp: FloatField, ptop: Float, akap: Float):
             ptop: The pressure level at the top of atmosphere
             akap: Poisson constant (KAPPA)
         """
-        self._edge_pe_update(self._pe_tmp, delp, pk3, ptop, akap)
+        self._edge_pe_update(delp, pk3, ptop, akap)
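Review note on `pk3 = exp(akap * log(pe))`: for positive `pe` this is mathematically the same as `pe**akap`, although the two forms can differ in the last floating-point bits. The diff does not state why the form was changed. The check below is a small NumPy illustration with made-up pressure values and an illustrative `akap`.

```python
import numpy as np

akap = 2.0 / 7.0  # illustrative value, not read from the model configuration
pe = np.array([1.0, 300.0, 50000.0, 101325.0])  # made-up positive pressures

direct = pe**akap
via_exp_log = np.exp(akap * np.log(pe))

# Largest relative difference between the two formulations.
max_rel_diff = np.max(np.abs(direct - via_exp_log) / direct)
```

The identity only holds for `pe > 0`; `log` of zero or a negative value would produce `-inf` or `nan` where the power form might not.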
diff --git a/pyfv3/stencils/ray_fast.py b/pyfv3/stencils/ray_fast.py
index 938b991b..faf9d064 100644
--- a/pyfv3/stencils/ray_fast.py
+++ b/pyfv3/stencils/ray_fast.py
@@ -1,7 +1,17 @@
+import numpy as np
+
 import ndsl.constants as constants
-from ndsl import StencilFactory, orchestrate
-from ndsl.constants import I_INTERFACE_DIM, J_INTERFACE_DIM, K_DIM
-from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation
+from ndsl import NDSLRuntime, StencilFactory
+from ndsl.boilerplate import get_factories_single_tile
+from ndsl.constants import (
+    I_DIM,
+    I_INTERFACE_DIM,
+    J_DIM,
+    J_INTERFACE_DIM,
+    K_DIM,
+    SECONDS_PER_DAY,
+)
+from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation, float64
 from ndsl.dsl.gt4py import function as gtfunction
 from ndsl.dsl.gt4py import horizontal, interval, log, region, sin
 from ndsl.dsl.typing import Float, FloatField, FloatFieldK
@@ -27,7 +37,7 @@ def compute_rf_vals(pfull, bdt, rf_cutoff, tau0, ptop):
 @gtfunction
 def compute_rff_vals(pfull, dt, rf_cutoff, tau0, ptop):
     rffvals = compute_rf_vals(pfull, dt, rf_cutoff, tau0, ptop)
-    rffvals = 1.0 / (1.0 + rffvals)
+    rffvals = float64(1.0) / (float64(1.0) + rffvals)
     return rffvals
 
 
@@ -36,14 +46,31 @@ def dm_layer(rf, dp, wind):
     return (1.0 - rf) * dp * wind
 
 
+def ray_fast_damping_increment(
+    pfull: FloatFieldK,
+    dt: Float,
+    ptop: Float,
+    rf: FloatField,
+):
+    """rf is the Rayleigh damping increment: the fraction of vertical velocity
+    left after Rayleigh damping (w -> w * rf).
+    """
+    from __externals__ import rf_cutoff, tau
+
+    with computation(PARALLEL), interval(...):
+        if pfull < rf_cutoff:
+            # rf is rayleigh damping increment, fraction of vertical velocity
+            # left after doing rayleigh damping (w -> w * rf)
+            rf = compute_rff_vals(pfull, dt, rf_cutoff, tau * SECONDS_PER_DAY, ptop)
+
+
 def ray_fast_wind_compute(
     u: FloatField,
     v: FloatField,
     w: FloatField,
     delta_p_ref: FloatFieldK,  # reference delta pressure
     pfull: FloatFieldK,  # input layer pressure reference?
-    dt: Float,
-    ptop: Float,
+    rf: FloatFieldK,
     rf_cutoff_nudge: Float,
 ):
     """
@@ -58,16 +85,9 @@ def ray_fast_wind_compute(
         rf_cutoff_nudge (in):
         ks (in):
     """
-    from __externals__ import hydrostatic, local_ie, local_je, rf_cutoff, tau
+    from __externals__ import hydrostatic, local_ie, local_je, rf_cutoff
 
     # dm_stencil
-    with computation(PARALLEL), interval(...):
-        # TODO -- in the fortran model rf is only computed once, repeating
-        # the computation every time ray_fast is run is inefficient
-        if pfull < rf_cutoff:
-            # rf is rayleigh damping increment, fraction of vertical velocity
-            # left after doing rayleigh damping (w -> w * rf)
-            rf = compute_rff_vals(pfull, dt, rf_cutoff, tau * SDAY, ptop)
     with computation(FORWARD):
         with interval(0, 1):
             if pfull < rf_cutoff_nudge:
@@ -132,7 +152,7 @@ def ray_fast_wind_compute(
                     w *= rf
 
 
-class RayleighDamping:
+class RayleighDamping(NDSLRuntime):
     """
     Apply Rayleigh damping (for tau > 0).
 
@@ -146,14 +166,26 @@ class RayleighDamping:
     Fortran name: ray_fast.
     """
 
-    def __init__(self, stencil_factory: StencilFactory, rf_cutoff, tau, hydrostatic):
-        orchestrate(obj=self, config=stencil_factory.config.dace_config)
+    def __init__(
+        self,
+        stencil_factory: StencilFactory,
+        rf_cutoff: Float,
+        tau: Float,
+        hydrostatic: bool,
+    ):
+        super().__init__(stencil_factory)
+
         grid_indexing = stencil_factory.grid_indexing
-        self._rf_cutoff = rf_cutoff
+        self._rf_cutoff = Float(rf_cutoff)
         origin, domain = grid_indexing.get_origin_domain(
             [I_INTERFACE_DIM, J_INTERFACE_DIM, K_DIM]
         )
 
+        if tau <= 0:
+            raise NotImplementedError(
+                "Dynamical Core (fv_dynamics): RayleighDamping, with tau <= 0, is not implemented"
+            )
+
         ax_offsets = grid_indexing.axis_offsets(origin, domain)
         local_axis_offsets = {}
         for axis_offset_name, axis_offset_value in ax_offsets.items():
@@ -166,12 +198,35 @@ def __init__(self, stencil_factory: StencilFactory, rf_cutoff, tau, hydrostatic)
             domain=domain,
             externals={
                 "hydrostatic": hydrostatic,
-                "rf_cutoff": rf_cutoff,
+                "rf_cutoff": self._rf_cutoff,
                 "tau": tau,
                 **local_axis_offsets,
             },
         )
 
+        # We compute the damping increment once using a trick to write a
+        # FloatFieldK as a (1, 1, K) 3D writable Field
+        _K_stencil_factory, K_quantity_factory = get_factories_single_tile(
+            1,
+            1,
+            domain[2],
+            0,
+            stencil_factory.backend,
+        )
+        self._ray_fast_damping_increment = stencil_factory.from_origin_domain(
+            ray_fast_damping_increment,
+            origin=(0, 0, origin[2]),
+            domain=(1, 1, domain[2]),
+            externals={
+                "rf_cutoff": self._rf_cutoff,
+                "tau": tau,
+            },
+        )
+        self._damping_increment = K_quantity_factory.ones(
+            [I_DIM, J_DIM, K_DIM], units="n/a"
+        )
+        self._initialize_damping_increment = np.ones((1,), dtype=int)
+
     def __call__(
         self,
         u: FloatField,
@@ -182,15 +237,31 @@ def __call__(
         dt: Float,
         ptop: Float,
     ):
-        rf_cutoff_nudge = self._rf_cutoff + min(100.0, 10.0 * ptop)
+        """
+        Args:
+            u (inout)
+            v (inout)
+            w (inout)
+            dp (in)
+            pfull (in)
+            dt (in)
+            ptop (in)
+        """
+        rf_cutoff_nudge = self._rf_cutoff + min(Float(100.0), Float(10.0) * ptop)
 
+        # TODO: this is a poor workaround for an orchestration issue
+        #       with compile-time values. Do better.
+        if self._initialize_damping_increment[0] == 1:
+            self._ray_fast_damping_increment(
+                pfull=pfull, dt=dt, ptop=ptop, rf=self._damping_increment
+            )
+            self._initialize_damping_increment[0] = 0
         self._ray_fast_wind_compute(
-            u,
-            v,
-            w,
-            dp,
-            pfull,
-            dt,
-            ptop,
-            rf_cutoff_nudge,
+            u=u,
+            v=v,
+            w=w,
+            delta_p_ref=dp,
+            pfull=pfull,
+            rf=self._damping_increment[0, 0, :],
+            rf_cutoff_nudge=rf_cutoff_nudge,
         )
diff --git a/pyfv3/stencils/remap_profile.py b/pyfv3/stencils/remap_profile.py
index f813273c..1a81dc70 100644
--- a/pyfv3/stencils/remap_profile.py
+++ b/pyfv3/stencils/remap_profile.py
@@ -657,5 +657,5 @@ def __call__(
                 self._ext5,
                 self._ext6,
                 self._extm,
-                qmin,
+                Float(qmin),
             )
diff --git a/pyfv3/stencils/remapping.py b/pyfv3/stencils/remapping.py
index c42196cb..7534c959 100644
--- a/pyfv3/stencils/remapping.py
+++ b/pyfv3/stencils/remapping.py
@@ -1,5 +1,3 @@
-import dace
-
 from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import (
     I_DIM,
@@ -26,8 +24,13 @@
 from pyfv3.stencils import moist_cv
 from pyfv3.stencils.map_single import MapSingle
 from pyfv3.stencils.mapn_tracer import MapNTracer
-from pyfv3.stencils.moist_cv import moist_pt_func, moist_pt_last_step
+from pyfv3.stencils.moist_cv import (
+    moist_pt_func_nwat0,
+    moist_pt_func_nwat6,
+    moist_pt_last_step,
+)
 from pyfv3.stencils.saturation_adjustment import SatAdjust3d
+from pyfv3.tracers import FVTracers
 
 
 from gt4py.cartesian.gtscript import __INLINED  # isort:skip
@@ -80,12 +83,7 @@ def undo_delz_adjust_and_copy_peln(
 # TODO: some of the intermediate values here are not really output
 # values, and can be refactored into stencil temporaries (e.g. cvm)
 def moist_cv_pt_pressure(
-    qvapor: FloatField,
-    qliquid: FloatField,
-    qrain: FloatField,
-    qsnow: FloatField,
-    qice: FloatField,
-    qgraupel: FloatField,
+    tracers: FVTracers,
     q_con: FloatField,
     pt: FloatField,
     cappa: FloatField,
@@ -97,8 +95,10 @@ def moist_cv_pt_pressure(
     bk: FloatFieldK,
     dp2: FloatField,
     ps: FloatFieldIJ,
+    pn1: FloatField,
     pn2: FloatField,
     peln: FloatField,
+    remap_t: bool,
     r_vir: Float,
 ):
     """
@@ -124,29 +124,41 @@ def moist_cv_pt_pressure(
         ps (out):
         pn2 (out):
         peln (in):
+        remap_t (in):
+        r_vir (in):
     """
-    from __externals__ import hydrostatic, kord_tm
+
+    from __externals__ import i_graupel, i_ice, i_liquid, i_rain, i_snow, i_vapor, nwat
 
     # moist_cv.moist_pt
     with computation(PARALLEL), interval(0, -1):
-        if __INLINED(kord_tm < 0):
-            cvm, gz, q_con, cappa, pt = moist_pt_func(
-                qvapor,
-                qliquid,
-                qrain,
-                qsnow,
-                qice,
-                qgraupel,
-                q_con,
-                pt,
-                cappa,
-                delp,
-                delz,
-                r_vir,
-            )
-        # delz_adjust
-        if __INLINED(not hydrostatic):
-            delz = -delz / delp
+        if remap_t:
+            if __INLINED(nwat == 0):
+                cvm, gz, q_con, cappa, pt = moist_pt_func_nwat0(
+                    tracers.A[i_vapor],
+                    q_con,
+                    pt,
+                    cappa,
+                    delp,
+                    delz,
+                    r_vir,
+                )
+            elif __INLINED(nwat == 6):
+                cvm, gz, q_con, cappa, pt = moist_pt_func_nwat6(
+                    tracers.A[i_vapor],
+                    tracers.A[i_liquid],
+                    tracers.A[i_rain],
+                    tracers.A[i_ice],
+                    tracers.A[i_snow],
+                    tracers.A[i_graupel],
+                    q_con,
+                    pt,
+                    cappa,
+                    delp,
+                    delz,
+                    r_vir,
+                )
+
     # pressure_updates
     with computation(FORWARD):
         with interval(-1, None):
@@ -154,23 +166,23 @@ def moist_cv_pt_pressure(
     with computation(PARALLEL):
         with interval(0, 1):
             pn2 = peln
+            pn1 = peln
         # TODO: refactor the pe2 = ptop assignment from
         # previous stencil into this one, and remove
         # pe2 from the other stencil
         with interval(1, -1):
             pe2 = ak + bk * ps
+            pn1 = peln
         with interval(-1, None):
             pn2 = peln
+            pn1 = peln
     with computation(BACKWARD), interval(0, -1):
         dp2 = pe2[0, 0, 1] - pe2
-    # copy_stencil
-    with computation(PARALLEL), interval(0, -1):
-        delp = dp2
 
 
 def pn2_pk_delp(
-    dp2: FloatField,
-    delp: FloatField,
+    # dp2: FloatField,
+    # delp: FloatField,
     pe2: FloatField,
     pn2: FloatField,
     pk: FloatField,
@@ -185,18 +197,26 @@ def pn2_pk_delp(
         pk (out):
     """
     with computation(PARALLEL), interval(...):
-        delp = dp2
+        # NOTE: GEOS doesn't perform the delp calculation at this location.
+        #       The calculation below is also done in moist_cv_pt_pressure.
+        # delp = dp2
         pn2 = log(pe2)
         pk = exp(akap * pn2)
 
 
+def pe0_ptop_xmax(pe0: FloatField, ptop: Float):
+    with computation(PARALLEL), interval(0, 1):
+        pe0 = ptop
+
+
 def pressures_mapu(
     pe: FloatField,
-    pe1: FloatField,
+    # pe1: FloatField,
     ak: FloatFieldK,
     bk: FloatFieldK,
     pe0: FloatField,
     pe3: FloatField,
+    ptop: Float,
 ):
     """
     Args:
@@ -210,18 +230,20 @@ def pressures_mapu(
     with computation(BACKWARD):
         with interval(-1, None):
             pe_bottom = pe
-            pe1_bottom = pe
+            # pe1_bottom = pe
         with interval(0, -1):
             pe_bottom = pe_bottom[0, 0, 1]
-            pe1_bottom = pe1_bottom[0, 0, 1]
+            # pe1_bottom = pe1_bottom[0, 0, 1]
     with computation(FORWARD):
         with interval(0, 1):
-            pe0 = pe
+            # pe0 = pe
+            pe0 = ptop
         with interval(1, None):
-            pe0 = 0.5 * (pe[0, -1, 0] + pe1)
+            # pe0 = 0.5 * (pe[0, -1, 0] + pe1)
+            pe0 = 0.5 * (pe[0, -1, 0] + pe)
     with computation(FORWARD), interval(...):
         bkh = 0.5 * bk
-        pe3 = ak + bkh * (pe_bottom[0, -1, 0] + pe1_bottom)
+        pe3 = ak + bkh * (pe_bottom[0, -1, 0] + pe_bottom)
 
 
 def pressures_mapv(
@@ -243,8 +265,9 @@ def pressures_mapv(
             pe_bottom = pe_bottom[0, 0, 1]
     with computation(FORWARD):
         with interval(0, 1):
-            pe3 = ak
-            pe0 = pe
+            bkh = 0.5 * bk
+            pe3 = ak + bkh * (pe_bottom[-1, 0, 0] + pe_bottom)
+            # pe0 = pe
         with interval(1, None):
             bkh = 0.5 * bk
             pe0 = 0.5 * (pe[-1, 0, 0] + pe)
@@ -280,6 +303,51 @@ def copy_from_below(a: FloatField, b: FloatField):
         b = a[0, 0, -1]
 
 
+def pe_pk_delp_peln(
+    pe: FloatField,
+    pk: FloatField,
+    delp: FloatField,
+    peln: FloatField,
+    pe2: FloatField,
+    pk2: FloatField,
+    pn2: FloatField,
+    ak: FloatFieldK,
+    bk: FloatFieldK,
+    akap: Float,
+    ptop: Float,
+):
+    with computation(BACKWARD):
+        with interval(-1, None):
+            pe_bottom = pe
+        with interval(0, -1):
+            pe_bottom = pe_bottom[0, 0, 1]
+
+    with computation(PARALLEL):
+        with interval(0, 1):
+            pe2 = ptop
+            pn2 = peln
+            pk2 = pk
+        with interval(1, -1):
+            pe2 = ak + bk * pe_bottom
+            pn2 = log(pe2)
+            pk2 = exp(akap * pn2)
+        with interval(-1, None):
+            pe2 = pe
+            pn2 = peln
+            pk2 = pk
+
+    with computation(PARALLEL):
+        with interval(0, -1):
+            pe = pe2
+            pk = pk2
+            delp = pe2[0, 0, 1] - pe2[0, 0, 0]
+            peln = pn2
+        with interval(-1, None):
+            pe = pe2
+            pk = pk2
+            peln = pn2
+
+
 class LagrangianToEulerian(NDSLRuntime):
     """
     Fortran name is Lagrangian_to_Eulerian
@@ -291,8 +359,8 @@ def __init__(
         quantity_factory: QuantityFactory,
         config: RemappingConfig,
         area_64,
-        nq,
         pfull,
+        nwat: int = 0,
     ):
         super().__init__(stencil_factory)
 
@@ -304,8 +372,13 @@ def __init__(
         if hydrostatic:
             raise NotImplementedError("Hydrostatic is not implemented")
 
+        if nwat != 6:
+            raise NotImplementedError(
+                "Only 6 water species are implemented for the legacy Remapping,"
+                f" {nwat} were requested."
+            )
+
         self._t_min = 184.0
-        self._nq = nq
         # do_omega = hydrostatic and last_step # TODO pull into inputs
         self._domain_jextra = (
             grid_indexing.domain[0],
@@ -375,6 +448,13 @@ def __init__(
 
         self._do_sat_adjust = config.do_sat_adj
 
+        self._remap_t = False
+
+        # NOTE: In GEOS, remap_t is set to True in general
+        #       Add in the "remap_option" check later
+        if True:
+            self._remap_t = True
+
         self.kmp = grid_indexing.domain[2] - 1
         for k in range(pfull.shape[0]):
             if pfull.view[k] > 10.0e2:
@@ -385,9 +465,20 @@ def __init__(
             init_pe, origin=grid_indexing.origin_compute(), domain=self._domain_jextra
         )
 
+        water_species_externals = {
+            "nwat": nwat,
+            "i_vapor": FVTracers.index("vapor"),
+            "i_liquid": FVTracers.index("liquid") if nwat == 6 else -1,
+            "i_rain": FVTracers.index("rain") if nwat == 6 else -1,
+            "i_ice": FVTracers.index("ice") if nwat == 6 else -1,
+            "i_snow": FVTracers.index("snow") if nwat == 6 else -1,
+            "i_graupel": FVTracers.index("graupel") if nwat == 6 else -1,
+        }
+
         self._moist_cv_pt_pressure = stencil_factory.from_origin_domain(
             moist_cv_pt_pressure,
-            externals={"kord_tm": config.kord_tm, "hydrostatic": hydrostatic},
+            # externals={"kord_tm": config.kord_tm, "hydrostatic": hydrostatic},
+            externals=water_species_externals,
             origin=grid_indexing.origin_compute(),
             domain=grid_indexing.domain_compute(add=(0, 0, 1)),
         )
@@ -410,7 +501,6 @@ def __init__(
             stencil_factory,
             quantity_factory,
             abs(config.kord_tr),
-            nq,
             fill=config.fill,
         )
 
@@ -444,6 +534,7 @@ def __init__(
             moist_cv.moist_pkz,
             origin=grid_indexing.origin_compute(),
             domain=grid_indexing.domain_compute(),
+            externals=water_species_externals,
         )
 
         self._pressures_mapu = stencil_factory.from_origin_domain(
@@ -495,9 +586,10 @@ def __init__(
             domain=grid_indexing.domain_compute(),
         )
 
-        self._saturation_adjustment = SatAdjust3d(
-            stencil_factory, config.sat_adjust, area_64, self.kmp
-        )
+        if self._do_sat_adjust:
+            self._saturation_adjustment = SatAdjust3d(
+                stencil_factory, config.sat_adjust, area_64, self.kmp, nwat=nwat
+            )
 
         self._moist_cv_last_step_stencil = stencil_factory.from_origin_domain(
             moist_pt_last_step,
@@ -507,6 +599,7 @@ def __init__(
                 grid_indexing.domain[1],
                 grid_indexing.domain[2] + 1,
             ),
+            externals=water_species_externals,
         )
 
         self._basic_adjust_divide_stencil = stencil_factory.from_origin_domain(
@@ -517,7 +610,7 @@ def __init__(
 
     def __call__(
         self,
-        tracers: dace.compiletime,  # dict[str, Quantity],
+        tracers: FVTracers,
         pt: FloatField,
         delp: FloatField,
         delz: FloatField,
@@ -527,7 +620,6 @@ def __call__(
         w: FloatField,
         cappa: FloatField,
         q_con: FloatField,
-        q_cld: FloatField,
         pkz: FloatField,
         pk: FloatField,
         pe: FloatField,
@@ -561,7 +653,6 @@ def __call__(
             va (inout): A-grid y-velocity
             cappa (inout): Power to raise pressure to
             q_con (out): Total condensate mixing ratio
-            q_cld (out): Cloud fraction
             pkz (in): Layer mean pressure raised to the power of Kappa
             pk (out): Interface pressure raised to power of kappa, final acoustic value
             pe (in): Pressure at layer edges
@@ -592,12 +683,7 @@ def __call__(
         # pe2 is final Eulerian edge pressures
 
         self._moist_cv_pt_pressure(
-            tracers["qvapor"],
-            tracers["qliquid"],
-            tracers["qrain"],
-            tracers["qsnow"],
-            tracers["qice"],
-            tracers["qgraupel"],
+            tracers,
             q_con,
             pt,
             cappa,
@@ -611,6 +697,7 @@ def __call__(
             ps,
             self._pn2,
             peln,
+            self._remap_t,
             zvir,
         )
 
@@ -624,6 +711,8 @@ def __call__(
         self._map_single_w(w, self._pe1, self._pe2, qs=wsd)
         self._map_single_delz(delz, self._pe1, self._pe2)
 
+        # W_limiter routine will go here
+
         self._undo_delz_adjust_and_copy_peln(delp, delz, peln, self._pe0, self._pn2)
         # if do_omega:  # NOTE untested
         #    pe3 = copy(omga, origin=(grid_indexing.isc, grid_indexing.jsc, 1))
@@ -632,12 +721,7 @@ def __call__(
         # it clear the outputs are not needed until then?
         # or, are its outputs actually used? can we delete this stencil call?
         self._moist_cv_pkz(
-            tracers["qvapor"],
-            tracers["qliquid"],
-            tracers["qrain"],
-            tracers["qsnow"],
-            tracers["qice"],
-            tracers["qgraupel"],
+            tracers,
             q_con,
             self._gz,
             self._cvm,
@@ -681,13 +765,13 @@ def __call__(
             fast_mp_consv = consv_te > CONSV_MIN
             self._saturation_adjustment(
                 dp1,
-                tracers["qvapor"],
-                tracers["qliquid"],
-                tracers["qice"],
-                tracers["qrain"],
-                tracers["qsnow"],
-                tracers["qgraupel"],
-                q_cld,
+                tracers[:, :, :, FVTracers.index("vapor")],
+                tracers[:, :, :, FVTracers.index("liquid")],
+                tracers[:, :, :, FVTracers.index("ice")],
+                tracers[:, :, :, FVTracers.index("rain")],
+                tracers[:, :, :, FVTracers.index("snow")],
+                tracers[:, :, :, FVTracers.index("graupel")],
+                tracers[:, :, :, FVTracers.index("cloud")],
                 hs,
                 peln,
                 delp,
@@ -709,12 +793,7 @@ def __call__(
             # to the physics, but if we're staying in dynamics we need
             # to keep it as the virtual potential temperature
             self._moist_cv_last_step_stencil(
-                tracers["qvapor"],
-                tracers["qliquid"],
-                tracers["qrain"],
-                tracers["qsnow"],
-                tracers["qice"],
-                tracers["qgraupel"],
+                tracers,
                 self._gz,
                 pt,
                 pkz,
diff --git a/pyfv3/stencils/remapping_GEOS.py b/pyfv3/stencils/remapping_GEOS.py
new file mode 100644
index 00000000..04f02460
--- /dev/null
+++ b/pyfv3/stencils/remapping_GEOS.py
@@ -0,0 +1,588 @@
+from gt4py.cartesian.gtscript import FORWARD, computation, interval
+
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
+from ndsl.comm.communicator import Communicator
+from ndsl.constants import (
+    CV_AIR,
+    GRAV,
+    I_DIM,
+    I_INTERFACE_DIM,
+    J_DIM,
+    J_INTERFACE_DIM,
+    K_DIM,
+    K_INTERFACE_DIM,
+)
+from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ, FloatFieldIJ64, FloatFieldK
+from ndsl.grid import GridData
+from ndsl.stencils.basic_operations import adjust_divide_stencil
+from pyfv3._config import RemappingConfig
+from pyfv3.mpi.sum import GlobalSum
+from pyfv3.stencils import moist_cv
+from pyfv3.stencils.map_single import MapSingle
+from pyfv3.stencils.mapn_tracer import MapNTracer
+from pyfv3.stencils.moist_cv import moist_pt_last_step
+from pyfv3.stencils.remapping import (
+    CONSV_MIN,
+    init_pe,
+    moist_cv_pt_pressure,
+    pe0_ptop_xmax,
+    pe_pk_delp_peln,
+    pn2_pk_delp,
+    pressures_mapu,
+    pressures_mapv,
+)
+from pyfv3.stencils.saturation_adjustment import SatAdjust3d
+from pyfv3.stencils.scale_delz import rescale_delz_1, rescale_delz_2
+from pyfv3.stencils.w_fix_consrv_moment import W_fix_consrv_moment
+from pyfv3.tracers import FVTracers
+
+
+def _normalize_to_grid_stencil(
+    te_2d: FloatFieldIJ, zsum_2d: FloatFieldIJ, area: FloatFieldIJ64
+):
+    with computation(FORWARD), interval(0, 1):
+        te_2d = te_2d * area
+        zsum_2d = zsum_2d * area
+
+
+class LagrangianToEulerian_GEOS(NDSLRuntime):
+    """
+    GEOS v11.4.2 remapping - derived from original fvcore.
+
+    Fortran name is Lagrangian_to_Eulerian
+    """
+
+    def __init__(
+        self,
+        stencil_factory: StencilFactory,
+        quantity_factory: QuantityFactory,
+        config: RemappingConfig,
+        comm: Communicator,
+        grid_data: GridData,
+        pfull,
+        adiabatic: bool,
+        nwat: int,
+    ):
+        super().__init__(stencil_factory)
+
+        self._comm = comm
+        self._stencil_factory = stencil_factory
+        grid_indexing = stencil_factory.grid_indexing
+
+        # Configuration
+        self._hydrostatic = config.hydrostatic
+        if self._hydrostatic:
+            raise NotImplementedError("Hydrostatic is not implemented")
+
+        if adiabatic:
+            raise NotImplementedError("Adiabatic is not implemented")
+
+        self._t_min = Float(184.0)
+        self.nwat = nwat
+        self._w_max = Float(90.0)
+        self._w_min = Float(-60.0)
+        self._area_64 = grid_data.area_64
+        self._cosa_s = grid_data.cosa_s
+        self._rsin2 = grid_data.rsin2
+        self._kord_tm = abs(config.kord_tm)
+        self._kord_wz = config.kord_wz
+        self._kord_mt = config.kord_mt
+        self._do_sat_adjust = config.do_sat_adj
+        self._adiabatic = adiabatic
+        self.kmp = grid_indexing.domain[2] - 1
+        for k in range(pfull.shape[0]):
+            if pfull.view[k] > 10.0e2:
+                self.kmp = k
+                break
+        # do_omega = hydrostatic and last_step # TODO pull into inputs
+
+        if self.nwat not in [0, 6]:
+            raise NotImplementedError(
+                f"Remapping: {self.nwat} water species, only 0 and 6 implemented"
+            )
+
+        # Locals
+        self._pe1 = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM], units="Pa"
+        )
+        self._pe2 = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM], units="Pa"
+        )
+        self._pe3 = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM], units="Pa"
+        )
+        self._dp2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM], units="Pa")
+        self._pn1 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM], units="Pa")
+        self._pn2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM], units="Pa")
+        self._pe0 = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM], units="Pa"
+        )
+        self._pe3 = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM], units="Pa"
+        )
+
+        self._gz = self.make_local(quantity_factory, [I_DIM, J_DIM], units="m^2 s^-2")
+        self._cvm = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._compute_performed = self.make_local(
+            quantity_factory, [I_DIM, J_DIM], dtype=bool, units="mask"
+        )
+        self._w2 = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_DIM], units="temp W"
+        )
+        self._pk2 = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM], units="Pa")
+
+        self._phis = self.make_local(quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM])
+
+        # TODO: The following should be local but because of their use in a callback (GlobalSum)
+        #       they have to be persistent memory.
+        self.te_2d = quantity_factory.zeros([I_DIM, J_DIM], units="Pa")
+        self.zsum1 = quantity_factory.zeros([I_DIM, J_DIM], units="Pa")
+
+        # Stencils
+        water_species_externals = {
+            "nwat": self.nwat,
+            "i_vapor": FVTracers.index("vapor"),
+            "i_liquid": FVTracers.index("liquid") if self.nwat == 6 else -1,
+            "i_rain": FVTracers.index("rain") if self.nwat == 6 else -1,
+            "i_ice": FVTracers.index("ice") if self.nwat == 6 else -1,
+            "i_snow": FVTracers.index("snow") if self.nwat == 6 else -1,
+            "i_graupel": FVTracers.index("graupel") if self.nwat == 6 else -1,
+        }
+
+        self._global_sum = GlobalSum(
+            communicator=comm,
+            quantity_factory=quantity_factory,
+            grid_indexing=stencil_factory.grid_indexing,
+        )
+
+        self._init_pe = stencil_factory.from_origin_domain(
+            init_pe,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(0, 1, 1)),
+        )
+
+        self._moist_cv_pt_pressure = stencil_factory.from_origin_domain(
+            moist_cv_pt_pressure,
+            externals=water_species_externals,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(0, 0, 1)),
+        )
+
+        self._pn2_pk_delp = stencil_factory.from_origin_domain(
+            pn2_pk_delp,
+            origin=grid_indexing.origin_compute(add=(0, 0, 1)),
+            domain=grid_indexing.domain_compute(add=(0, 0, -1)),
+        )
+
+        self._map_single_pt = MapSingle(
+            stencil_factory,
+            quantity_factory,
+            self._kord_tm,
+            mode=1,
+            dims=[I_DIM, J_DIM, K_DIM],
+            interpolate_contribution=True,
+        )
+
+        self._mapn_tracer = MapNTracer(
+            stencil_factory,
+            quantity_factory,
+            kord=abs(config.kord_tr),
+            fill=config.fill,
+        )
+
+        self._map_single_w = MapSingle(
+            stencil_factory,
+            quantity_factory,
+            self._kord_wz,
+            mode=-2,
+            dims=[I_DIM, J_DIM, K_DIM],
+        )
+
+        self._map_single_delz = MapSingle(
+            stencil_factory,
+            quantity_factory,
+            self._kord_wz,
+            mode=1,
+            dims=[I_DIM, J_DIM, K_DIM],
+        )
+
+        self._moist_cv_pkz = stencil_factory.from_origin_domain(
+            moist_cv.moist_pkz,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(),
+            externals=water_species_externals,
+        )
+
+        self._pressures_mapu = stencil_factory.from_origin_domain(
+            pressures_mapu,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(0, 1, 1)),
+        )
+
+        self._map_single_u = MapSingle(
+            stencil_factory,
+            quantity_factory,
+            self._kord_mt,
+            mode=-1,
+            dims=[I_DIM, J_INTERFACE_DIM, K_DIM],
+        )
+
+        self._pressures_mapv = stencil_factory.from_origin_domain(
+            pressures_mapv,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(1, 0, 1)),
+        )
+
+        self._map_single_v = MapSingle(
+            stencil_factory,
+            quantity_factory,
+            self._kord_mt,
+            mode=-1,
+            dims=[I_INTERFACE_DIM, J_DIM, K_DIM],
+        )
+
+        if self._do_sat_adjust:
+            self._saturation_adjustment = SatAdjust3d(
+                stencil_factory,
+                config.sat_adjust,
+                self._area_64,
+                self.kmp,
+                nwat=self.nwat,
+            )
+
+        self._moist_cv_last_step_stencil = stencil_factory.from_origin_domain(
+            moist_pt_last_step,
+            origin=(grid_indexing.isc, grid_indexing.jsc, 0),
+            domain=(
+                grid_indexing.domain[0],
+                grid_indexing.domain[1],
+                grid_indexing.domain[2] + 1,
+            ),
+            externals=water_species_externals,
+        )
+
+        self._fill_cond = stencil_factory.from_origin_domain(
+            moist_cv.cond_output,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(),
+            externals=water_species_externals,
+        )
+
+        self._adjust_divide = stencil_factory.from_origin_domain(
+            adjust_divide_stencil,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(),
+        )
+
+        self._rescale_delz_1 = stencil_factory.from_origin_domain(
+            rescale_delz_1,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(),
+        )
+
+        self._rescale_delz_2 = stencil_factory.from_origin_domain(
+            rescale_delz_2,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(),
+        )
+
+        self._w_fix_consrv_moment = stencil_factory.from_origin_domain(
+            func=W_fix_consrv_moment,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(),
+        )
+
+        self._pe0_ptop_xmax = stencil_factory.from_origin_domain(
+            pe0_ptop_xmax,
+            origin=(
+                grid_indexing.n_halo + grid_indexing.domain[0],
+                grid_indexing.n_halo,
+                0,
+            ),
+            domain=(1, grid_indexing.domain[1] + 1, 1),
+        )
+        self._pe_pk_delp_peln = stencil_factory.from_origin_domain(
+            pe_pk_delp_peln,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(0, 0, 1)),
+        )
+        self._moist_cv_te = stencil_factory.from_origin_domain(
+            moist_cv.moist_te,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(add=(0, 0, 1)),
+            externals=water_species_externals,
+        )
+
+        self._te_zsum = stencil_factory.from_origin_domain(
+            moist_cv.te_zsum,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(),
+        )
+
+        self._normalize_to_grid = stencil_factory.from_origin_domain(
+            _normalize_to_grid_stencil,
+            origin=grid_indexing.origin_compute(),
+            domain=grid_indexing.domain_compute(),
+        )
+
+    def __call__(
+        self,
+        tracers: FVTracers,  # ty: ignore[invalid-type-form]
+        pt: FloatField,
+        delp: FloatField,
+        delz: FloatField,
+        peln: FloatField,
+        u: FloatField,
+        v: FloatField,
+        w: FloatField,
+        mfx: FloatField,
+        mfy: FloatField,
+        cx: FloatField,
+        cy: FloatField,
+        cappa: FloatField,
+        q_con: FloatField,
+        pkz: FloatField,
+        pk: FloatField,
+        pe: FloatField,
+        hs: FloatFieldIJ,
+        te0_2d: FloatFieldIJ,
+        ps: FloatFieldIJ,
+        wsd: FloatFieldIJ,
+        ak: FloatFieldK,
+        bk: FloatFieldK,
+        dp1: FloatField,
+        ptop: Float,
+        akap: Float,
+        zvir: Float,
+        last_step: bool,
+        consv_te: Float,
+        mdt: Float,
+    ):
+        """
+        Remap the deformed Lagrangian surfaces onto the reference, or "Eulerian",
+        coordinate levels.
+
+        Args:
+            tracers (inout): Tracer species tracked across dynamics and physics
+            pt (inout): D-grid potential temperature
+            delp (inout): Pressure thickness
+            delz (in): Vertical thickness of atmosphere layers
+            peln (inout): Logarithm of interface pressure
+            u (inout): D-grid x-velocity
+            v (inout): D-grid y-velocity
+            w (inout): Vertical velocity
+            mfx (inout): Mass flux in the x-direction
+            mfy (inout): Mass flux in the y-direction
+            cx (inout): Courant number in the x-direction
+            cy (inout): Courant number in the y-direction
+            cappa (inout): Power to raise pressure to
+            q_con (out): Total condensate mixing ratio
+            pkz (in): Layer mean pressure raised to the power of Kappa
+            pk (out): Interface pressure raised to power of kappa, final acoustic value
+            pe (in): Pressure at layer edges
+            hs (in): Surface geopotential
+            te0_2d (inout): Atmosphere total energy in columns
+            ps (out): Surface pressure
+            wsd (in): Vertical velocity of the lowest level
+            ak (in): Atmosphere hybrid a coordinate (Pa)
+            bk (in): Atmosphere hybrid b coordinate (dimensionless)
+            dp1 (out): Pressure thickness before dyn_core (only written
+                if do_sat_adjust=True)
+            ptop (in): The pressure level at the top of atmosphere
+            akap (in): Poisson constant (KAPPA)
+            zvir (in): Constant (Rv/Rd-1)
+            last_step (in): Flag for the last step of k-split remapping
+            consv_te (in): Total energy conservation factor (a Float, not a
+                flag)
+            mdt (in): Remap time step
+        """
+        # Global structure:
+        #   pe1 is initial Lagrangian edge pressures
+        #   pe2 is final Eulerian edge pressures
+
+        # Build remapping profiles
+        self._init_pe(pe, self._pe1, self._pe2, ptop)
+        self._moist_cv_pt_pressure(
+            tracers,
+            q_con=q_con,
+            pt=pt,
+            cappa=cappa,
+            delp=delp,
+            delz=delz,
+            pe=pe,
+            pe2=self._pe2,
+            ak=ak,
+            bk=bk,
+            dp2=self._dp2,
+            ps=ps,
+            pn1=self._pn1,
+            pn2=self._pn2,
+            peln=peln,
+            remap_t=True,
+            r_vir=zvir,
+        )
+        self._pn2_pk_delp(
+            pe2=self._pe2,
+            pn2=self._pn2,
+            pk=self._pk2,
+            akap=akap,
+        )
+
+        # Now that we have the pressure profiles, we can start remapping
+
+        # Map temperature
+        self._map_single_pt(
+            pt,
+            self._pn1,
+            self._pn2,
+            qmin=self._t_min,
+        )
+
+        # Map all tracers
+        self._mapn_tracer(self._pe1, self._pe2, self._dp2, tracers)
+
+        # Map vertical wind
+        self._map_single_w(w, self._pe1, self._pe2, qs=wsd)
+        self._rescale_delz_1(delz, delp)
+        self._map_single_delz(delz, self._pe1, self._pe2)
+        self._rescale_delz_2(delz, self._dp2)
+        self._w_fix_consrv_moment(
+            w=w,
+            w2=self._w2,
+            dp2=self._dp2,
+            gz=self._gz,
+            w_max=self._w_max,
+            w_min=self._w_min,
+            compute_performed=self._compute_performed,
+        )
+
+        # Map horizontal winds, fluxes and courant number
+        self._pressures_mapu(pe, ak, bk, self._pe0, self._pe3, ptop)
+        self._pe0_ptop_xmax(self._pe0, ptop)
+        self._map_single_u(u, self._pe0, self._pe3)
+        self._map_single_u(mfy, self._pe0, self._pe3)
+        self._map_single_u(cy, self._pe0, self._pe3)
+
+        self._pressures_mapv(pe, ak, bk, self._pe0, self._pe3)
+        self._map_single_v(v, self._pe0, self._pe3)
+        self._map_single_v(mfx, self._pe0, self._pe3)
+        self._map_single_v(cx, self._pe0, self._pe3)
+
+        self._pe_pk_delp_peln(
+            pe=pe,
+            pk=pk,
+            delp=delp,
+            peln=peln,
+            pe2=self._pe2,
+            pk2=self._pk2,
+            pn2=self._pn2,
+            ak=ak,
+            bk=bk,
+            akap=akap,
+            ptop=ptop,
+        )
+
+        self._moist_cv_pkz(
+            tracers=tracers,
+            pkz=pkz,
+            pt=pt,
+            cappa=cappa,
+            delp=delp,
+            delz=delz,
+            r_vir=zvir,
+        )
+
+        dtmp = 0.0
+        if last_step:
+            if consv_te > CONSV_MIN:
+                self._moist_cv_te(
+                    tracers=tracers,
+                    u=u,
+                    v=v,
+                    w=w,
+                    te=self.te_2d,
+                    pt=pt,
+                    phis=self._phis,
+                    delp=delp,
+                    rsin2=self._rsin2,
+                    cosa_s=self._cosa_s,
+                    hs=hs,
+                    delz=delz,
+                    grav=GRAV,
+                )
+
+                self._te_zsum(
+                    te_2d=self.te_2d,
+                    te0_2d=te0_2d,
+                    delp=delp,
+                    pkz=pkz,
+                    zsum1=self.zsum1,
+                )
+
+                # We can normalize to the same array because
+                # they are properly reset in the above stencils
+                self._normalize_to_grid(self.te_2d, self.zsum1, self._area_64)
+
+                tesum: Float = self._global_sum(self.te_2d)
+                zsum: Float = self._global_sum(self.zsum1)
+                dtmp = tesum / (CV_AIR * zsum)
+
+            elif consv_te < -CONSV_MIN:
+                raise NotImplementedError(
+                    "Unimplemented/untested case consv("
+                    + str(consv_te)
+                    + ")  < -CONSV_MIN("
+                    + str(-CONSV_MIN)
+                    + ")"
+                )
+
+        if self._do_sat_adjust:
+            fast_mp_consv = consv_te > CONSV_MIN
+            self._saturation_adjustment(
+                dp1,
+                tracers[:, :, :, FVTracers.index("vapor")],
+                tracers[:, :, :, FVTracers.index("liquid")],
+                tracers[:, :, :, FVTracers.index("ice")],
+                tracers[:, :, :, FVTracers.index("rain")],
+                tracers[:, :, :, FVTracers.index("snow")],
+                tracers[:, :, :, FVTracers.index("graupel")],
+                tracers[:, :, :, FVTracers.index("cloud")],
+                hs,
+                peln,
+                delp,
+                delz,
+                q_con,
+                pt,
+                pkz,
+                cappa,
+                zvir,
+                mdt,
+                fast_mp_consv,
+                last_step,
+                akap,
+                self.kmp,
+            )
+
+        if last_step and not self._adiabatic:
+            if not self._hydrostatic:
+                # on the last step, we need the regular temperature to send
+                # to the physics, but if we're staying in dynamics we need
+                # to keep it as the virtual potential temperature
+                self._moist_cv_last_step_stencil(
+                    tracers=tracers,
+                    pt=pt,
+                    pkz=pkz,
+                    dtmp=dtmp,
+                    r_vir=zvir,
+                )
+                self._fill_cond(
+                    q_con=q_con,
+                    tracers=tracers,
+                )
+            else:
+                raise NotImplementedError(
+                    "Remapping: last step output temperature for hydrostatic case"
+                )
+        else:
+            # converts virtual temperature back to virtual potential temperature
+            self._adjust_divide(pkz, pt)
diff --git a/pyfv3/stencils/riem_solver3.py b/pyfv3/stencils/riem_solver3.py
index 8a901bc9..b2f87cee 100644
--- a/pyfv3/stencils/riem_solver3.py
+++ b/pyfv3/stencils/riem_solver3.py
@@ -1,8 +1,9 @@
-import math
 import typing
 
+import numpy as np
+
 import ndsl.constants as constants
-from ndsl import QuantityFactory, StencilFactory, orchestrate
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM, K_INTERFACE_DIM
 from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation, exp, interval, log
 from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
@@ -136,7 +137,7 @@ def finalize(
             zh = zh[0, 0, 1] - dz
 
 
-class NonhydrostaticVerticalSolver:
+class NonhydrostaticVerticalSolver(NDSLRuntime):
     """
     Fortran subroutine Riem_Solver3
 
@@ -152,52 +153,45 @@ def __init__(
         quantity_factory: QuantityFactory,
         config: RiemannConfig,
     ):
+        super().__init__(stencil_factory)
+
         grid_indexing = stencil_factory.grid_indexing
         self._sim1_solve = Sim1Solver(
             stencil_factory,
-            config.p_fac,
+            Float(config.p_fac),
             n_halo=0,
         )
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
-
         if config.a_imp <= 0.999:
             raise NotImplementedError("a_imp <= 0.999 is not implemented")
 
-        self._delta_mass = quantity_factory.zeros(
+        self._delta_mass = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_DIM],
             units="kg",
-            dtype=Float,
         )
-        self._tmp_pe_init = quantity_factory.zeros(
+        self._tmp_pe_init = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_INTERFACE_DIM],
             units="Pa",
-            dtype=Float,
         )
-        self._p_gas = quantity_factory.zeros(
+        self._p_gas = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_DIM],
             units="Pa",
-            dtype=Float,
         )
-        self._p_interface = quantity_factory.zeros(
+        self._p_interface = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_INTERFACE_DIM],
             units="Pa",
-            dtype=Float,
         )
-        self._log_p_interface = quantity_factory.zeros(
+        self._log_p_interface = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_INTERFACE_DIM],
             units="log(Pa)",
-            dtype=Float,
         )
 
         # gamma parameter is (cp/cv)
-        self._gamma = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="",
-            dtype=Float,
-        )
+        self._gamma = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
 
         riemorigin = grid_indexing.origin_compute()
         domain = grid_indexing.domain_compute(add=(0, 0, 1))
@@ -208,7 +202,7 @@ def __init__(
         )
         self._finalize_stencil = stencil_factory.from_origin_domain(
             finalize,
-            externals={"use_logp": config.use_logp, "beta": config.beta},
+            externals={"use_logp": config.use_logp, "beta": Float(config.beta)},
             origin=riemorigin,
             domain=domain,
         )
@@ -277,9 +271,9 @@ def __call__(
         # gm2 is gamma (cp/cv)
         # dz2 is delz
 
-        peln1 = math.log(ptop)
+        peln1 = np.log(ptop, dtype=Float)
         # ptk = ptop ** kappa
-        ptk = math.exp(constants.KAPPA * peln1)
+        ptk = np.exp(constants.KAPPA * peln1, dtype=Float)
 
         self._precompute_stencil(
             delp,
diff --git a/pyfv3/stencils/riem_solver_c.py b/pyfv3/stencils/riem_solver_c.py
index 6a9be3d6..bd931e23 100644
--- a/pyfv3/stencils/riem_solver_c.py
+++ b/pyfv3/stencils/riem_solver_c.py
@@ -1,7 +1,7 @@
 import typing
 
 import ndsl.constants as constants
-from ndsl import QuantityFactory, StencilFactory
+from ndsl import NDSLRuntime, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM, K_INTERFACE_DIM
 from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation, interval, log
 from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
@@ -57,7 +57,7 @@ def precompute(
         dz = gz[0, 0, 1] - gz
     with computation(PARALLEL), interval(...):
         gm = 1.0 / (1.0 - cappa)
-        dm /= constants.GRAV
+        dm *= constants.RGRAV
     with computation(PARALLEL), interval(0, -1):
         # (1) From \partial p*/\partial z = -\rho g, we can separate and integrate
         # over a layer to get
@@ -114,7 +114,7 @@ def finalize(
             gz = gz[0, 0, 1] - dz * constants.GRAV
 
 
-class NonhydrostaticVerticalSolverCGrid:
+class NonhydrostaticVerticalSolverCGrid(NDSLRuntime):
     """
     Fortran subroutine Riem_Solver_C
 
@@ -132,45 +132,23 @@ def __init__(
         quantity_factory: QuantityFactory,
         p_fac: Float,
     ):
+        super().__init__(stencil_factory)
+
         grid_indexing = stencil_factory.grid_indexing
         origin = grid_indexing.origin_compute(add=(-1, -1, 0))
         domain = grid_indexing.domain_compute(add=(2, 2, 1))
 
-        self._dm = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="kg",
-            dtype=Float,
-        )
-        self._w = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="m/s",
-            dtype=Float,
-        )
-        self._pem = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_INTERFACE_DIM],
-            units="Pa",
-            dtype=Float,
-        )
-        self._pe = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_INTERFACE_DIM],
-            units="Pa",
-            dtype=Float,
-        )
-        self._gm = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="",
-            dtype=Float,
-        )
-        self._dz = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="m",
-            dtype=Float,
+        self._dm = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM], units="kg")
+        self._w = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM], units="m/s")
+        self._pem = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM], units="Pa"
         )
-        self._pm = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="Pa",
-            dtype=Float,
+        self._pe = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM], units="Pa"
         )
+        self._gm = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM], units="")
+        self._dz = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM], units="m")
+        self._pm = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM], units="Pa")
 
         self._precompute_stencil = stencil_factory.from_origin_domain(
             precompute,
@@ -179,7 +157,7 @@ def __init__(
         )
         self._sim1_solve = Sim1Solver(
             stencil_factory,
-            p_fac,
+            Float(p_fac),
             n_halo=1,
         )
         self._finalize_stencil = stencil_factory.from_origin_domain(
diff --git a/pyfv3/stencils/saturation_adjustment.py b/pyfv3/stencils/saturation_adjustment.py
index b4e4009f..cd234d2b 100644
--- a/pyfv3/stencils/saturation_adjustment.py
+++ b/pyfv3/stencils/saturation_adjustment.py
@@ -1,7 +1,7 @@
 import math
 
 import ndsl.constants as constants
-from ndsl import StencilFactory
+from ndsl import NDSLRuntime, StencilFactory
 from ndsl.dsl.gt4py import PARALLEL, computation, exp, floor
 from ndsl.dsl.gt4py import function as gtfunction
 from ndsl.dsl.gt4py import interval, log
@@ -933,10 +933,22 @@ def satadjust(
             pkz = compute_pkz_func(dp, delz, pt, cappa)
 
 
-class SatAdjust3d:
+class SatAdjust3d(NDSLRuntime):
     def __init__(
-        self, stencil_factory: StencilFactory, config: SatAdjustConfig, area_64, kmp
+        self,
+        stencil_factory: StencilFactory,
+        config: SatAdjustConfig,
+        area_64,
+        kmp,
+        nwat: int,
     ):
+        super().__init__(stencil_factory)
+
+        if nwat != 6:
+            raise NotImplementedError(
+                "Saturation adjustment is only implemented for 6 water species"
+            )
+
         grid_indexing = stencil_factory.grid_indexing
         self._config = config
         self._area_64 = area_64
diff --git a/pyfv3/stencils/scale_delz.py b/pyfv3/stencils/scale_delz.py
new file mode 100644
index 00000000..24aa6b0e
--- /dev/null
+++ b/pyfv3/stencils/scale_delz.py
@@ -0,0 +1,19 @@
+from gt4py.cartesian.gtscript import PARALLEL, computation, interval
+
+from ndsl.dsl.typing import FloatField
+
+
+def rescale_delz_1(
+    delz: FloatField,
+    delp: FloatField,
+):
+    with computation(PARALLEL), interval(...):
+        delz = -delz / delp
+
+
+def rescale_delz_2(
+    delz: FloatField,
+    dp: FloatField,
+):
+    with computation(PARALLEL), interval(...):
+        delz = -delz * dp
diff --git a/pyfv3/stencils/sim1_solver.py b/pyfv3/stencils/sim1_solver.py
index 0cdfaf13..4338b442 100644
--- a/pyfv3/stencils/sim1_solver.py
+++ b/pyfv3/stencils/sim1_solver.py
@@ -1,7 +1,7 @@
 import typing
 
 import ndsl.constants as constants
-from ndsl import StencilFactory
+from ndsl import NDSLRuntime, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_INTERFACE_DIM
 from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation, exp, interval, log
 from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
@@ -132,7 +132,7 @@ def sim1_solver(
     # }
 
 
-class Sim1Solver:
+class Sim1Solver(NDSLRuntime):
     """
     Fortran name is sim1_solver
 
@@ -146,6 +146,8 @@ def __init__(
         p_fac: Float,
         n_halo: int,
     ):
+        super().__init__(stencil_factory)
+
         self._pfac = p_fac
         self._compute_sim1_solve = stencil_factory.from_dims_halo(
             func=sim1_solver,
@@ -190,8 +192,8 @@ def __call__(
 
         # TODO: email Lucas about any remaining variable naming here
 
-        t1g = 2.0 * dt * dt
-        rdt = 1.0 / dt
+        t1g = Float(2.0) * dt * dt
+        rdt = Float(1.0) / dt
         self._compute_sim1_solve(
             w,
             delta_mass,
diff --git a/pyfv3/stencils/tracer_2d_1l.py b/pyfv3/stencils/tracer_2d_1l.py
index 8e8c64fd..753fa3d8 100644
--- a/pyfv3/stencils/tracer_2d_1l.py
+++ b/pyfv3/stencils/tracer_2d_1l.py
@@ -1,12 +1,13 @@
-import math
+from typing import no_type_check
 
 from ndsl import (
+    NDSLRuntime,
     Quantity,
     QuantityFactory,
     StencilFactory,
     WrappedHaloUpdater,
-    orchestrate,
 )
+from ndsl.comm.mpi import ReductionOperator
 from ndsl.constants import (
     I_DIM,
     I_INTERFACE_DIM,
@@ -15,12 +16,15 @@
     K_DIM,
     N_HALO_DEFAULT,
 )
+from ndsl.dsl.dace.orchestration import dace_inhibitor
 from ndsl.dsl.gt4py import PARALLEL, computation
 from ndsl.dsl.gt4py import function as gtfunction
-from ndsl.dsl.gt4py import horizontal, interval, region
-from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ
+from ndsl.dsl.gt4py import horizontal, int32, interval, region
+from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ, FloatFieldK
+from ndsl.grid import GridData
 from ndsl.typing import Communicator
 from pyfv3.stencils.fvtp2d import FiniteVolumeTransport
+from pyfv3.tracers import FVTracers, FVTracersAxisName
 
 
 @gtfunction
@@ -45,6 +49,7 @@ def flux_y(cy, dya, dx, sin_sg4, sin_sg2, yfx):
     return yfx
 
 
+@no_type_check
 def flux_compute(
     cx: FloatField,
     cy: FloatField,
@@ -79,6 +84,7 @@ def flux_compute(
         yfx = flux_y(cy, dya, dx, sin_sg4, sin_sg2, yfx)
 
 
+@no_type_check
 def divide_fluxes_by_n_substeps(
     cxd: FloatField,
     xfx: FloatField,
@@ -86,10 +92,11 @@ def divide_fluxes_by_n_substeps(
     cyd: FloatField,
     yfx: FloatField,
     mfyd: FloatField,
-    n_split: int,
+    cmax: FloatFieldK,
 ):
     """
-    Divide all inputs in-place by the number of substeps n_split.
+    Divide all inputs in-place by the number of substeps n_split, computed
+    per K level from the maximum Courant number on the grid.
 
     Args:
         cxd (inout):
@@ -100,27 +107,18 @@ def divide_fluxes_by_n_substeps(
         mfyd (inout):
     """
     with computation(PARALLEL), interval(...):
-        frac = 1.0 / n_split
-        cxd = cxd * frac
-        xfx = xfx * frac
-        mfxd = mfxd * frac
-        cyd = cyd * frac
-        yfx = yfx * frac
-        mfyd = mfyd * frac
-
-
-def cmax_stencil1(cx: FloatField, cy: FloatField, cmax: FloatField):
-    with computation(PARALLEL), interval(...):
-        cmax = max(abs(cx), abs(cy))
-
-
-def cmax_stencil2(
-    cx: FloatField, cy: FloatField, sin_sg5: FloatField, cmax: FloatField
-):
-    with computation(PARALLEL), interval(...):
-        cmax = max(abs(cx), abs(cy)) + 1.0 - sin_sg5
-
-
+        n_split = int32(1.0 + cmax)
+        if n_split > 1:
+            frac = 1.0 / n_split
+            cxd = cxd * frac
+            xfx = xfx * frac
+            mfxd = mfxd * frac
+            cyd = cyd * frac
+            yfx = yfx * frac
+            mfyd = mfyd * frac
+
+
+@no_type_check
 def apply_mass_flux(
     dp1: FloatField,
     x_mass_flux: FloatField,
@@ -139,11 +137,15 @@ def apply_mass_flux(
     with computation(PARALLEL), interval(...):
         dp2 = (
             dp1
-            + (x_mass_flux - x_mass_flux[1, 0, 0] + y_mass_flux - y_mass_flux[0, 1, 0])
+            + (
+                (x_mass_flux - x_mass_flux[1, 0, 0])
+                + (y_mass_flux - y_mass_flux[0, 1, 0])
+            )
             * rarea
         )
 
 
+@no_type_check
 def apply_tracer_flux(
     q: FloatField,
     dp1: FloatField,
@@ -151,6 +153,8 @@ def apply_tracer_flux(
     fy: FloatField,
     rarea: FloatFieldIJ,
     dp2: FloatField,
+    cmax: FloatFieldK,
+    current_nsplit: int,
 ):
     """
     Args:
@@ -162,7 +166,8 @@ def apply_tracer_flux(
         dp2 (in):
     """
     with computation(PARALLEL), interval(...):
-        q = (q * dp1 + (fx - fx[1, 0, 0] + fy - fy[0, 1, 0]) * rarea) / dp2
+        if current_nsplit < int32(1.0 + cmax):
+            q = (q * dp1 + ((fx - fx[1, 0, 0]) + (fy - fy[0, 1, 0])) * rarea) / dp2
 
 
 # Simple stencil replacing:
@@ -170,6 +175,7 @@ def apply_tracer_flux(
 #   dp1[:] = dp2
 #   dp2[:] = self._tmp_dp2
 # Because dpX can be a quantity or an array
+@no_type_check
 def swap_dp(dp1: FloatField, dp2: FloatField):
     with computation(PARALLEL), interval(...):
         tmp = dp1
@@ -177,11 +183,21 @@ def swap_dp(dp1: FloatField, dp2: FloatField):
         dp2 = tmp
 
 
-class TracerAdvection:
+class TracerAdvection(NDSLRuntime):
     """
     Performs horizontal advection on tracers.
 
     Corresponds to tracer_2D_1L in the Fortran code.
+
+    Args:
+        stencil_factory: Stencil maker built on the required grid
+        quantity_factory: Quantity maker built on the required grid
+        transport: The finite-volume transport to be applied to each tracer
+        grid_data: Metric terms for the grid
+        comm: Communicator on the grid
+        tracers: Bundle of data of tracers to be advected
+        number_of_tracer_to_advect: Number of leading tracers to advect (default: all)
+        update_mass_courant: Update the mass fluxes and Courant numbers in place
     """
 
     def __init__(
@@ -189,49 +205,69 @@ def __init__(
         stencil_factory: StencilFactory,
         quantity_factory: QuantityFactory,
         transport: FiniteVolumeTransport,
-        grid_data,
+        grid_data: GridData,
         comm: Communicator,
-        tracers: dict[str, Quantity],
+        tracers: FVTracers,
+        number_of_tracer_to_advect: int | None = None,
+        update_mass_courant: bool = True,
     ):
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-            dace_compiletime_args=["tracers"],
-        )
+        super().__init__(stencil_factory)
         grid_indexing = stencil_factory.grid_indexing
         self.grid_indexing = grid_indexing  # needed for selective validation
-        self._tracer_count = len(tracers)
         self.grid_data = grid_data
+        self._update_mass_courant = update_mass_courant
+
+        if not self._update_mass_courant:
+            self._tmp_mfx = self.make_local(
+                quantity_factory, [I_INTERFACE_DIM, J_DIM, K_DIM]
+            )
+            self._tmp_mfy = self.make_local(
+                quantity_factory, [I_DIM, J_INTERFACE_DIM, K_DIM]
+            )
+            self._tmp_cx = self.make_local(
+                quantity_factory, [I_INTERFACE_DIM, J_DIM, K_DIM]
+            )
+            self._tmp_cy = self.make_local(
+                quantity_factory, [I_DIM, J_INTERFACE_DIM, K_DIM]
+            )
+        self._H = stencil_factory.grid_indexing.n_halo
+        self._number_of_tracer_to_advect = number_of_tracer_to_advect or FVTracers.size(
+            0
+        )
+        self._number_of_tracers = FVTracers.size(0)
 
-        self._x_area_flux = quantity_factory.zeros(
+        self._x_area_flux = self.make_local(
+            quantity_factory,
             [I_INTERFACE_DIM, J_DIM, K_DIM],
             units="unknown",
-            dtype=Float,
         )
-        self._y_area_flux = quantity_factory.zeros(
+        self._y_area_flux = self.make_local(
+            quantity_factory,
             [I_DIM, J_INTERFACE_DIM, K_DIM],
             units="unknown",
-            dtype=Float,
         )
-        self._x_flux = quantity_factory.zeros(
+        self._x_flux = self.make_local(
+            quantity_factory,
             [I_INTERFACE_DIM, J_INTERFACE_DIM, K_DIM],
             units="unknown",
-            dtype=Float,
         )
-        self._y_flux = quantity_factory.zeros(
+        self._y_flux = self.make_local(
+            quantity_factory,
             [I_INTERFACE_DIM, J_INTERFACE_DIM, K_DIM],
             units="unknown",
-            dtype=Float,
         )
-        self._tmp_dp = quantity_factory.zeros(
+        self._tmp_dp = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_DIM],
             units="Pa",
-            dtype=Float,
         )
-        self._tmp_dp2 = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_DIM],
-            units="Pa",
-            dtype=Float,
+        # The `TracerCMax` system expects a Quantity to be
+        # able to do `.field.max` on it. Giving it a Local
+        # would lead to orchestration passing a numpy.array
+        # ⚠️ This must be a Quantity for now ⚠️
+        self._cmax = quantity_factory.zeros(
+            [K_DIM],
+            units="unitless",
         )
 
         ax_offsets = grid_indexing.axis_offsets(
@@ -275,21 +311,46 @@ def __init__(
         )
         self.finite_volume_transport: FiniteVolumeTransport = transport
 
+        # Setup tracer courant max reduction calculation
+        self._compute_cmax = TracerCMax(
+            stencil_factory=stencil_factory,
+            quantity_factory=quantity_factory,
+            grid_data=grid_data,
+            comm=comm,
+        )
+
         # Setup halo updater for tracers
         tracer_halo_spec = quantity_factory.get_quantity_halo_spec(
-            dims=[I_DIM, J_DIM, K_DIM],
+            dims=[I_DIM, J_DIM, K_DIM, FVTracersAxisName],
             n_halo=N_HALO_DEFAULT,
             dtype=Float,
         )
         self._tracers_halo_updater = WrappedHaloUpdater(
-            comm.get_scalar_halo_updater([tracer_halo_spec] * self._tracer_count),
-            tracers,
-            [t for t in tracers.keys()],
+            comm.get_scalar_halo_updater([tracer_halo_spec]),
+            {"tracers": tracers},
+            ["tracers"],
         )
 
+    def _halo_exchange_tracers(self, tracers: FVTracers):
+        self._tracers_halo_updater.update()
+
+        # We exchange all tracers - but some might not be advected.
+        # Therefore we should reset their value.
+        # Dev NOTE: a better version would restrict the halo exchange. It's
+        #           possible but we need a partial buffer spec generation
+
+        # Temporarily deactivated code while we look for a better solution
+        # if self._number_of_tracer_to_advect < self._number_of_tracers:
+        #     tracers.data[
+        #         self._H : -self._H,
+        #         self._H : -self._H,
+        #         :,
+        #         self._number_of_tracer_to_advect : self._number_of_tracers,
+        #     ] = Float(0)
+
     def __call__(
         self,
-        tracers: dict[str, Quantity],
+        tracers: FVTracers,
         dp1,
         x_mass_flux,
         y_mass_flux,
@@ -311,17 +372,25 @@ def __call__(
             x_courant (inout): accumulated courant number in x-direction
             y_courant (inout): accumulated courant number in y-direction
         """
-        # DaCe parsing issue
-        # if len(tracers) != self._tracer_count:
-        #     raise ValueError(
-        #         f"incorrect number of tracers, {self._tracer_count} was "
-        #         f"specified on init but {len(tracers)} were passed"
-        #     )
-        # start HALO update on q (in dyn_core in fortran -- just has started when
-        # this function is called...)
+
+        if self._update_mass_courant:
+            working_x_mass_flux = x_mass_flux
+            working_y_mass_flux = y_mass_flux
+            working_x_courant = x_courant
+            working_y_courant = y_courant
+        else:
+            self._tmp_mfx.data = x_mass_flux
+            self._tmp_mfy.data = y_mass_flux
+            self._tmp_cx.data = x_courant
+            self._tmp_cy.data = y_courant
+            working_x_mass_flux = self._tmp_mfx
+            working_y_mass_flux = self._tmp_mfy
+            working_x_courant = self._tmp_cx
+            working_y_courant = self._tmp_cy
+
         self._flux_compute(
-            x_courant,
-            y_courant,
+            working_x_courant,
+            working_y_courant,
             self.grid_data.dxa,
             self.grid_data.dya,
             self.grid_data.dx,
@@ -330,70 +399,54 @@ def __call__(
             self.grid_data.sin_sg2,
             self.grid_data.sin_sg3,
             self.grid_data.sin_sg4,
-            # TODO: rename xfx/yfx to "area flux"
             self._x_area_flux,
             self._y_area_flux,
         )
 
-        # # TODO for if we end up using the Allreduce and compute cmax globally
-        # (or locally). For now, hardcoded.
-        # split = int(grid_indexing.domain[2] / 6)
-        # self._cmax_1(
-        #     cxd, cyd, self._tmp_cmax, origin=grid_indexing.origin_compute(),
-        #     domain=(grid_indexing.domain[0], self.grid_indexing.domain[1], split)
-        # )
-        # self._cmax_2(
-        #     cxd,
-        #     cyd,
-        #     self.grid.sin_sg5,
-        #     self._tmp_cmax,
-        #     origin=(grid_indexing.isc, self.grid_indexing.jsc, split),
-        #     domain=(
-        #         grid_indexing.domain[0],
-        #         self.grid_indexing.domain[1],
-        #         grid_indexing.domain[2] - split + 1
-        #     ),
-        # )
-        # cmax_flat = np.amax(self._tmp_cmax, axis=(0, 1))
-        # # cmax_flat is a gt4py storage still, but of dimension [npz+1]...
-
-        # cmax_max_all_ranks = cmax_flat.data
-        # # TODO mpi allreduce...
-        # # comm.Allreduce(cmax_flat, cmax_max_all_ranks, op=MPI.MAX)
-
-        cmax_max_all_ranks = 2.0
-        n_split = math.floor(1.0 + cmax_max_all_ranks)
-        # NOTE: cmax is not usually a single value, it varies with k, if return to
-        # that, make n_split a column as well
-
-        if n_split > 1.0:
-            self._divide_fluxes_by_n_substeps(
-                x_courant,
-                self._x_area_flux,
-                x_mass_flux,
-                y_courant,
-                self._y_area_flux,
-                y_mass_flux,
-                n_split,
-            )
+        self._compute_cmax(
+            cx=working_x_courant,
+            cy=working_y_courant,
+            cmax=self._cmax,
+        )
+
+        self._divide_fluxes_by_n_substeps(
+            cxd=working_x_courant,
+            xfx=self._x_area_flux,
+            mfxd=working_x_mass_flux,
+            cyd=working_y_courant,
+            yfx=self._y_area_flux,
+            mfyd=working_y_mass_flux,
+            cmax=self._cmax,
+        )
 
         self._tracers_halo_updater.update()
 
         dp2 = self._tmp_dp
 
-        for it in range(n_split):
-            last_call = it == n_split - 1
+        # The original algorithm works on each K level independently
+        # (from within a K loop) and therefore computes `nsplit`
+        # per K.
+        # The stencil nature of the framework doesn't allow for it
+        # because after advection, a halo exchange needs to be carried out
+        # (or else we could just move the test within the stencil).
+        # We overcompute to retain true parallelization, by running
+        # a loop on the highest number of nsplit, but restricting the
+        # actual update in `apply_tracer_flux` to only the valid
+        # K levels for each tracer
+        max_n_split = int(1.0 + self._compute_cmax.max_over_column)
+        for current_nsplit in range(max_n_split):
+            last_call = current_nsplit == max_n_split - 1
             # tracer substep
             self._apply_mass_flux(
                 dp1,
-                x_mass_flux,
-                y_mass_flux,
+                working_x_mass_flux,
+                working_y_mass_flux,
                 self.grid_data.rarea,
                 dp2,
             )
-            for q in tracers.values():
+            for i_tracer in range(self._number_of_tracer_to_advect):
                 self.finite_volume_transport(
-                    q,
+                    tracers[:, :, :, i_tracer],
                     x_courant,
                     y_courant,
                     self._x_area_flux,
@@ -404,15 +457,114 @@ def __call__(
                     y_mass_flux=y_mass_flux,
                 )
                 self._apply_tracer_flux(
-                    q,
+                    tracers[:, :, :, i_tracer],
                     dp1,
                     self._x_flux,
                     self._y_flux,
                     self.grid_data.rarea,
                     dp2,
+                    cmax=self._cmax,
+                    current_nsplit=current_nsplit,
                 )
             if not last_call:
-                self._tracers_halo_updater.update()
+                self._halo_exchange_tracers(tracers)
                 # we can't use variable assignment to avoid a data copy
                 # because of current dace limitations
                 self._swap_dp(dp1, dp2)
+
+
+@no_type_check
+def cmax_stencil_low_k(
+    cx: FloatField,
+    cy: FloatField,
+    cmax: FloatField,
+):
+    with computation(PARALLEL), interval(...):
+        cmax = max(abs(cx), abs(cy))
+
+
+@no_type_check
+def cmax_stencil_high_k(
+    cx: FloatField,
+    cy: FloatField,
+    sin_sg5: FloatFieldIJ,
+    cmax: FloatField,
+):
+    with computation(PARALLEL), interval(...):
+        cmax = max(abs(cx), abs(cy)) + 1.0 - sin_sg5
+
+
+class TracerCMax(NDSLRuntime):
+    def __init__(
+        self,
+        stencil_factory: StencilFactory,
+        quantity_factory: QuantityFactory,
+        grid_data: GridData,
+        comm: Communicator,
+    ):
+        """Compute the global maximum Courant number.
+
+        The maximum Courant number for every atmospheric level on the entire grid.
+        """
+        super().__init__(stencil_factory)
+
+        self._grid_data = grid_data
+        self._comm = comm
+        grid_indexing = stencil_factory.grid_indexing
+        cmax_atmospheric_level_split = int(grid_indexing.domain[2] / 6) - 1
+        self._cmax_low_k = stencil_factory.from_origin_domain(
+            func=cmax_stencil_low_k,
+            origin=grid_indexing.origin_compute(),
+            domain=(
+                grid_indexing.domain[0],
+                grid_indexing.domain[1],
+                cmax_atmospheric_level_split,
+            ),
+        )
+        self._cmax_high_k = stencil_factory.from_origin_domain(
+            func=cmax_stencil_high_k,
+            origin=(
+                grid_indexing.origin_compute()[0],
+                grid_indexing.origin_compute()[1],
+                cmax_atmospheric_level_split,
+            ),
+            domain=(
+                grid_indexing.domain[0],
+                grid_indexing.domain[1],
+                grid_indexing.domain[2] - cmax_atmospheric_level_split,
+            ),
+        )
+        # When turned into a Local, orchestration decides that
+        # cmax_low and high are no longer used and skips all code
+        # -> https://github.com/NOAA-GFDL/NDSL/issues/444
+        # ⚠️ This must be a Quantity for now ⚠️
+        self._tmp_cmax = quantity_factory.zeros(
+            [I_DIM, J_DIM, K_DIM],
+            units="unknown",
+        )
+        self.max_over_column = 0
+
+    @dace_inhibitor
+    def _reduce(self, cmax):
+        if __debug__:
+            if not isinstance(cmax, Quantity):
+                raise TypeError(
+                    f"[pyfv3][Tracer]: cmax must be a quantity, got {type(cmax)}"
+                )
+        cmax[:] = self._tmp_cmax[:].max(axis=0).max(axis=0)[:]
+        self._comm.all_reduce_per_element_in_place(cmax, ReductionOperator.MAX)
+        self.max_over_column = cmax.field.max()
+
+    def __call__(self, cx, cy, cmax):
+        self._cmax_low_k(
+            cx=cx,
+            cy=cy,
+            cmax=self._tmp_cmax,
+        )
+        self._cmax_high_k(
+            cx=cx,
+            cy=cy,
+            sin_sg5=self._grid_data.sin_sg5,
+            cmax=self._tmp_cmax,
+        )
+        self._reduce(cmax)
diff --git a/pyfv3/stencils/updatedzc.py b/pyfv3/stencils/updatedzc.py
index dd7178e1..f6e67916 100644
--- a/pyfv3/stencils/updatedzc.py
+++ b/pyfv3/stencils/updatedzc.py
@@ -1,133 +1,148 @@
-import ndsl.constants as constants
-from ndsl import Quantity, QuantityFactory, StencilFactory
+from ndsl import NDSLRuntime, Quantity, QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM
-from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation
-from ndsl.dsl.gt4py import function as gtfunction
-from ndsl.dsl.gt4py import interval
+from ndsl.dsl.gt4py import BACKWARD, FORWARD, PARALLEL, computation, interval
 from ndsl.dsl.typing import Float, FloatField, FloatFieldIJ, FloatFieldK
 from ndsl.stencils import corners
 
 
-DZ_MIN = constants.DZ_MIN
-
+def double_copy(q_in: FloatField, copy_1: FloatField, copy_2: FloatField):
+    with computation(PARALLEL), interval(...):
+        copy_1 = q_in
+        copy_2 = q_in
 
-@gtfunction
-def p_weighted_average_top(vel, dp0):
-    # TODO: ratio is a constant, where should this be placed?
-    ratio = dp0 / (dp0 + dp0[1])
-    return vel + (vel - vel[0, 0, 1]) * ratio
 
+def copy(q_in: FloatField, q_copy: FloatField):
+    with computation(PARALLEL), interval(...):
+        q_copy = q_in
 
-@gtfunction
-def p_weighted_average_bottom(vel, dp0):
-    ratio = dp0[-1] / (dp0[-2] + dp0[-1])
-    return vel[0, 0, -1] + (vel[0, 0, -1] - vel[0, 0, -2]) * ratio
 
+def compute_weighted_average(
+    dp_ref: FloatFieldK,
+    vel: FloatField,
+    avg: FloatField,
+):
+    """
+    Perform a cubic spline interpolation of wind velocity from grid center to grid edge
 
-@gtfunction
-def p_weighted_average_domain(vel, dp0):
-    int_ratio = 1.0 / (dp0[-1] + dp0)
-    return (dp0 * vel[0, 0, -1] + dp0[-1] * vel) * int_ratio
+    Args:
+        dp_ref(in): layer thickness in Pa
+        vel(in): grid center wind speed
+        avg(out): interpolated (grid edge) wind speed
+    """
+    # gz is defined on interfaces, so winds must first be interpolated to layer
+    # interfaces using higher-order cubic spline interpolation
+    with computation(PARALLEL):
+        with interval(0, 1):
+            top_ratio = dp_ref / (dp_ref + dp_ref[1])
+            avg = vel + (vel - vel[0, 0, 1]) * top_ratio
+        with interval(1, -1):
+            int_ratio = 1.0 / (dp_ref[-1] + dp_ref)
+            avg = (dp_ref * vel[0, 0, -1] + dp_ref[-1] * vel) * int_ratio
+        with interval(-1, None):
+            bot_ratio = dp_ref[-1] / (dp_ref[-2] + dp_ref[-1])
+            avg = vel[0, 0, -1] + (vel[0, 0, -1] - vel[0, 0, -2]) * bot_ratio
 
 
-@gtfunction
-def xy_flux(gz_x, gz_y, xfx, yfx):
+def compute_fx_fy(
+    gz_x: FloatField,
+    gz_y: FloatField,
+    xfx: FloatField,
+    yfx: FloatField,
+    fx: FloatField,
+    fy: FloatField,
+):
     """
     Compute first-order upwind fluxes of gz in x and y directions.
 
     Args:
-        gz_x: gz with corners copied to perform derivatives in x-direction
-        gz_y: gz with corners copied to perform derivatives in y-direction
-        xfx (out): contravariant c-grid u-wind interpolated to layer interfaces,
+        gz_x(in): gz with corners copied to perform derivatives in x-direction
+        gz_y(in): gz with corners copied to perform derivatives in y-direction
+        xfx(in): contravariant c-grid u-wind interpolated to layer interfaces,
             including metric terms to make it a "volume flux"
-        yfx (out): contravariant c-grid v-wind interpolated to layer interfaces
-
-    Returns:
-        fx: first-order upwind x-flux of gz
-        fy: first-order upwind y-flux of gz
+        yfx(in): contravariant c-grid v-wind interpolated to layer interfaces
+        fx(out): first-order upwind x-flux of gz
+        fy(out): first-order upwind y-flux of gz
     """
-    fx = xfx * (gz_x[-1, 0, 0] if xfx > 0.0 else gz_x)
-    fy = yfx * (gz_y[0, -1, 0] if yfx > 0.0 else gz_y)
-    return fx, fy
-
 
-def double_copy(q_in: FloatField, copy_1: FloatField, copy_2: FloatField):
     with computation(PARALLEL), interval(...):
-        copy_1 = q_in
-        copy_2 = q_in
+        if xfx > 0.0:
+            fx = gz_x[-1, 0, 0]
+        else:
+            fx = gz_x
+        fx = xfx * fx
 
+        if yfx > 0.0:
+            fy = gz_y[0, -1, 0]
+        else:
+            fy = gz_y
+        fy = yfx * fy
 
-def update_dz_c(
-    dp_ref: FloatFieldK,
-    zs: FloatFieldIJ,
-    area: FloatFieldIJ,
-    ut: FloatField,
-    vt: FloatField,
-    gz: FloatField,
-    gz_x: FloatField,
+
+def compute_gz_ws(
     gz_y: FloatField,
-    ws: FloatFieldIJ,
-    *,
+    area: FloatFieldIJ,
+    fx: FloatField,
+    fy: FloatField,
+    xfx: FloatField,
+    yfx: FloatField,
+    dz_min: Float,
     dt: Float,
+    zs: FloatFieldIJ,
+    ws: FloatFieldIJ,
+    gz: FloatField,
 ):
     """
-    Step dz forward on c-grid
-    Ensures gz is monotonically increasing in z at the end
-    Args:
-        dp_ref:
-        zs:
-        area:
-        ut:
-        vt:
-        gz:
-        gz_x: gz with corners copied to perform derivatives in x-direction
-        gz_y: gz with corners copied to perform derivatives in y-direction
-        ws: lagrangian (parcel-following) surface vertical wind implied by
+    Compute gz and ws; ensures gz is monotonically increasing in z at the end
+
+    Args:
+        gz_y(in): gz with corners copied to perform derivatives in y-direction
+        area(in):
+        fx(in): first-order upwind x-flux of gz
+        fy(in): first-order upwind y-flux of gz
+        xfx(in): contravariant c-grid u-wind interpolated to layer interfaces,
+            including metric terms to make it a "volume flux"
+        yfx(in): contravariant c-grid v-wind interpolated to layer interfaces
+        dz_min(in): Controls minimum thickness in NH solver
+        dt(in): timestep over which to evolve the geopotential height, in seconds
+        zs(in): surface height in m
+        ws(out): lagrangian (parcel-following) surface vertical wind implied by
             lowest-level gz change note that a parcel moving horizontally
             across terrain will be moving in the vertical (eqn 5.5 in documentation)
-        dt:
+        gz(out): geopotential height on model interfaces
     """
 
-    # there's some complexity due to gz being defined on interfaces
-    # have to interpolate winds to layer interfaces first, using higher-order
-    # cubic spline interpolation
-    with computation(PARALLEL):
-        with interval(0, 1):
-            # TODO: inline some or all of these functions
-            xfx = p_weighted_average_top(ut, dp_ref)
-            yfx = p_weighted_average_top(vt, dp_ref)
-        with interval(1, -1):
-            xfx = p_weighted_average_domain(ut, dp_ref)
-            yfx = p_weighted_average_domain(vt, dp_ref)
-        with interval(-1, None):
-            xfx = p_weighted_average_bottom(ut, dp_ref)
-            yfx = p_weighted_average_bottom(vt, dp_ref)
-    # xfx/yfx are now ut/vt interpolated to layer interfaces
     with computation(PARALLEL), interval(...):
-        fx, fy = xy_flux(gz_x, gz_y, xfx, yfx)
-        gz = (gz * area + (fx - fx[1, 0, 0]) + (fy - fy[0, 1, 0])) / (
+        gz = (gz_y * area + (fx - fx[1, 0, 0]) + (fy - fy[0, 1, 0])) / (
             area + (xfx - xfx[1, 0, 0]) + (yfx - yfx[0, 1, 0])
         )
-    with computation(FORWARD), interval(-1, None):
+    with computation(FORWARD), interval(...):
         rdt = 1.0 / dt
         ws = (zs - gz) * rdt
     with computation(BACKWARD), interval(0, -1):
-        gz_kp1 = gz[0, 0, 1] + DZ_MIN
+        gz_kp1 = gz[0, 0, 1] + dz_min
         gz = gz if gz > gz_kp1 else gz_kp1
 
 
-class UpdateGeopotentialHeightOnCGrid:
+class UpdateGeopotentialHeightOnCGrid(NDSLRuntime):
     def __init__(
         self,
         stencil_factory: StencilFactory,
         quantity_factory: QuantityFactory,
         area: Quantity,
         dp_ref: Quantity,
-        grid_type,
+        grid_type: int,
+        dz_min: Float,
     ):
+        """
+        Args:
+            dz_min: controls minimum thickness in NH solver
+        """
+        super().__init__(stencil_factory)
+
         grid_indexing = stencil_factory.grid_indexing
         self._area = area
         self._grid_type = grid_type
+        self._dz_min = dz_min
         # TODO: this is needed because GridData.dp_ref does not have access
         # to a QuantityFactory, we should add a way to perform operations on
         # Quantity and persist the QuantityFactory choices
@@ -139,17 +154,26 @@ def __init__(
             units=dp_ref.units,
             dtype=Float,
         )
-        self._dp_ref.view[:] = dp_ref.view[:]
-        self._gz_x = quantity_factory.zeros(
+        self._dp_ref.field[:] = dp_ref.field[:]
+        self._gz_x = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_DIM],
             units="m**2/s**2",
-            dtype=Float,
         )
-        self._gz_y = quantity_factory.zeros(
+        self._gz_y = self.make_local(
+            quantity_factory,
             [I_DIM, J_DIM, K_DIM],
             units="m**2/s**2",
-            dtype=Float,
         )
+        self._gz_filled = self.make_local(
+            quantity_factory,
+            [I_DIM, J_DIM, K_DIM],
+            units="m**2/s**2",
+        )
+        self._xfx = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._yfx = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._fx = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
+        self._fy = self.make_local(quantity_factory, [I_DIM, J_DIM, K_DIM])
         full_origin = grid_indexing.origin_full()
         full_domain = grid_indexing.domain_full(add=(0, 0, 1))
         self._double_copy_stencil = stencil_factory.from_origin_domain(
@@ -157,6 +181,11 @@ def __init__(
             origin=full_origin,
             domain=full_domain,
         )
+        self._copy_stencil = stencil_factory.from_origin_domain(
+            copy,
+            origin=full_origin,
+            domain=full_domain,
+        )
 
         ax_offsets = grid_indexing.axis_offsets(full_origin, full_domain)
 
@@ -174,12 +203,26 @@ def __init__(
                 domain=full_domain,
             )
 
-        self._update_dz_c = stencil_factory.from_origin_domain(
-            update_dz_c,
+        self._compute_weighted_average = stencil_factory.from_origin_domain(
+            compute_weighted_average,
+            origin=grid_indexing.origin_compute(add=(-1, -1, 0)),
+            domain=grid_indexing.domain_compute(add=(3, 3, 1)),
+        )
+
+        self._compute_flux = stencil_factory.from_origin_domain(
+            compute_fx_fy,
+            origin=grid_indexing.origin_compute(add=(-1, -1, 0)),
+            domain=grid_indexing.domain_compute(add=(3, 3, 1)),
+        )
+
+        self._compute_gz_ws = stencil_factory.from_origin_domain(
+            compute_gz_ws,
             origin=grid_indexing.origin_compute(add=(-1, -1, 0)),
             domain=grid_indexing.domain_compute(add=(2, 2, 1)),
         )
 
+        self.DEBUG_VAR_1 = quantity_factory.zeros([I_DIM, J_DIM, K_DIM], "n/a")
+
     def __call__(
         self,
         zs: FloatFieldIJ,
@@ -190,6 +233,8 @@ def __call__(
         dt: Float,
     ):
         """
+        Step dz forward on c-grid
+
         Args:
             dp_ref: layer thickness in Pa
             zs: surface height in m
@@ -205,20 +250,32 @@ def __call__(
 
         self._double_copy_stencil(gz, self._gz_x, self._gz_y)
 
-        # TODO(eddied): We pass the same fields 2x to avoid GTC validation errors
         if self._grid_type < 3:
             self._fill_corners_x_stencil(self._gz_x, self._gz_x)
             self._fill_corners_y_stencil(self._gz_y, self._gz_y)
 
-        self._update_dz_c(
-            self._dp_ref,
-            zs,
-            self._area,
-            ut,
-            vt,
-            gz,
-            self._gz_x,
-            self._gz_y,
-            ws,
+        self._compute_weighted_average(dp_ref=self._dp_ref, vel=ut, avg=self._xfx)
+        self._compute_weighted_average(dp_ref=self._dp_ref, vel=vt, avg=self._yfx)
+
+        self._compute_flux(
+            gz_x=self._gz_x,
+            gz_y=self._gz_y,
+            xfx=self._xfx,
+            yfx=self._yfx,
+            fx=self._fx,
+            fy=self._fy,
+        )
+
+        self._compute_gz_ws(
+            gz_y=self._gz_y,
+            area=self._area,
+            fx=self._fx,
+            fy=self._fy,
+            xfx=self._xfx,
+            yfx=self._yfx,
+            dz_min=self._dz_min,
             dt=dt,
+            zs=zs,
+            ws=ws,
+            gz=gz,
         )
diff --git a/pyfv3/stencils/updatedzd.py b/pyfv3/stencils/updatedzd.py
index 6edc0294..98746c72 100644
--- a/pyfv3/stencils/updatedzd.py
+++ b/pyfv3/stencils/updatedzd.py
@@ -1,5 +1,4 @@
-import ndsl.constants as constants
-from ndsl import Quantity, QuantityFactory, StencilFactory, orchestrate
+from ndsl import NDSLRuntime, Quantity, QuantityFactory, StencilFactory
 from ndsl.constants import (
     I_DIM,
     I_INTERFACE_DIM,
@@ -17,9 +16,6 @@
 from pyfv3.stencils.fvtp2d import FiniteVolumeTransport
 
 
-DZ_MIN = constants.DZ_MIN
-
-
 @gtfunction
 def _apply_height_advective_flux(
     height: FloatField,
@@ -73,6 +69,7 @@ def apply_height_fluxes(
     surface_height: FloatFieldIJ,
     ws: FloatFieldIJ,
     dt: Float,
+    dz_min: Float,
 ):
     """
     Apply all computed fluxes to height profile.
@@ -96,6 +93,7 @@ def apply_height_fluxes(
         surface_height (in): surface height
         ws (out): vertical velocity of the lowest level (to keep it at the surface)
         dt (in): acoustic timestep (seconds)
+        dz_min(in): controls minimum thickness in NH solver
     Grid variable inputs:
         area
     """
@@ -111,10 +109,11 @@ def apply_height_fluxes(
 
     with computation(BACKWARD):
         with interval(-1, None):
-            ws = (surface_height - height) / dt
+            rdt = 1 / dt
+            ws = (surface_height - height) * rdt
         with interval(0, -1):
             # ensure layer thickness exceeds minimum
-            other = height[0, 0, 1] + DZ_MIN
+            other = height[0, 0, 1] + dz_min
             height = height if height > other else other
 
 
@@ -200,7 +199,7 @@ def cubic_spline_interpolation_from_layer_center_to_interfaces(
         q_interface -= gamma * q_interface[0, 0, 1]
 
 
-class UpdateHeightOnDGrid:
+class UpdateHeightOnDGrid(NDSLRuntime):
     """
     Fortran name is updatedzd.
     """
@@ -213,12 +212,22 @@ def __init__(
         grid_data: GridData,
         grid_type: int,
         hord_tm: int,
+        dz_min: Float,
         column_namelist,
     ):
-        orchestrate(
-            obj=self,
-            config=stencil_factory.config.dace_config,
-        )
+        """
+        Args:
+            stencil_factory
+            quantity_factory
+            damping_coefficients
+            grid_data
+            grid_type
+            hord_tm
+            dz_min (in): controls minimum thickness in NH solver
+            column_namelist
+        """
+        super().__init__(stencil_factory)
+
         grid_indexing = stencil_factory.grid_indexing
         self.grid_indexing = grid_indexing
         self._area = grid_data.area
@@ -227,8 +236,9 @@ def __init__(
             raise NotImplementedError(
                 "damp <= 1e-5 in column_namelist is not implemented"
             )
+        self._dz_min = dz_min
         self._dp_ref = grid_data.dp_ref
-        self._allocate_temporary_storages(quantity_factory)
+        self._make_locals(quantity_factory)
         self._gk, self._beta, self._gamma = cubic_spline_interpolation_constants(
             dp0=grid_data.dp_ref, quantity_factory=quantity_factory
         )
@@ -259,51 +269,37 @@ def __init__(
             domain=grid_indexing.domain_compute(add=(0, 0, 1)),
         )
 
-    def _allocate_temporary_storages(self, quantity_factory: QuantityFactory):
-        self._crx_interface = quantity_factory.zeros(
-            [I_INTERFACE_DIM, J_DIM, K_INTERFACE_DIM],
-            "",
-            dtype=Float,
+    def _make_locals(self, quantity_factory: QuantityFactory):
+        """Allocate all Locals on `self`"""
+
+        self._crx_interface = self.make_local(
+            quantity_factory, [I_INTERFACE_DIM, J_DIM, K_INTERFACE_DIM]
         )
-        self._cry_interface = quantity_factory.zeros(
-            [I_DIM, J_INTERFACE_DIM, K_INTERFACE_DIM],
-            "",
-            dtype=Float,
+        self._cry_interface = self.make_local(
+            quantity_factory, [I_DIM, J_INTERFACE_DIM, K_INTERFACE_DIM]
         )
-        self._x_area_flux_interface = quantity_factory.zeros(
+        self._x_area_flux_interface = self.make_local(
+            quantity_factory,
             [I_INTERFACE_DIM, J_DIM, K_INTERFACE_DIM],
-            "m^2",
-            dtype=Float,
+            units="m^2",
         )
-        self._y_area_flux_interface = quantity_factory.zeros(
+        self._y_area_flux_interface = self.make_local(
+            quantity_factory,
             [I_DIM, J_INTERFACE_DIM, K_INTERFACE_DIM],
-            "m^2",
-            dtype=Float,
+            units="m^2",
         )
-        self._wk = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_INTERFACE_DIM],
-            "unknown",
-            dtype=Float,
+        self._wk = self.make_local(quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM])
+        self._height_x_diffusive_flux = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM]
         )
-        self._height_x_diffusive_flux = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_INTERFACE_DIM],
-            "unknown",
-            dtype=Float,
+        self._height_y_diffusive_flux = self.make_local(
+            quantity_factory, [I_DIM, J_DIM, K_INTERFACE_DIM]
         )
-        self._height_y_diffusive_flux = quantity_factory.zeros(
-            [I_DIM, J_DIM, K_INTERFACE_DIM],
-            "unknown",
-            dtype=Float,
+        self._fx = self.make_local(
+            quantity_factory, [I_INTERFACE_DIM, J_DIM, K_INTERFACE_DIM]
         )
-        self._fx = quantity_factory.zeros(
-            [I_INTERFACE_DIM, J_DIM, K_INTERFACE_DIM],
-            "unknown",
-            dtype=Float,
-        )
-        self._fy = quantity_factory.zeros(
-            [I_DIM, J_INTERFACE_DIM, K_INTERFACE_DIM],
-            "unknown",
-            dtype=Float,
+        self._fy = self.make_local(
+            quantity_factory, [I_DIM, J_INTERFACE_DIM, K_INTERFACE_DIM]
         )
 
     def __call__(
@@ -379,4 +375,5 @@ def __call__(
             surface_height,
             ws,
             dt,
+            self._dz_min,
         )
diff --git a/pyfv3/stencils/w_fix_consrv_moment.py b/pyfv3/stencils/w_fix_consrv_moment.py
new file mode 100644
index 00000000..26d5c3e1
--- /dev/null
+++ b/pyfv3/stencils/w_fix_consrv_moment.py
@@ -0,0 +1,83 @@
+from gt4py.cartesian.gtscript import BACKWARD, FORWARD, PARALLEL, computation, interval
+
+from ndsl.dsl.typing import BoolFieldIJ, Float, FloatField, FloatFieldIJ
+
+
+def W_fix_consrv_moment(
+    w: FloatField,
+    w2: FloatField,
+    dp2: FloatField,
+    gz: FloatFieldIJ,
+    w_max: Float,
+    w_min: Float,
+    compute_performed: BoolFieldIJ,
+):
+    """
+    Args:
+        w (in/out):
+        w2 (in?):
+        dp2(in):
+        w_max(in):
+        w_min(in):
+        compute_performed: (Internal Temporary),
+    """
+
+    with computation(PARALLEL), interval(...):
+        w2 = w
+
+    with computation(FORWARD):
+        with interval(0, 1):
+            compute_performed = False
+            if w2 > w_max:
+                gz = (w2 - w_max) * dp2
+                w2 = w_max
+                compute_performed = True
+            elif w2 < w_min:
+                gz = (w2 - w_min) * dp2
+                w2 = w_min
+                compute_performed = True
+        with interval(1, -1):
+            if compute_performed:
+                w2 = w2 + gz / dp2
+                compute_performed = False
+            if w2 > w_max:
+                gz = (w2 - w_max) * dp2
+                w2 = w_max
+                compute_performed = True
+            elif w2 < w_min:
+                gz = (w2 - w_min) * dp2
+                w2 = w_min
+                compute_performed = True
+
+    with computation(BACKWARD):
+        with interval(-1, None):
+            compute_performed = False
+            if w2 > w_max:
+                gz = (w2 - w_max) * dp2
+                w2 = w_max
+                compute_performed = True
+            elif w2 < w_min:
+                gz = (w2 - w_min) * dp2
+                w2 = w_min
+                compute_performed = True
+        with interval(1, -1):
+            if compute_performed:
+                w2 = w2 + gz / dp2
+                compute_performed = False
+            if w2 > w_max:
+                gz = (w2 - w_max) * dp2
+                w2 = w_max
+                compute_performed = True
+            elif w2 < w_min:
+                gz = (w2 - w_min) * dp2
+                w2 = w_min
+                compute_performed = True
+
+    with computation(FORWARD), interval(0, 1):
+        if w2 > (w_max * 2.0):
+            w2 = w_max * 2.0
+        elif w2 < (w_min * 2.0):
+            w2 = w_min * 2.0
+
+    with computation(PARALLEL), interval(...):
+        w = w2
diff --git a/pyfv3/stencils/xppm.py b/pyfv3/stencils/xppm.py
index 4b9d85c1..7d2911c2 100644
--- a/pyfv3/stencils/xppm.py
+++ b/pyfv3/stencils/xppm.py
@@ -1,4 +1,4 @@
-from ndsl import StencilFactory, orchestrate
+from ndsl import NDSLRuntime, StencilFactory
 from ndsl.dsl.gt4py import PARALLEL, compile_assert, computation
 from ndsl.dsl.gt4py import function as gtfunction
 from ndsl.dsl.gt4py import horizontal, interval, region
@@ -41,12 +41,19 @@ def fx1_fn(courant, br, b0, bl):
 
 @gtfunction
 def get_advection_mask(bl, b0, br):
-    from __externals__ import mord
+    from __externals__ import i_end, i_start, mord
 
     if __INLINED(mord == 5):
         smt5 = bl * br < 0
+    elif __INLINED(mord == -5):
+        compile_assert(False)
     else:
         smt5 = (3.0 * abs(b0)) < abs(bl - br)
+        # Fix edge issues
+        with horizontal(region[i_start - 1, :], region[i_start, :]):
+            smt5 = bl * br < 0.0
+        with horizontal(region[i_end, :], region[i_end + 1, :]):
+            smt5 = bl * br < 0.0
 
     if smt5[-1, 0, 0] or smt5[0, 0, 0]:
         advection_mask = 1.0
@@ -157,10 +164,6 @@ def compute_al(q: FloatField, dxa: FloatFieldIJ):
 
     al = ppm.p1 * (q[-1, 0, 0] + q) + ppm.p2 * (q[-2, 0, 0] + q[1, 0, 0])
 
-    if __INLINED(iord < 0):
-        compile_assert(False)
-        al = max(al, 0.0)
-
     if __INLINED(grid_type < 3):
         with horizontal(region[i_start - 1, :], region[i_end, :]):
             al = ppm.c1 * q[-2, 0, 0] + ppm.c2 * q[-1, 0, 0] + ppm.c3 * q
@@ -177,6 +180,9 @@ def compute_al(q: FloatField, dxa: FloatFieldIJ):
         with horizontal(region[i_start + 1, :], region[i_end + 2, :]):
             al = ppm.c3 * q[-1, 0, 0] + ppm.c2 * q[0, 0, 0] + ppm.c1 * q[1, 0, 0]
 
+    if __INLINED(iord < 0):
+        al = max(al, 0.0)
+
     return al
 
 
@@ -268,7 +274,10 @@ def compute_blbr_ord8plus(q: FloatField, dxa: FloatFieldIJ):
 
 
 def compute_x_flux(
-    q: FloatField, courant: FloatField, dxa: FloatFieldIJ, xflux: FloatField
+    q: FloatField,
+    courant: FloatField,
+    dxa: FloatFieldIJ,
+    xflux: FloatField,
 ):
     """
     Args:
@@ -288,9 +297,32 @@ def compute_x_flux(
             xflux = get_flux_ord8plus(q, courant, bl, br)
 
 
-class XPiecewiseParabolic:
+class XPiecewiseParabolic(NDSLRuntime):
     """
     Fortran name is xppm
+
+    `iord` is `hord_dp` which is hord for `δp`, `δz`, where:
+
+    `δp`: Total air mass (including vapor and condensates)
+        Equal to hydrostatic pressure depth of layer
+    `δz`: Geometric layer depth (nonhydrostatic)
+
+    Value explainers:
+        5: Unlimited “fifth-order” scheme with weak 2∆x filter; fastest
+            and least diffusive (“inviscid”)
+        6: Intermediate-strength 2∆x filter. Gives best ACC and storm
+            structure but weaker TCs (“minimally-diffusive”)
+        8: Lin 2004 monotone PPM constraint (“monotonic”)
+        9: Huynh constraint: more expensive but less diffusive than #8
+        -5: #5 with a positive-definite constraint
+
+    Undocumented values implemented in Fortran: 7, 10, 11, 12, 13.
+
+    The code below is capable of:
+        - FV3-sphere grid (no single-tile periodic grid)
+        - `iord` == 8 for monotonic behaviors OR
+        - `iord` 5, 6
+        - `iord` must be positive
     """
 
     def __init__(
@@ -302,25 +334,30 @@ def __init__(
         origin: Index3D,
         domain: Index3D,
     ):
-        orchestrate(obj=self, config=stencil_factory.config.dace_config)
+        # Dev note: this could be rewritten to split monotonic and non-monotonic
+        #           schemes, or per type of scheme as described above, with a
+        #           compile-time `iord` conditional to direct the code
+
+        super().__init__(stencil_factory)
+
         # Arguments come from:
         # namelist.grid_type
         # grid.dxa
-        if grid_type == 3 or grid_type > 4:
-            raise NotImplementedError(
-                "X Piecewise Parabolic (xppm): "
-                f" grid type {grid_type} not implemented. <3 or 4 available."
-            )
 
-        if abs(iord) >= 8 and iord != 8:
+        available_grid_options = [0, 4]
+        if grid_type not in available_grid_options:
             raise NotImplementedError(
-                "X Piecewise Parabolic (xppm): "
-                f"iord {iord} != 8 not implemented when >= 8."
+                "X Piecewise Parabolic (xppm) configuration: "
+                f"grid type {grid_type} not implemented. "
+                f"Options are {available_grid_options}."
             )
 
-        if iord < 0:
+        available_iords = [-6, 5, 6, 8]
+        if iord not in available_iords:
             raise NotImplementedError(
-                f"X Piecewise Parabolic (xppm): iord {iord} < 0 not implemented."
+                "X Piecewise Parabolic (xppm) configuration: "
+                f"iord {iord} not implemented. "
+                f"Options are {available_iords}."
             )
 
         self._dxa = dxa
@@ -368,7 +405,10 @@ def __call__(
         # were called "get_flux", while the routine which got the flux was called
         # fx1_fn. The final value was called xflux instead of q_out.
         self._compute_flux_stencil(
-            q_in, c, self._dxa, q_mean_advected_through_x_interface
+            q_in,
+            c,
+            self._dxa,
+            q_mean_advected_through_x_interface,
         )
         # bl and br are "edge perturbation values" as in equation 4.1
         # of the FV3 documentation
diff --git a/pyfv3/stencils/xtp_u.py b/pyfv3/stencils/xtp_u.py
index 8e36a89b..58eab1ed 100644
--- a/pyfv3/stencils/xtp_u.py
+++ b/pyfv3/stencils/xtp_u.py
@@ -88,7 +88,7 @@ def advect_u_along_x(
 
     bl, br = get_bl_br(u, dx, dxa)
     b0 = bl + br
-    cfl = ub_contra * dt * rdx[-1, 0] if ub_contra > 0 else ub_contra * dt * rdx
+    cfl = ub_contra * rdx[-1, 0] if ub_contra > 0 else ub_contra * rdx
     fx0 = xppm.fx1_fn(cfl, br, b0, bl)
 
     if __INLINED(iord < 8):
diff --git a/pyfv3/stencils/yppm.py b/pyfv3/stencils/yppm.py
index 5174c883..11a28562 100644
--- a/pyfv3/stencils/yppm.py
+++ b/pyfv3/stencils/yppm.py
@@ -1,4 +1,4 @@
-from ndsl import StencilFactory, orchestrate
+from ndsl import NDSLRuntime, StencilFactory
 from ndsl.dsl.gt4py import PARALLEL, compile_assert, computation
 from ndsl.dsl.gt4py import function as gtfunction
 from ndsl.dsl.gt4py import horizontal, interval, region
@@ -41,12 +41,19 @@ def fx1_fn(courant, br, b0, bl):
 
 @gtfunction
 def get_advection_mask(bl, b0, br):
-    from __externals__ import mord
+    from __externals__ import j_end, j_start, mord
 
     if __INLINED(mord == 5):
         smt5 = bl * br < 0
+    elif __INLINED(mord == -5):
+        compile_assert(False)
     else:
         smt5 = (3.0 * abs(b0)) < abs(bl - br)
+        # Fix edge issues
+        with horizontal(region[:, j_start - 1], region[:, j_start]):
+            smt5 = bl * br < 0.0
+        with horizontal(region[:, j_end], region[:, j_end + 1]):
+            smt5 = bl * br < 0.0
 
     if smt5[0, -1, 0] or smt5[0, 0, 0]:
         advection_mask = 1.0
@@ -157,10 +164,6 @@ def compute_al(q: FloatField, dya: FloatFieldIJ):
 
     al = ppm.p1 * (q[0, -1, 0] + q) + ppm.p2 * (q[0, -2, 0] + q[0, 1, 0])
 
-    if __INLINED(jord < 0):
-        compile_assert(False)
-        al = max(al, 0.0)
-
     if __INLINED(grid_type < 3):
         with horizontal(region[:, j_start - 1], region[:, j_end]):
             al = ppm.c1 * q[0, -2, 0] + ppm.c2 * q[0, -1, 0] + ppm.c3 * q
@@ -177,6 +180,9 @@ def compute_al(q: FloatField, dya: FloatFieldIJ):
         with horizontal(region[:, j_start + 1], region[:, j_end + 2]):
             al = ppm.c3 * q[0, -1, 0] + ppm.c2 * q[0, 0, 0] + ppm.c1 * q[0, 1, 0]
 
+    if __INLINED(jord < 0):
+        al = max(al, 0.0)
+
     return al
 
 
@@ -268,7 +274,10 @@ def compute_blbr_ord8plus(q: FloatField, dya: FloatFieldIJ):
 
 
 def compute_y_flux(
-    q: FloatField, courant: FloatField, dya: FloatFieldIJ, yflux: FloatField
+    q: FloatField,
+    courant: FloatField,
+    dya: FloatFieldIJ,
+    yflux: FloatField,
 ):
     """
     Args:
@@ -288,9 +297,32 @@ def compute_y_flux(
             yflux = get_flux_ord8plus(q, courant, bl, br)
 
 
-class YPiecewiseParabolic:
+class YPiecewiseParabolic(NDSLRuntime):
     """
     Fortran name is yppm
+
+    `jord` is `hord_dp` which is hord for `δp`, `δz`, where:
+
+    `δp`: Total air mass (including vapor and condensates)
+        Equal to hydrostatic pressure depth of layer
+    `δz`: Geometric layer depth (nonhydrostatic)
+
+    Value explainers:
+        5: Unlimited “fifth-order” scheme with weak 2∆x filter; fastest
+            and least diffusive (“inviscid”)
+        6: Intermediate-strength 2∆x filter. Gives best ACC and storm
+            structure but weaker TCs (“minimally-diffusive”)
+        8: Lin 2004 monotone PPM constraint (“monotonic”)
+        9: Huynh constraint: more expensive but less diffusive than #8
+        -5: #5 with a positive-definite constraint
+
+    Undocumented values implemented in Fortran: 7, 10, 11, 12, 13.
+
+    The code below is capable of:
+        - FV3-sphere grid (no single-tile periodic grid)
+        - `jord` == 8 for monotonic behaviors OR
+        - `jord` 5, 6
+        - `jord` must be positive
     """
 
     def __init__(
@@ -302,25 +334,30 @@ def __init__(
         origin: Index3D,
         domain: Index3D,
     ):
-        orchestrate(obj=self, config=stencil_factory.config.dace_config)
+        # Dev note: this could be rewritten to split monotonic and non-monotonic
+        #           schemes, or per type of scheme as described above, with a
+        #           compile-time `jord` conditional to direct the code
+
+        super().__init__(stencil_factory)
+
         # Arguments come from:
         # namelist.grid_type
         # grid.dya
-        if grid_type == 3 or grid_type > 4:
-            raise NotImplementedError(
-                "Y Piecewise Parabolic (yppm): "
-                f" grid type {grid_type} not implemented. <3 or 4 available."
-            )
 
-        if abs(jord) >= 8 and jord != 8:
+        available_grid_options = [0, 4]
+        if grid_type not in available_grid_options:
             raise NotImplementedError(
-                "Y Piecewise Parabolic (yppm): "
-                f"jord {jord} != 8 not implemented when >= 8."
+                "Y Piecewise Parabolic (yppm) configuration: "
+                f"grid type {grid_type} not implemented. "
+                f"Options are {available_grid_options}."
             )
 
-        if jord < 0:
+        available_jords = [-6, 5, 6, 8]
+        if jord not in available_jords:
             raise NotImplementedError(
-                f"Y Piecewise Parabolic (yppm): jord {jord} < 0 not implemented."
+                "Y Piecewise Parabolic (yppm) configuration: "
+                f"jord {jord} not implemented. "
+                f"Options are {available_jords}."
             )
 
         self._dya = dya
@@ -368,7 +405,10 @@ def __call__(
         # were called "get_flux", while the routine which got the flux was called
         # fx1_fn. The final value was called yflux instead of q_out.
         self._compute_flux_stencil(
-            q_in, c, self._dya, q_mean_advected_through_y_interface
+            q_in,
+            c,
+            self._dya,
+            q_mean_advected_through_y_interface,
         )
         # bl and br are "edge perturbation values" as in equation 4.1
         # of the FV3 documentation
diff --git a/pyfv3/stencils/ytp_v.py b/pyfv3/stencils/ytp_v.py
index 3077ea47..b2b8e67a 100644
--- a/pyfv3/stencils/ytp_v.py
+++ b/pyfv3/stencils/ytp_v.py
@@ -88,7 +88,7 @@ def advect_v_along_y(
 
     bl, br = get_bl_br(v, dy, dya)
     b0 = bl + br
-    cfl = vb_contra * dt * rdy[0, -1] if vb_contra > 0 else vb_contra * dt * rdy
+    cfl = vb_contra * rdy[0, -1] if vb_contra > 0 else vb_contra * rdy
     fx0 = yppm.fx1_fn(cfl, br, b0, bl)
 
     if __INLINED(jord < 8):
diff --git a/pyfv3/testing/map_single.py b/pyfv3/testing/map_single.py
index a9b537f3..1e8441ec 100644
--- a/pyfv3/testing/map_single.py
+++ b/pyfv3/testing/map_single.py
@@ -27,6 +27,10 @@ def __call__(
         key_tuple = (kord, mode, (I_INTERFACE_DIM, J_INTERFACE_DIM, K_DIM))
         if key_tuple not in self._object_pool:
             self._object_pool[key_tuple] = MapSingle(
-                self.stencil_factory, self.quantity_factory, *key_tuple
+                self.stencil_factory,
+                self.quantity_factory,
+                key_tuple[0],
+                key_tuple[1],
+                list(key_tuple[2]),
             )
         return self._object_pool[key_tuple](*args, **kwargs)
diff --git a/pyfv3/testing/translate_dyncore.py b/pyfv3/testing/translate_dyncore.py
index 0c6bb21e..b5ecb958 100644
--- a/pyfv3/testing/translate_dyncore.py
+++ b/pyfv3/testing/translate_dyncore.py
@@ -8,6 +8,7 @@
 from pyfv3._config import DynamicalCoreConfig
 from pyfv3.dycore_state import DycoreState
 from pyfv3.stencils import dyn_core
+from pyfv3.tracers import default_GEOS_tracers
 
 
 class TranslateDynCore(ParallelTranslate2PyState):
@@ -104,6 +105,7 @@ def __init__(
             "ak": {},
             "bk": {},
             "diss_estd": {},
+            "dpx": grid.compute_dict(),
         }
         self._base.in_vars["data_vars"]["wsd"]["kstart"] = grid.npz
         self._base.in_vars["data_vars"]["wsd"]["kend"] = None
@@ -127,6 +129,7 @@ def __init__(
         self.config = DynamicalCoreConfig.from_f90nml(namelist)
 
     def compute_parallel(self, inputs: dict, communicator: Communicator) -> dict:
+        default_GEOS_tracers(self.grid.quantity_factory)
         # ak, bk, and phis are numpy arrays at this point and
         #   must be converted into gt4py storages
         for name in ("ak", "bk", "phis"):
@@ -143,8 +146,16 @@ def compute_parallel(self, inputs: dict, communicator: Communicator) -> dict:
             grid_data.bk = inputs["bk"]
             grid_data.ptop = inputs["ptop"]
         self._base.make_storage_data_input_vars(inputs)
-        state = DycoreState.init_zeros(quantity_factory=self.grid.quantity_factory)
-        wsd: Quantity = self.grid.quantity_factory.zeros(
+        inputs_dtypes = {}
+        for k, v in inputs.items():
+            if hasattr(v, "dtype"):
+                inputs_dtypes[k] = v.dtype
+        state = DycoreState.init_zeros(
+            quantity_factory=self.grid.quantity_factory,
+            dtype_dict=inputs_dtypes,
+            allow_mismatch_float_precision=True,
+        )
+        wsd = self.grid.quantity_factory.zeros(
             dims=[I_DIM, J_DIM],
             units="unknown",
         )
@@ -156,11 +167,18 @@ def compute_parallel(self, inputs: dict, communicator: Communicator) -> dict:
                 state[name].data[selection] = value
             else:
                 setattr(state, name, value)
-        phis: Quantity = self.grid.quantity_factory.zeros(
+        phis = self.grid.quantity_factory.zeros(
             dims=[I_DIM, J_DIM],
             units="m",
         )
         phis.data[:] = phis.np.asarray(inputs["phis"])
+        dpx = self.grid.quantity_factory.zeros(
+            dims=[I_DIM, J_DIM, K_DIM],
+            units="unknown",
+            dtype=inputs_dtypes["dpx"],
+            allow_mismatch_float_precision=True,
+        )
+        dpx.data[:] = dpx.np.asarray(inputs["dpx"])
         acoustic_dynamics = dyn_core.AcousticDynamics(
             comm=communicator,
             stencil_factory=self.stencil_factory,
@@ -172,20 +190,30 @@ def compute_parallel(self, inputs: dict, communicator: Communicator) -> dict:
             stretched_grid=self.grid.stretched_grid,
             config=self.config.acoustic_dynamics,
             phis=phis,
-            wsd=wsd.data,
             state=state,
         )
         acoustic_dynamics.cappa.data[:] = inputs["cappa"][:]
 
-        acoustic_dynamics(state, timestep=inputs["mdt"], n_map=state.n_map)  # type: ignore[attr-defined]
+        acoustic_dynamics(
+            state,
+            mfxd=state.mfxd,
+            mfyd=state.mfyd,
+            cxd=state.cxd,
+            cyd=state.cyd,
+            dpx=dpx,
+            wsd=wsd,
+            timestep=inputs["mdt"],
+            n_map=inputs["n_map"],
+        )
         # the "inputs" dict is not used to return, we construct a new dict based
         # on variables attached to `state`
         storages_only = {}
         for name, value in vars(state).items():
             if isinstance(value, Quantity):
-                storages_only[name] = value.data
+                storages_only[name] = value[:]
             else:
                 storages_only[name] = value
-        storages_only["wsd"] = wsd.data
-        storages_only["cappa"] = acoustic_dynamics.cappa.data
+        storages_only["wsd"] = wsd[:]
+        storages_only["cappa"] = acoustic_dynamics.cappa[:]
+        storages_only["dpx"] = dpx[:]
         return self._base.slice_output(storages_only)
diff --git a/pyfv3/testing/translate_fvdynamics.py b/pyfv3/testing/translate_fvdynamics.py
index 0124a802..8eb785a9 100644
--- a/pyfv3/testing/translate_fvdynamics.py
+++ b/pyfv3/testing/translate_fvdynamics.py
@@ -2,6 +2,7 @@
 from datetime import timedelta
 from typing import Any
 
+import numpy as np
 import pytest
 from f90nml import Namelist
 
@@ -22,6 +23,7 @@
 from pyfv3._config import DynamicalCoreConfig
 from pyfv3.dycore_state import DycoreState
 from pyfv3.stencils import fv_dynamics
+from pyfv3.tracers import FVTracersAxisName, GEOS_tracers_mapping, setup_fvtracers
 
 
 class TranslateFVDynamics(ParallelTranslateBaseSlicing):
@@ -81,25 +83,25 @@ class TranslateFVDynamics(ParallelTranslateBaseSlicing):
             "dims": [I_DIM, K_INTERFACE_DIM, J_DIM],
             "n_halo": 0,
         },
-        "mfxd": {
+        "mfxd_FV": {
             "name": "accumulated_x_mass_flux",
             "dims": [I_INTERFACE_DIM, J_DIM, K_DIM],
             "units": "unknown",
             "n_halo": 0,
         },
-        "mfyd": {
+        "mfyd_FV": {
             "name": "accumulated_y_mass_flux",
             "dims": [I_DIM, J_INTERFACE_DIM, K_DIM],
             "units": "unknown",
             "n_halo": 0,
         },
-        "cxd": {
+        "cxd_FV": {
             "name": "accumulated_x_courant_number",
             "dims": [I_INTERFACE_DIM, J_DIM, K_DIM],
             "units": "",
             "n_halo": (0, 3),
         },
-        "cyd": {
+        "cyd_FV": {
             "name": "accumulated_y_courant_number",
             "dims": [I_DIM, J_INTERFACE_DIM, K_DIM],
             "units": "",
@@ -155,51 +157,6 @@ class TranslateFVDynamics(ParallelTranslateBaseSlicing):
             "units": "m^2 s^-2",
             "dims": [I_DIM, J_DIM],
         },
-        "qvapor": {
-            "name": "specific_humidity",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-        },
-        "qliquid": {
-            "name": "cloud_water_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-        },
-        "qice": {
-            "name": "cloud_ice_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-        },
-        "qrain": {
-            "name": "rain_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-        },
-        "qsnow": {
-            "name": "snow_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-        },
-        "qgraupel": {
-            "name": "graupel_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-        },
-        "qo3mr": {
-            "name": "ozone_mixing_ratio",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "kg/kg",
-        },
-        "qsgs_tke": {
-            "name": "turbulent_kinetic_energy",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "m**2/s**2",
-        },
-        "qcld": {
-            "name": "cloud_fraction",
-            "dims": [I_DIM, J_DIM, K_DIM],
-            "units": "",
-        },
         "omga": {
             "name": "vertical_pressure_velocity",
             "dims": [I_DIM, J_DIM, K_DIM],
@@ -207,6 +164,7 @@ class TranslateFVDynamics(ParallelTranslateBaseSlicing):
         },
         "bdt": {"dims": []},
         "ptop": {"dims": []},
+        "tracers": {"dims": [I_DIM, J_DIM, K_DIM, FVTracersAxisName], "units": "kg/kg"},
     }
 
     outputs = inputs.copy()
@@ -228,15 +186,6 @@ def __init__(
             "v": grid.x3d_domain_dict(),
             "w": {},
             "delz": {},
-            "qvapor": grid.compute_dict(),
-            "qliquid": grid.compute_dict(),
-            "qice": grid.compute_dict(),
-            "qrain": grid.compute_dict(),
-            "qsnow": grid.compute_dict(),
-            "qgraupel": grid.compute_dict(),
-            "qo3mr": grid.compute_dict(),
-            "qsgs_tke": grid.compute_dict(),
-            "qcld": {},
             "ps": {},
             "pe": {
                 "istart": grid.is_ - 1,
@@ -265,10 +214,10 @@ def __init__(
             "va": {},
             "uc": grid.x3d_domain_dict(),
             "vc": grid.y3d_domain_dict(),
-            "mfxd": grid.x3d_compute_dict(),
-            "mfyd": grid.y3d_compute_dict(),
-            "cxd": grid.x3d_compute_domain_y_dict(),
-            "cyd": grid.y3d_compute_domain_x_dict(),
+            "mfxd_FV": grid.x3d_compute_dict(),
+            "mfyd_FV": grid.y3d_compute_dict(),
+            "cxd_FV": grid.x3d_compute_domain_y_dict(),
+            "cyd_FV": grid.y3d_compute_domain_x_dict(),
             "diss_estd": {},
         }
         self._base.in_vars["data_vars"].update(fv_dynamics_vars)
@@ -276,20 +225,23 @@ def __init__(
         self._base.out_vars.update(fv_dynamics_vars)
         self._base.out_vars["ps"] = {"kstart": grid.npz - 1, "kend": grid.npz - 1}
         self._base.out_vars["phis"] = {"kstart": grid.npz - 1, "kend": grid.npz - 1}
+        self._base.out_vars["tracers"] = {}
         self._base.out_vars.pop("ua")
 
         self.max_error = 1e-5
 
-        self.ignore_near_zero_errors = {}
-        for qvar in utils.tracer_variables:
-            self.ignore_near_zero_errors[qvar] = True
-        self.ignore_near_zero_errors["q_con"] = True
+        self.ignore_near_zero_errors: dict[str, float | bool] = {}
         self.dycore: fv_dynamics.DynamicalCore | None = None
         self.stencil_factory = stencil_factory
-        self.config = DynamicalCoreConfig.from_f90nml(namelist)
+        self.config: DynamicalCoreConfig = DynamicalCoreConfig.from_f90nml(namelist)
 
-    def state_from_inputs(self, inputs: dict) -> DycoreState:
+    def state_from_inputs(self, inputs: dict[str, np.ndarray]) -> DycoreState:
         input_storages = super().state_from_inputs(inputs)
+        # Rename fluxes and Courant numbers (drop the `_FV` suffix)
+        input_storages["mfxd"] = input_storages.pop("mfxd_FV")
+        input_storages["mfyd"] = input_storages.pop("mfyd_FV")
+        input_storages["cxd"] = input_storages.pop("cxd_FV")
+        input_storages["cyd"] = input_storages.pop("cyd_FV")
         # making sure we init DycoreState with the exact set of variables
         accepted_keys = [_field.name for _field in fields(DycoreState)]
         to_delete = []
@@ -298,10 +250,11 @@ def state_from_inputs(self, inputs: dict) -> DycoreState:
                 to_delete.append(name)
         for name in to_delete:
             del input_storages[name]
-
-        return DycoreState.init_from_storages(
-            input_storages, sizer=self.grid.sizer, backend=self.stencil_factory.backend
+        state = DycoreState.init_from_storages(
+            storages=input_storages,
+            quantity_factory=self.grid.quantity_factory,
         )
+        return state
 
     def prepare_data(self, inputs: dict) -> tuple[DycoreState, GridData]:
         for name in ("ak", "bk"):
@@ -322,6 +275,9 @@ def prepare_data(self, inputs: dict) -> tuple[DycoreState, GridData]:
         return state, grid_data
 
     def compute_parallel(self, inputs: dict, communicator: Communicator) -> dict:
+        setup_fvtracers(
+            self.grid.quantity_factory, inputs["tracers"].shape[3], GEOS_tracers_mapping
+        )
         state, grid_data = self.prepare_data(inputs)
         self.dycore = fv_dynamics.DynamicalCore(
             comm=communicator,
@@ -332,7 +288,7 @@ def compute_parallel(self, inputs: dict, communicator: Communicator) -> dict:
             config=self.config,
             phis=state.phis,
             state=state,
-            timestep=timedelta(seconds=inputs["bdt"]),
+            timestep=timedelta(seconds=float(inputs["bdt"])),
         )
         self.dycore.step_dynamics(state, NullTimer())
         outputs = self.outputs_from_state(state)
@@ -345,8 +301,10 @@ def outputs_from_state(self, state: DycoreState) -> dict:
         outputs = {}
         storages = {}
         for name, _properties in self.outputs.items():
-            if isinstance(state[name], Quantity):
-                storages[name] = state[name].data
+            if name in ["mfxd_FV", "mfyd_FV", "cxd_FV", "cyd_FV"]:
+                storages[name] = state[name[:-3]]._data
+            elif isinstance(state[name], Quantity):
+                storages[name] = state[name]._data
             elif len(self.outputs[name]["dims"]) > 0:
                 storages[name] = state[name]  # assume it's a storage
             else:
diff --git a/pyfv3/tracers.py b/pyfv3/tracers.py
new file mode 100644
index 00000000..54ac105e
--- /dev/null
+++ b/pyfv3/tracers.py
@@ -0,0 +1,81 @@
+from ndsl import QuantityFactory
+from ndsl.dsl.typing import Float
+from ndsl.quantity.data_dimensions_field import DataDimensionsField, SparseNameMapping
+
+
+FVTracers = DataDimensionsField.declare()
+FVTracersAxisName = "fv_tracers"
+
+_EXPECTED_FV_TRACERS = [
+    "vapor",
+    "liquid",
+    "rain",
+    "ice",
+    "snow",
+    "graupel",
+    "cloud",
+]
+"""Expected tracers for FV dynamics to be able to run in the current state."""
+
+GEOS_tracers_mapping = {
+    "vapor": 0,
+    "liquid": 1,
+    "ice": 2,
+    "rain": 3,
+    "snow": 4,
+    "graupel": 5,
+    "cloud": 6,
+}
+"""Default tracer name-to-index mapping for GEOS"""
+
+
+def setup_fvtracers(
+    quantity_factory: QuantityFactory,
+    tracer_count: int,
+    name_mapping: SparseNameMapping,
+) -> None:
+    """Set up FV tracers and the sparse mapping used to access tracers by name"""
+
+    if tracer_count > 6 and not all(
+        tracer in name_mapping for tracer in _EXPECTED_FV_TRACERS
+    ):
+        raise ValueError(
+            f"FV Tracers requires a name mapping for all of the following: {_EXPECTED_FV_TRACERS}. "
+            f"Given {name_mapping}."
+        )
+
+    if FVTracersAxisName not in quantity_factory.sizer.data_dimensions:
+        quantity_factory.add_data_dimensions({FVTracersAxisName: tracer_count})
+    elif quantity_factory.sizer.data_dimensions[FVTracersAxisName] != tracer_count:
+        raise ValueError(
+            f"FV Tracers re-setup with {tracer_count} tracers differs "
+            f"from previous registration with {quantity_factory.sizer.data_dimensions[FVTracersAxisName]}"
+        )
+
+    if not DataDimensionsField.exists("FVTracers"):
+        DataDimensionsField.register(
+            FVTracers, quantity_factory, [FVTracersAxisName], name_mapping, dtype=Float
+        )
+
+
+def default_ai2_tracers(quantity_factory: QuantityFactory) -> None:
+    """Default FV Tracers setup for the AI2 dataset & code"""
+    ai2_tracers = {
+        "vapor": 0,
+        "liquid": 1,
+        "rain": 2,
+        "ice": 3,
+        "snow": 4,
+        "graupel": 5,
+        "o3mr": 6,
+        "sgs_tke": 7,
+        "cloud": 8,
+    }
+    setup_fvtracers(quantity_factory, len(ai2_tracers.keys()), ai2_tracers)
+
+
+def default_GEOS_tracers(quantity_factory: QuantityFactory) -> None:
+    """Default FV Tracers setup for the GEOS dataset & code"""
+    setup_fvtracers(
+        quantity_factory, len(GEOS_tracers_mapping.keys()), GEOS_tracers_mapping
+    )
diff --git a/pyfv3/utils/functional_validation.py b/pyfv3/utils/functional_validation.py
index 3cd3fe29..3a279a8c 100644
--- a/pyfv3/utils/functional_validation.py
+++ b/pyfv3/utils/functional_validation.py
@@ -1,4 +1,3 @@
-import copy
 from collections.abc import Callable, Sequence
 
 import numpy as np
@@ -38,26 +37,3 @@ def subset(data: np.ndarray) -> np.ndarray:
         )
 
     return subset
-
-
-def get_set_nan_func(
-    grid_indexing: GridIndexing,
-    dims: Sequence[str],
-    n_halo: tuple[tuple[int, int], tuple[int, int]] = ((0, 0), (0, 0)),
-) -> Callable[[np.ndarray], np.ndarray]:
-    subset = get_subset_func(grid_indexing=grid_indexing, dims=dims, n_halo=n_halo)
-
-    def set_nans(data: np.ndarray) -> np.ndarray:
-        try:
-            safe = copy.deepcopy(data)
-            data[:] = np.nan
-            # data_subset is a view of data, so modifying data_subset modifies data
-            data_subset = subset(data)
-            data_subset[:] = subset(safe)
-        except TypeError:
-            safe = copy.deepcopy(data.data)
-            data.data[:] = np.nan
-            data_subset = subset(data.data)
-            data_subset[:] = subset(safe)
-
-    return set_nans
diff --git a/pyfv3/wrappers/geos_wrapper.py b/pyfv3/wrappers/geos_wrapper.py
index f452485b..16a1d401 100644
--- a/pyfv3/wrappers/geos_wrapper.py
+++ b/pyfv3/wrappers/geos_wrapper.py
@@ -35,6 +35,17 @@
 from ndsl.utils import safe_assign_array
 
 
+GEOS_TRACER_MAPPING = [
+    "vapor",
+    "liquid",
+    "ice",
+    "rain",
+    "snow",
+    "graupel",
+    "cloud",
+]
+
+
 class StencilBackendCompilerOverride:
     """Override the Pace global stencil JIT to allow for 9-rank build
     on any setup.
@@ -106,8 +117,22 @@ def __init__(
         bdt: int,
         comm: Comm,
         backend: Backend,
+        water_tracers_count: int,
+        all_tracers_count: int,
         fortran_mem_space: MemorySpace = MemorySpace.HOST,
-    ) -> None:
+    ):
+        # Check for water species configuration not handled by the interface
+        if water_tracers_count != 6:
+            raise NotImplementedError(
+                f"[pyfv3 Bridge] Bridge expects 6 water species, got {water_tracers_count}."
+            )
+
+        # Build the full tracer mapping by appending placeholder names to a copy
+        # of the expected tracer list, up to `all_tracers_count`
+        self._tracers_mapping = list(GEOS_TRACER_MAPPING)
+        for i in range(len(GEOS_TRACER_MAPPING), all_tracers_count):
+            self._tracers_mapping.append(f"tracer_#{i}")
+
         # Look for an override to run on a single node
         gtfv3_single_rank_override = int(os.getenv("GTFV3_SINGLE_RANK_OVERRIDE", -1))
         if gtfv3_single_rank_override >= 0:
@@ -213,7 +238,6 @@ def __init__(
         )
 
         self.output_dict: dict[str, np.ndarray] = {}
-        self._allocate_output_dir()
 
         # Feedback information
         device_ordinal_info = (
@@ -248,6 +272,16 @@ def _critical_path(self) -> None:
                 timer=self.perf_collector.timestep_timer,
             )
 
+    def _collect_timings(self, timings: dict[str, list[float]]) -> None:
+        """Collect performance of the timestep"""
+        self.perf_collector.collect_performance()
+        for k, v in self.perf_collector.times_per_step[0].items():
+            if k not in timings.keys():
+                timings[k] = [v]
+            else:
+                timings[k].append(v)
+        self.perf_collector.clear()
+
     def __call__(
         self,
         timings: dict[str, list[float]],
@@ -310,14 +344,7 @@ def __call__(
         with self.perf_collector.timestep_timer.clock("dycore-to-numpy"):
             self.output_dict = self._prep_outputs_for_geos()
 
-        # Collect performance of the timestep and write a json file for rank 0
-        self.perf_collector.collect_performance()
-        for k, v in self.perf_collector.times_per_step[0].items():
-            if k not in timings.keys():
-                timings[k] = [v]
-            else:
-                timings[k].append(v)
-        self.perf_collector.clear()
+        self._collect_timings(timings)
 
         return self.output_dict, timings
 
@@ -383,15 +410,11 @@ def _put_fortran_data_in_dycore(
         safe_assign_array(state.omga.view[:], omga[isc:iec, jsc:jec, :])
         safe_assign_array(state.diss_estd.view[:], diss_estd[isc:iec, jsc:jec, :])
 
-        # tracer quantities should be a 4d array in order:
-        # vapor, liquid, ice, rain, snow, graupel, cloud
-        safe_assign_array(state.qvapor.view[:], q[isc:iec, jsc:jec, :, 0])
-        safe_assign_array(state.qliquid.view[:], q[isc:iec, jsc:jec, :, 1])
-        safe_assign_array(state.qice.view[:], q[isc:iec, jsc:jec, :, 2])
-        safe_assign_array(state.qrain.view[:], q[isc:iec, jsc:jec, :, 3])
-        safe_assign_array(state.qsnow.view[:], q[isc:iec, jsc:jec, :, 4])
-        safe_assign_array(state.qgraupel.view[:], q[isc:iec, jsc:jec, :, 5])
-        safe_assign_array(state.qcld.view[:], q[isc:iec, jsc:jec, :, 6])
+        # Copy tracer data
+        for index, name in enumerate(self._tracers_mapping):
+            safe_assign_array(
+                state.tracers[name].view[:], q[isc:iec, jsc:jec, :, index]
+            )
 
         return state
 
@@ -403,6 +426,7 @@ def _prep_outputs_for_geos(self) -> dict[str, np.ndarray]:
         jec = self._grid_indexing.jec + 1
 
         if self._fortran_mem_space != self._pace_mem_space:
+            self._allocate_output_dir()
             safe_assign_array(output_dict["u"], self.dycore_state.u.data[:-1, :, :-1])
             safe_assign_array(output_dict["v"], self.dycore_state.v.data[:, :-1, :-1])
             safe_assign_array(output_dict["w"], self.dycore_state.w.data[:-1, :-1, :-1])
@@ -468,27 +492,8 @@ def _prep_outputs_for_geos(self) -> dict[str, np.ndarray]:
                 self.dycore_state.diss_estd.data[:-1, :-1, :-1],
             )
 
-            safe_assign_array(
-                output_dict["qvapor"], self.dycore_state.qvapor.data[:-1, :-1, :-1]
-            )
-            safe_assign_array(
-                output_dict["qliquid"], self.dycore_state.qliquid.data[:-1, :-1, :-1]
-            )
-            safe_assign_array(
-                output_dict["qice"], self.dycore_state.qice.data[:-1, :-1, :-1]
-            )
-            safe_assign_array(
-                output_dict["qrain"], self.dycore_state.qrain.data[:-1, :-1, :-1]
-            )
-            safe_assign_array(
-                output_dict["qsnow"], self.dycore_state.qsnow.data[:-1, :-1, :-1]
-            )
-            safe_assign_array(
-                output_dict["qgraupel"], self.dycore_state.qgraupel.data[:-1, :-1, :-1]
-            )
-            safe_assign_array(
-                output_dict["qcld"], self.dycore_state.qcld.data[:-1, :-1, :-1]
-            )
+            # Copy tracer data
+            safe_assign_array(output_dict["q"], self.dycore_state.tracers.as_4D_array())
         else:
             output_dict["u"] = self.dycore_state.u.data[:-1, :, :-1]
             output_dict["v"] = self.dycore_state.v.data[:, :-1, :-1]
@@ -519,23 +524,18 @@ def _prep_outputs_for_geos(self) -> dict[str, np.ndarray]:
             output_dict["q_con"] = self.dycore_state.q_con.data[:-1, :-1, :-1]
             output_dict["omga"] = self.dycore_state.omga.data[:-1, :-1, :-1]
             output_dict["diss_estd"] = self.dycore_state.diss_estd.data[:-1, :-1, :-1]
-            output_dict["qvapor"] = self.dycore_state.qvapor.data[:-1, :-1, :-1]
-            output_dict["qliquid"] = self.dycore_state.qliquid.data[:-1, :-1, :-1]
-            output_dict["qice"] = self.dycore_state.qice.data[:-1, :-1, :-1]
-            output_dict["qrain"] = self.dycore_state.qrain.data[:-1, :-1, :-1]
-            output_dict["qsnow"] = self.dycore_state.qsnow.data[:-1, :-1, :-1]
-            output_dict["qgraupel"] = self.dycore_state.qgraupel.data[:-1, :-1, :-1]
-            output_dict["qcld"] = self.dycore_state.qcld.data[:-1, :-1, :-1]
+            output_dict["q"] = self.dycore_state.tracers.as_4D_array()
 
         return output_dict
 
     def _allocate_output_dir(self) -> None:
+        if len(self.output_dict) != 0:
+            return
         if self._fortran_mem_space != self._pace_mem_space:
             nhalo = self._grid_indexing.n_halo
             shape_centered = self._grid_indexing.domain_full(add=(0, 0, 0))
             shape_x_interface = self._grid_indexing.domain_full(add=(1, 0, 0))
             shape_y_interface = self._grid_indexing.domain_full(add=(0, 1, 0))
-            shape_z_interface = self._grid_indexing.domain_full(add=(0, 0, 1))
             shape_2d = shape_centered[:-1]
 
             self.output_dict["u"] = np.empty((shape_y_interface))
@@ -588,34 +588,3 @@ def _allocate_output_dir(self) -> None:
             self.output_dict["qsnow"] = np.empty((shape_centered))
             self.output_dict["qgraupel"] = np.empty((shape_centered))
             self.output_dict["qcld"] = np.empty((shape_centered))
-        else:
-            self.output_dict["u"] = None
-            self.output_dict["v"] = None
-            self.output_dict["w"] = None
-            self.output_dict["ua"] = None
-            self.output_dict["va"] = None
-            self.output_dict["uc"] = None
-            self.output_dict["vc"] = None
-            self.output_dict["delz"] = None
-            self.output_dict["pt"] = None
-            self.output_dict["delp"] = None
-            self.output_dict["mfxd"] = None
-            self.output_dict["mfyd"] = None
-            self.output_dict["cxd"] = None
-            self.output_dict["cyd"] = None
-            self.output_dict["ps"] = None
-            self.output_dict["pe"] = None
-            self.output_dict["pk"] = None
-            self.output_dict["peln"] = None
-            self.output_dict["pkz"] = None
-            self.output_dict["phis"] = None
-            self.output_dict["q_con"] = None
-            self.output_dict["omga"] = None
-            self.output_dict["diss_estd"] = None
-            self.output_dict["qvapor"] = None
-            self.output_dict["qliquid"] = None
-            self.output_dict["qice"] = None
-            self.output_dict["qrain"] = None
-            self.output_dict["qsnow"] = None
-            self.output_dict["qgraupel"] = None
-            self.output_dict["qcld"] = None
diff --git a/pyproject.toml b/pyproject.toml
index f93e40d6..b0bdc982 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -44,7 +44,7 @@ extras = [
   "pyfv3[ndsl]",
   "pyfv3[test]"
 ]
-ndsl = ["ndsl @ git+https://github.com/NOAA-GFDL/NDSL.git@2026.03.00"]
+ndsl = ["ndsl @ git+https://github.com/FlorianDeconinck/NDSL.git@feature/data_dimnesion_fields"]
 test = [
   "coverage",
   "pytest",
diff --git a/tests/mpi/test_doubly_periodic.py b/tests/mpi/test_doubly_periodic.py
index cac8a364..2ba9f1a2 100644
--- a/tests/mpi/test_doubly_periodic.py
+++ b/tests/mpi/test_doubly_periodic.py
@@ -19,6 +19,7 @@
 from ndsl.grid import DampingCoefficients, GridData, MetricTerms
 from ndsl.performance import NullTimer
 from pyfv3 import DynamicalCore, DynamicalCoreConfig
+from pyfv3.tracers import default_ai2_tracers
 
 
 def test_dycore_runs_one_step() -> None:
@@ -123,6 +124,8 @@ def test_dycore_runs_one_step() -> None:
         grid_indexing=grid_indexing,
     )
 
+    default_ai2_tracers(quantity_factory)
+
     dycore = DynamicalCore(
         comm=communicator,
         grid_data=grid_data,
@@ -132,6 +135,7 @@ def test_dycore_runs_one_step() -> None:
         config=config,
         phis=state.phis,
         state=state,
+        exclude_tracers=[],
         timestep=timedelta(seconds=255),
     )
 
diff --git a/tests/savepoint/translate/__init__.py b/tests/savepoint/translate/__init__.py
index b0923736..5bc7c02c 100644
--- a/tests/savepoint/translate/__init__.py
+++ b/tests/savepoint/translate/__init__.py
@@ -8,12 +8,14 @@
     TranslateDivergenceCorner,
     TranslateVorticityTransport_Cgrid,
 )
+from .translate_cond_output import TranslateCond_output
 from .translate_corners import (
     TranslateCopyCorners,
     TranslateFill4Corners,
     TranslateFillCorners,
     TranslateFillCornersVector,
 )
+from .translate_cs_profile import TranslateCS_Profile
 from .translate_cubedtolatlon import TranslateCubedToLatLon
 from .translate_d2a2c_vect import TranslateD2A2C_Vect
 from .translate_d_sw import (
@@ -29,9 +31,11 @@
 from .translate_delnflux import TranslateDelnFlux, TranslateDelnFlux_2
 from .translate_divergencedamping import TranslateDivergenceDamping
 from .translate_fillz import TranslateFillz
-from .translate_fvsubgridz import TranslateFVSubgridZ
+# FIXME: translate_fvsubgridz is currently broken, so its import is disabled.
+# from .translate_fvsubgridz import TranslateFVSubgridZ
 from .translate_fvtp2d import TranslateFvTp2d, TranslateFvTp2d_2
 from .translate_fxadv import TranslateFxAdv
+from .translate_getMPIprop import TranslateGetMPIProp
 from .translate_grid import (
     TranslateAGrid,
     TranslateDerivedTrig,
@@ -62,25 +66,43 @@
     TranslateJablonowskiBaroclinic,
     TranslatePVarAuxiliaryPressureVars,
 )
+from .translate_lagrangian_contribution_interp import (
+    TranslateLagrangian_Contribution_Interp,
+)
 from .translate_last_step import TranslateLastStep
+from .translate_map1_ppm_delz import TranslateMap1_PPM_delz
+from .translate_map1_ppm_W import TranslateMap1_PPM_W
+from .translate_map_scalar import TranslateMap_Scalar
+from .translate_MapN_Tracer_2d import TranslateMapN_Tracer_2d
 from .translate_moistcvpluspkz_2d import TranslateMoistCVPlusPkz_2d
 from .translate_moistcvpluspt_2d import TranslateMoistCVPlusPt_2d
+from .translate_moistcvpluspt_2d_last_step import TranslateMoistCVPlusPt_2d_last_step
+from .translate_moistcvpluste_2d import TranslateMoistCVPlusTe_2d
+from .translate_mpp_global_sum import TranslateMpp_global_sum
 from .translate_neg_adj3 import TranslateNeg_Adj3
 from .translate_nh_p_grad import TranslateNH_P_Grad
 from .translate_pe_halo import TranslatePE_Halo
+from .translate_pe_pk_delp_peln import TranslatePE_pk_delp_peln
 from .translate_pk3_halo import TranslatePK3_Halo
 from .translate_pressureadjustedtemperature_nonhydrostatic import (
     TranslatePressureAdjustedTemperature_NonHydrostatic,
 )
+from .translate_Pressures_mapU import TranslatePressures_mapU
+from .translate_Pressures_mapV import TranslatePressures_mapV
 from .translate_qsinit import TranslateQSInit
 from .translate_ray_fast import TranslateRay_Fast
 from .translate_remapping import TranslateRemapping
+from .translate_remapping_GEOS import TranslateRemapping_GEOS
 from .translate_riem_solver3 import TranslateRiem_Solver3
 from .translate_riem_solver_c import TranslateRiem_Solver_C
 from .translate_satadjust3d import TranslateSatAdjust3d
+from .translate_scalar_profile import TranslateScalar_Profile
+from .translate_te_zsum import TranslateTe_Zsum
 from .translate_tracer2d1l import TranslateTracer2D1L
+from .translate_tracer2d1l_cmax import TranslateTracerCMax
 from .translate_updatedzc import TranslateUpdateDzC
 from .translate_updatedzd import TranslateUpdateDzD
+from .translate_w_fix_consrv_moment import TranslateW_fix_consrv_moment
 from .translate_xppm import TranslateXPPM, TranslateXPPM_2
 from .translate_xtp_u import TranslateXTP_U
 from .translate_yppm import TranslateYPPM, TranslateYPPM_2
diff --git a/tests/savepoint/translate/overrides/standard.yaml b/tests/savepoint/translate/overrides/standard.yaml
index 10a39405..61070cc5 100644
--- a/tests/savepoint/translate/overrides/standard.yaml
+++ b/tests/savepoint/translate/overrides/standard.yaml
@@ -31,7 +31,7 @@ MapN_Tracer_2d:
     max_error: 9e-9 # 48_6ranks
 
 NH_P_Grad:
-  max_error: 6e-11
+  - max_error: 6e-11
 
 Riem_Solver3:
   - backend: st:gt:gpu:KJI
@@ -57,6 +57,11 @@ Remapping:
       - q_con
       - tracers
 
+Remapping_GEOS:
+  - backend: all
+    multimodal:
+      ulp_threshold: 150
+
 UpdateDzC:
   - backend: st:gt:gpu:KJI
     max_error: 5e-10
@@ -112,6 +117,9 @@ Tracer2D1L:
   - backend: st:gt:cpu:KJI
     ignore_near_zero_errors:
       tracers: 1e-15
+  - backend: all
+    multimodal:
+      ulp_threshold: 200
 
 DivgDel6:
   - max_error: 3e-13 # 48_6ranks
@@ -155,7 +163,9 @@ UtilVectors:
   - max_error: 2e-10 # 48_6ranks
 
 FVDynamics:
-  - max_error: 5e-5 # 48_6ranks using metric terms
+  - backend: all
+    multimodal:
+      ulp_threshold: 100
 
 DivergenceDamping:
   - backend: st:dace:cpu:KIJ
diff --git a/tests/savepoint/translate/translate_MapN_Tracer_2d.py b/tests/savepoint/translate/translate_MapN_Tracer_2d.py
new file mode 100644
index 00000000..09387913
--- /dev/null
+++ b/tests/savepoint/translate/translate_MapN_Tracer_2d.py
@@ -0,0 +1,81 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, K_DIM
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.mapn_tracer import MapNTracer
+from pyfv3.tracers import FVTracersAxisName, GEOS_tracers_mapping, setup_fvtracers
+
+
+class TranslateMapN_Tracer_2d(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "qtracers": {},
+            "pe1": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "pe2": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "dp2": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+        }
+
+        self.out_vars = {
+            "qtracers": {},
+        }
+
+        # Value from GEOS
+        self.kord = 9
+
+        # mode / iv set to 1 from GEOS
+        self.mode = 1
+
+        self.nq = 9
+
+        self.fill = True
+
+        self._tracers = None
+
+    def compute_from_storage(self, inputs):
+        setup_fvtracers(
+            self.quantity_factory, inputs["qtracers"].shape[3], GEOS_tracers_mapping
+        )
+        self._tracers = self.quantity_factory.from_array(
+            inputs["qtracers"], [I_DIM, J_DIM, K_DIM, FVTracersAxisName], ""
+        )
+
+        self._compute_func = MapNTracer(
+            self.stencil_factory,
+            self.quantity_factory,
+            abs(self.kord),
+            fill=self.fill,
+        )
+
+        self._compute_func(
+            inputs["pe1"],
+            inputs["pe2"],
+            inputs["dp2"],
+            self._tracers,
+        )
+
+        return inputs
diff --git a/tests/savepoint/translate/translate_Pressures_mapU.py b/tests/savepoint/translate/translate_Pressures_mapU.py
new file mode 100644
index 00000000..bfa8d93f
--- /dev/null
+++ b/tests/savepoint/translate/translate_Pressures_mapU.py
@@ -0,0 +1,162 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, J_INTERFACE_DIM, K_DIM
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.map_single import MapSingle
+from pyfv3.stencils.remapping import pe0_ptop_xmax, pressures_mapu
+
+
+class TranslatePressures_mapU(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "pe_": {
+                "istart": grid.is_ - 1,
+                "iend": grid.ie + 1,
+                "jstart": grid.js - 1,
+                "jend": grid.je + 1,
+                "kend": grid.npz,
+            },
+            "ak": {},
+            "bk": {},
+            "pe0_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz,
+            },
+            "pe3_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz,
+            },
+            "u_": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.jsd,
+                "jend": grid.jed + 1,
+                "kend": grid.npz - 1,
+            },
+            "mfy_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz - 1,
+            },
+            "cy_": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz - 1,
+            },
+        }
+        self.in_vars["parameters"] = [
+            "ptop",
+            "kord_mt",
+        ]
+
+        self.out_vars = {
+            "pe0_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz,
+            },
+            "pe3_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz,
+            },
+            "u_": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.jsd,
+                "jend": grid.jed + 1,
+                "kend": grid.npz - 1,
+            },
+            "mfy_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz - 1,
+            },
+            "cy_": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz - 1,
+            },
+        }
+
+        grid_indexing = stencil_factory.grid_indexing
+
+        self.dims = [I_DIM, J_DIM, K_DIM]
+
+        self._pressures_mapu = stencil_factory.from_origin_domain(
+            pressures_mapu,
+            origin=grid_indexing.origin_compute(),
+            domain=(grid_indexing.domain[0], 1, grid_indexing.domain[2] + 1),
+        )
+
+        self._pe0_ptop_xmax = stencil_factory.from_origin_domain(
+            pe0_ptop_xmax,
+            origin=(grid_indexing.domain[0] + 3, 3, 0),
+            domain=(1, 1, grid_indexing.domain[2] + 1),
+        )
+
+    def compute_from_storage(self, inputs):
+        self._map1_ppm_u = MapSingle(
+            self.stencil_factory,
+            self.quantity_factory,
+            inputs["kord_mt"],
+            -1,
+            dims=[I_DIM, J_INTERFACE_DIM, K_DIM],
+        )
+
+        self._pressures_mapu(
+            inputs["pe_"],
+            inputs["ak"],
+            inputs["bk"],
+            inputs["pe0_"],
+            inputs["pe3_"],
+            inputs["ptop"],
+        )
+
+        self._pe0_ptop_xmax(
+            inputs["pe0_"],
+            inputs["ptop"],
+        )
+
+        self._map1_ppm_u(
+            inputs["u_"],
+            inputs["pe0_"],
+            inputs["pe3_"],
+        )
+        self._map1_ppm_u(
+            inputs["mfy_"],
+            inputs["pe0_"],
+            inputs["pe3_"],
+        )
+
+        self._map1_ppm_u(
+            inputs["cy_"],
+            inputs["pe0_"],
+            inputs["pe3_"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_Pressures_mapV.py b/tests/savepoint/translate/translate_Pressures_mapV.py
new file mode 100644
index 00000000..1e473d06
--- /dev/null
+++ b/tests/savepoint/translate/translate_Pressures_mapV.py
@@ -0,0 +1,154 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.constants import I_DIM, I_INTERFACE_DIM, J_DIM, K_DIM
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.map_single import MapSingle
+from pyfv3.stencils.remapping import pressures_mapv
+
+
+class TranslatePressures_mapV(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "pe_": {
+                "istart": grid.is_ - 1,
+                "iend": grid.ie + 1,
+                "jstart": grid.js - 1,
+                "jend": grid.je + 1,
+                "kend": grid.npz + 1,
+            },
+            "pe0_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz + 1,
+            },
+            "pe3_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz + 1,
+            },
+            "ak": {},
+            "bk": {},
+            "v_": {
+                "istart": grid.isd,
+                "iend": grid.ied + 1,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+                "kend": grid.npz - 1,
+            },
+            "mfx_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "cx_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+                "kend": grid.npz - 1,
+            },
+        }
+        self.in_vars["parameters"] = [
+            "kord_mt",
+        ]
+
+        self.out_vars = {
+            "pe0_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz + 1,
+            },
+            "pe3_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz + 1,
+            },
+            "v_": {
+                "istart": grid.isd,
+                "iend": grid.ied + 1,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+                "kend": grid.npz - 1,
+            },
+            "mfx_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "cx_": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+                "kend": grid.npz - 1,
+            },
+        }
+
+        grid_indexing = stencil_factory.grid_indexing
+
+        self.dims = [I_DIM, J_DIM, K_DIM]
+
+        self._pressures_mapv = stencil_factory.from_origin_domain(
+            pressures_mapv,
+            origin=grid_indexing.origin_compute(),
+            domain=(
+                grid_indexing.domain[0] + 1,
+                1,
+                grid_indexing.domain[2] + 1,
+            ),
+        )
+
+    def compute_from_storage(self, inputs):
+        self._map1_ppm_v = MapSingle(
+            self.stencil_factory,
+            self.quantity_factory,
+            inputs["kord_mt"],
+            -1,
+            dims=[I_INTERFACE_DIM, J_DIM, K_DIM],
+        )
+
+        self._pressures_mapv(
+            inputs["pe_"],
+            inputs["ak"],
+            inputs["bk"],
+            inputs["pe0_"],
+            inputs["pe3_"],
+        )
+
+        self._map1_ppm_v(
+            inputs["v_"],
+            inputs["pe0_"],
+            inputs["pe3_"],
+        )
+
+        self._map1_ppm_v(
+            inputs["mfx_"],
+            inputs["pe0_"],
+            inputs["pe3_"],
+        )
+
+        self._map1_ppm_v(
+            inputs["cx_"],
+            inputs["pe0_"],
+            inputs["pe3_"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_a2b_ord4.py b/tests/savepoint/translate/translate_a2b_ord4.py
index 9ed0187f..e4b03bd3 100644
--- a/tests/savepoint/translate/translate_a2b_ord4.py
+++ b/tests/savepoint/translate/translate_a2b_ord4.py
@@ -1,11 +1,13 @@
 from typing import Any, Dict
 
+import numpy as np
 from f90nml import Namelist
 
 from ndsl import StencilFactory, orchestrate
-from ndsl.constants import K_DIM
+from ndsl.constants import I_DIM, J_DIM, K_DIM
 from pyfv3.stencils import DivergenceDamping
 from pyfv3.testing import TranslateDycoreFortranData2Py
+from pyfv3.utils.functional_validation import get_subset_func
 
 
 class A2B_Ord4Compute:
@@ -59,6 +61,11 @@ def __init__(
         self.out_vars: Dict[str, Any] = {"wk": {}, "vort": {}}
         self.stencil_factory = stencil_factory
         self.compute_obj = A2B_Ord4Compute(stencil_factory)
+        self._subset = get_subset_func(
+            self.grid.grid_indexing,
+            dims=[I_DIM, J_DIM, K_DIM],
+            n_halo=((3, 3), (3, 3)),
+        )
 
     def compute_from_storage(self, inputs):
         nord_col = self.grid.quantity_factory.zeros(dims=[K_DIM], units="unknown")
@@ -81,3 +88,13 @@ def compute_from_storage(self, inputs):
         inputs["grid_type"] = 0
         self.compute_obj(divdamp, **inputs)
         return inputs
+
+    def subset_output(self, varname: str, output: np.ndarray) -> np.ndarray:
+        """
+        Given an output array, return the slice of the array that should be
+        validated against reference data. Only `wk` is subset.
+        """
+        if varname in ["wk"]:
+            return self._subset(output)
+        else:
+            return output
diff --git a/tests/savepoint/translate/translate_cond_output.py b/tests/savepoint/translate/translate_cond_output.py
new file mode 100644
index 00000000..0fecf45c
--- /dev/null
+++ b/tests/savepoint/translate/translate_cond_output.py
@@ -0,0 +1,52 @@
+from ndsl.stencils.testing import TranslateFortranData2Py
+from pyfv3.stencils import moist_cv
+
+
+class TranslateCond_output(TranslateFortranData2Py):
+    def __init__(self, grid, namelist, stencil_factory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.in_vars["data_vars"] = {
+            "qliquid": {
+                "kend": grid.npz - 1,
+            },
+            "qice": {
+                "kend": grid.npz - 1,
+            },
+            "qrain": {
+                "kend": grid.npz - 1,
+            },
+            "qsnow": {
+                "kend": grid.npz - 1,
+            },
+            "qgraupel": {
+                "kend": grid.npz - 1,
+            },
+            "q_con": {
+                "kend": grid.npz - 1,
+            },
+        }
+
+        self.out_vars = {
+            "q_con": {
+                "kend": grid.npz - 1,
+            }
+        }
+
+        self.compute_func = stencil_factory.from_origin_domain(
+            moist_cv.cond_output,
+            origin=grid.compute_origin(),
+            domain=(grid.nic, grid.njc, grid.npz),
+        )
+
+    def compute_from_storage(self, inputs):
+
+        self.compute_func(
+            inputs["q_con"],
+            inputs["qliquid"],
+            inputs["qrain"],
+            inputs["qsnow"],
+            inputs["qice"],
+            inputs["qgraupel"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_cs_profile.py b/tests/savepoint/translate/translate_cs_profile.py
new file mode 100644
index 00000000..0e8d3487
--- /dev/null
+++ b/tests/savepoint/translate/translate_cs_profile.py
@@ -0,0 +1,114 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, K_DIM
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.remap_profile import RemapProfile
+
+
+class TranslateCS_Profile(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "qs_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_1": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_2": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_3": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_4": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "dp1_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+        }
+        self.in_vars["parameters"] = [
+            "iv_",
+            "kord_",
+        ]
+
+        self.out_vars = {
+            "q4_1": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_2": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_3": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_4": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+        }
+
+    def compute_from_storage(self, inputs):
+        self._compute_func = RemapProfile(
+            self.stencil_factory,
+            self.quantity_factory,
+            inputs["kord_"],
+            inputs["iv_"],
+            dims=[I_DIM, J_DIM, K_DIM],
+        )
+
+        self._compute_func(
+            inputs["qs_"],
+            inputs["q4_1"],
+            inputs["q4_2"],
+            inputs["q4_3"],
+            inputs["q4_4"],
+            inputs["dp1_"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_d_sw.py b/tests/savepoint/translate/translate_d_sw.py
index 3b1831ad..b4cbc671 100644
--- a/tests/savepoint/translate/translate_d_sw.py
+++ b/tests/savepoint/translate/translate_d_sw.py
@@ -44,10 +44,11 @@ def __init__(
             "crx": grid.x3d_compute_domain_y_dict(),
             "yfx": grid.y3d_compute_domain_x_dict(),
             "cry": grid.y3d_compute_domain_x_dict(),
-            "mfx": grid.x3d_compute_dict(),
-            "mfy": grid.y3d_compute_dict(),
-            "cx": grid.x3d_compute_domain_y_dict(),
-            "cy": grid.y3d_compute_domain_x_dict(),
+            "mfx": grid.x3d_compute_dict() | {"serialname": "mfxd_R8"},
+            "mfy": grid.y3d_compute_dict() | {"serialname": "mfyd_R8"},
+            "cx": grid.x3d_compute_domain_y_dict() | {"serialname": "cxd_R8"},
+            "cy": grid.y3d_compute_domain_x_dict() | {"serialname": "cyd_R8"},
+            "dpx": grid.compute_dict(),
             "heat_source": {},
             "diss_est": {},
             "q_con": {},
@@ -58,7 +59,8 @@ def __init__(
             "divgd": grid.default_dict_buffer_2d(),
         }
         for name, info in self.in_vars["data_vars"].items():
-            info["serialname"] = name + "d"
+            if name not in ["mfx", "mfy", "cx", "cy", "dpx"]:
+                info["serialname"] = name + "d"
         self.in_vars["parameters"] = ["dt"]
         self.out_vars = self.in_vars["data_vars"].copy()
         del self.out_vars["zh"]
@@ -99,13 +101,13 @@ def ubke(
     rsina: FloatFieldIJ,
     ut: FloatField,
     ub: FloatField,
-    dt4: float,
-    dt5: float,
+    dt4: Float,
+    dt5: Float,
 ):
     with computation(PARALLEL), interval(...):
-        dt = 2.0 * dt5
-        ub, _ = d_sw.interpolate_uc_vc_to_cell_corners(uc, vc, cosa, rsina, ut, ut)
-        ub = ub * dt
+        ub, _ = d_sw.interpolate_uc_vc_to_cell_corners(
+            uc, vc, cosa, rsina, ut, ut, dt4, dt5
+        )
 
 
 class TranslateUbKE(TranslateDycoreFortranData2Py):
@@ -146,13 +148,13 @@ def vbke(
     rsina: FloatFieldIJ,
     vt: FloatField,
     vb: FloatField,
-    dt4: float,
-    dt5: float,
+    dt4: Float,
+    dt5: Float,
 ):
     with computation(PARALLEL), interval(...):
-        dt = 2.0 * dt5
-        _, vb = d_sw.interpolate_uc_vc_to_cell_corners(uc, vc, cosa, rsina, vt, vt)
-        vb = vb * dt
+        _, vb = d_sw.interpolate_uc_vc_to_cell_corners(
+            uc, vc, cosa, rsina, vt, vt, dt4, dt5
+        )
 
 
 class TranslateVbKE(TranslateDycoreFortranData2Py):
@@ -240,17 +242,23 @@ def __init__(
 
     def compute_from_storage(self, inputs):
         column_namelist = d_sw.get_column_namelist(
-            config=self.config, quantity_factory=self.grid.quantity_factory
+            config=self.config.d_grid_shallow_water,
+            quantity_factory=self.grid.quantity_factory,
         )
         # TODO add these to the serialized data or remove the test
         inputs["damp_w"] = column_namelist["damp_w"]
         inputs["ke_bg"] = column_namelist["ke_bg"]
-        inputs["dt"] = self.config.dt_atmos / self.config.k_split / self.config.n_split
+        inputs["dt"] = Float(
+            self.config.dt_atmos / self.config.k_split / self.config.n_split
+        )
         inputs["rarea"] = self.grid.rarea
         heat_diss_stencil = self.stencil_factory.from_origin_domain(
             d_sw.heat_diss,
             origin=self.grid.compute_origin(),
             domain=self.grid.domain_shape_compute(),
+            externals={
+                "do_stochastic_ke_backscatter": self.config.do_skeb,
+            },
         )
         heat_diss_stencil(**inputs)
         return inputs
diff --git a/tests/savepoint/translate/translate_fillz.py b/tests/savepoint/translate/translate_fillz.py
index baed617f..e11b49a4 100644
--- a/tests/savepoint/translate/translate_fillz.py
+++ b/tests/savepoint/translate/translate_fillz.py
@@ -1,12 +1,12 @@
 import numpy as np
 from f90nml import Namelist
 
-import ndsl.dsl.gt4py_utils as utils
 from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, K_DIM
 from ndsl.stencils.testing import pad_field_in_j
-from ndsl.utils import safe_assign_array
 from pyfv3.stencils import fillz
 from pyfv3.testing import TranslateDycoreFortranData2Py
+from pyfv3.tracers import FVTracersAxisName, default_GEOS_tracers
 
 
 class TranslateFillz(TranslateDycoreFortranData2Py):
@@ -34,18 +34,25 @@ def __init__(
         self.max_error = 1e-13
         self.ignore_near_zero_errors = {"q2tracers": True}
         self.stencil_factory = stencil_factory
+        self.quantity_factory = grid.quantity_factory
 
-    def make_storage_data_input_vars(self, inputs, storage_vars=None):
+    def make_storage_data_input_vars(
+        self,
+        inputs,
+        storage_vars=None,
+    ) -> None:
+        default_GEOS_tracers(self.quantity_factory)
         if storage_vars is None:
             storage_vars = self.storage_vars()
         info = storage_vars["dp2"]
         inputs["dp2"] = self.make_storage_data(
             np.squeeze(inputs["dp2"]), istart=info["istart"], axis=info["axis"]
         )
+
         inputs["tracers"] = {}
         info = storage_vars["q2tracers"]
         for i in range(int(inputs["nq"])):
-            inputs["tracers"][utils.tracer_variables[i]] = self.make_storage_data(
+            inputs["tracers"][i] = self.make_storage_data(
                 np.squeeze(inputs["q2tracers"][:, :, i]),
                 istart=info["istart"],
                 axis=info["axis"],
@@ -54,35 +61,36 @@ def make_storage_data_input_vars(self, inputs, storage_vars=None):
 
     def compute(self, inputs):
         self.make_storage_data_input_vars(inputs)
-        for name, value in tuple(inputs.items()):
-            if hasattr(value, "shape") and len(value.shape) > 1 and value.shape[1] == 1:
-                inputs[name] = self.make_storage_data(
-                    pad_field_in_j(
-                        value, self.grid.njd, backend=self.stencil_factory.backend
-                    )
-                )
-        for name, value in tuple(inputs["tracers"].items()):
+        quantity_tracers = self.grid.quantity_factory.empty(
+            [I_DIM, J_DIM, K_DIM, FVTracersAxisName], "n/a"
+        )
+        for i_tracer, value in tuple(inputs["tracers"].items()):
             if hasattr(value, "shape") and len(value.shape) > 1 and value.shape[1] == 1:
-                inputs["tracers"][name] = self.make_storage_data(
+                quantity_tracers.data[:, :, :, i_tracer] = self.make_storage_data(
                     pad_field_in_j(
                         value, self.grid.njd, backend=self.stencil_factory.backend
                     )
                 )
+        inputs["tracers"] = quantity_tracers
+
         run_fillz = fillz.FillNegativeTracerValues(
             self.stencil_factory,
             self.grid.quantity_factory,
             inputs.pop("nq"),
-            inputs["tracers"],
         )
         run_fillz(**inputs)
+
         ds = self.grid.default_domain_dict()
         ds.update(self.out_vars["q2tracers"])
-        tracers = np.zeros((self.grid.nic, self.grid.npz, len(inputs["tracers"])))
-        for varname, data in inputs["tracers"].items():
-            index = utils.tracer_variables.index(varname)
-            data[self.grid.slice_dict(ds)]
-            safe_assign_array(
-                tracers[:, :, index], np.squeeze(data[self.grid.slice_dict(ds)])
-            )
-        out = {"q2tracers": tracers}
+
+        if self.stencil_factory.backend.is_fortran_aligned():
+            offset = None
+        else:
+            offset = -1
+
+        out = {
+            "q2tracers": quantity_tracers.data[
+                ds["istart"] : ds["iend"] + 1, ds["jstart"], : ds["kend"] + 1, :offset
+            ]
+        }
         return out
diff --git a/tests/savepoint/translate/translate_fvsubgridz.py b/tests/savepoint/translate/translate_fvsubgridz.py
index b66594dc..7ce76084 100644
--- a/tests/savepoint/translate/translate_fvsubgridz.py
+++ b/tests/savepoint/translate/translate_fvsubgridz.py
@@ -2,7 +2,6 @@
 
 from f90nml import Namelist
 
-import ndsl.dsl.gt4py_utils as utils
 from ndsl import StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM, K_INTERFACE_DIM
 from ndsl.stencils.testing import ParallelTranslateBaseSlicing
@@ -176,8 +175,6 @@ def __init__(
             self._base.out_vars.pop(var)
 
         self.ignore_near_zero_errors = {}
-        for qvar in utils.tracer_variables:
-            self.ignore_near_zero_errors[qvar] = True
         self.stencil_factory = stencil_factory
         self.config = DynamicalCoreConfig.from_f90nml(namelist)
 
diff --git a/tests/savepoint/translate/translate_getMPIprop.py b/tests/savepoint/translate/translate_getMPIprop.py
new file mode 100644
index 00000000..d0b4848d
--- /dev/null
+++ b/tests/savepoint/translate/translate_getMPIprop.py
@@ -0,0 +1,64 @@
+import numpy as np
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.quantity import Quantity
+from ndsl.stencils.testing import ParallelTranslate
+from ndsl.stencils.testing.grid import Grid
+from ndsl.typing import Communicator
+
+
+class TranslateGetMPIProp(ParallelTranslate):
+    def __init__(
+        self,
+        grid: Grid,
+        namelist: Namelist,
+        stencil_factory: StencilFactory,
+    ):
+        print("Base TranslateGetMPIProp is initialized")
+        super().__init__(grid, namelist, stencil_factory)
+        self._base.in_vars["data_vars"] = {"delz": {}}
+        self._base.out_vars = {"delz": {}}
+
+        len_k = 10
+
+        a = [1, 2, 3, 4, 5]
+
+        self._testQuantity_1D = Quantity(
+            data=np.array(a, dtype=np.float32),
+            dims=["K"],
+            units="dunno",
+            gt4py_backend=stencil_factory.backend,
+        )
+
+        self._testQuantity_2D = Quantity(
+            data=np.ones([5, 5], dtype=np.float32),
+            dims=["I", "J"],
+            units="dunno2",
+            gt4py_backend=stencil_factory.backend,
+        )
+
+        self._testQuantity_3D = Quantity(
+            data=np.ones([3, 3, 3], dtype=np.float32),
+            dims=["I", "J", "K"],
+            units="dunno3",
+            gt4py_backend=stencil_factory.backend,
+        )
+
+    def compute_parallel(self, inputs, communicator: Communicator):
+        print("Communicator rank = ", communicator.rank)
+        print("Communicator size = ", communicator.size)
+        print("self._testQuantity_1D = ", self._testQuantity_1D.data)
+        global_sum_q = communicator.all_reduce_sum(self._testQuantity_1D)
+        print("global_sum_q.data = ", global_sum_q.data)
+        print("global_sum_q.metadata = ", global_sum_q.metadata)
+
+        global_sum_q = communicator.all_reduce_sum(self._testQuantity_2D)
+        print("global_sum_q.data = ", global_sum_q.data)
+        print("global_sum_q.metadata = ", global_sum_q.metadata)
+
+        global_sum_q = communicator.all_reduce_sum(self._testQuantity_3D)
+        print("global_sum_q.data = ", global_sum_q.data)
+        print("global_sum_q.metadata = ", global_sum_q.metadata)
+
+        return inputs
diff --git a/tests/savepoint/translate/translate_lagrangian_contribution_interp.py b/tests/savepoint/translate/translate_lagrangian_contribution_interp.py
new file mode 100644
index 00000000..046d9766
--- /dev/null
+++ b/tests/savepoint/translate/translate_lagrangian_contribution_interp.py
@@ -0,0 +1,170 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, K_DIM
+from ndsl.dsl.typing import Bool, BoolFieldIJ, FloatField, Int, IntField, IntFieldIJ
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.map_single import lagrangian_contributions_interp
+
+
+class test_Lagragian_Contribution_Interp:
+    def __init__(
+        self,
+        stencil_factory: StencilFactory,
+        grid: Grid,
+    ):
+        print("In test_Lagragian_Contribution_interp")
+
+        grid_indexing = stencil_factory.grid_indexing
+
+        self._lagrangian_contributions_interp = stencil_factory.from_origin_domain(
+            func=lagrangian_contributions_interp,
+            origin=grid_indexing.origin_compute(),
+            domain=(grid.nic, 1, grid.npz),
+        )
+
+    def __call__(
+        self,
+        km: int,
+        not_exit_loop: BoolFieldIJ,
+        INDEX_LM1: IntField,
+        INDEX_LP0: IntField,
+        q: FloatField,
+        pe1: FloatField,
+        pe2: FloatField,
+        q4_1: FloatField,
+        q4_2: FloatField,
+        q4_3: FloatField,
+        q4_4: FloatField,
+        dp1: FloatField,
+        lev: IntFieldIJ,
+    ):
+        self._lagrangian_contributions_interp(
+            km,
+            not_exit_loop,
+            INDEX_LM1,
+            INDEX_LP0,
+            q,
+            pe1,
+            pe2,
+            q4_1,
+            q4_2,
+            q4_3,
+            q4_4,
+            dp1,
+            lev,
+        )
+
+
+class TranslateLagrangian_Contribution_Interp(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.compute_func = test_Lagragian_Contribution_Interp(self.stencil_factory, self.grid)  # type: ignore
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "q1": {
+                "kend": grid.npz - 1,
+            },
+            "pe1_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "pe2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "q4_1": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_2": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_3": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_4": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "dp1_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+        }
+
+        self.out_vars = {
+            "q1": {
+                "kend": grid.npz - 1,
+            },
+        }
+
+    def compute_from_storage(self, inputs):
+        self._not_exit_loop = self.quantity_factory.zeros(
+            [I_DIM, J_DIM],
+            units="",
+            dtype=Bool,
+        )
+
+        self._INDEX_LM1 = self.quantity_factory.zeros(
+            [I_DIM, J_DIM, K_DIM],
+            units="",
+            dtype=Int,
+        )
+
+        self._INDEX_LP0 = self.quantity_factory.zeros(
+            [I_DIM, J_DIM, K_DIM],
+            units="",
+            dtype=Int,
+        )
+
+        self._lev = self.quantity_factory.zeros(
+            [I_DIM, J_DIM],
+            units="",
+            dtype=Int,
+        )
+
+        self.compute_func(
+            self.grid.npz,
+            self._not_exit_loop,
+            self._INDEX_LM1,
+            self._INDEX_LP0,
+            inputs["q1"],
+            inputs["pe1_"],
+            inputs["pe2_"],
+            inputs["q4_1"],
+            inputs["q4_2"],
+            inputs["q4_3"],
+            inputs["q4_4"],
+            inputs["dp1_"],
+            self._lev,
+        )
+
+        return inputs
diff --git a/tests/savepoint/translate/translate_map1_ppm_W.py b/tests/savepoint/translate/translate_map1_ppm_W.py
new file mode 100644
index 00000000..5c3ee95b
--- /dev/null
+++ b/tests/savepoint/translate/translate_map1_ppm_W.py
@@ -0,0 +1,72 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, K_DIM
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.map_single import MapSingle
+
+
+class TranslateMap1_PPM_W(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "w_": {
+                "kend": grid.npz - 1,
+            },
+            "pe1_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "pe2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "ws_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+        }
+        self.in_vars["parameters"] = [
+            "kord_wz",
+        ]
+
+        self.out_vars = {
+            "w_": {
+                "kend": grid.npz - 1,
+            },
+        }
+
+        # Remapping mode (`iv` in the Fortran code), set to -2 as in GEOS
+        self.mode = -2
+
+        self.dims = [I_DIM, J_DIM, K_DIM]
+
+    def compute_from_storage(self, inputs):
+        self._compute_func = MapSingle(
+            self.stencil_factory,
+            self.quantity_factory,
+            inputs["kord_wz"],
+            self.mode,
+            dims=[I_DIM, J_DIM, K_DIM],
+        )
+
+        self._compute_func(
+            inputs["w_"],
+            inputs["pe1_"],
+            inputs["pe2_"],
+            qs=inputs["ws_"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_map1_ppm_delz.py b/tests/savepoint/translate/translate_map1_ppm_delz.py
new file mode 100644
index 00000000..f0afac1f
--- /dev/null
+++ b/tests/savepoint/translate/translate_map1_ppm_delz.py
@@ -0,0 +1,120 @@
+from f90nml import Namelist
+from gt4py.cartesian.gtscript import PARALLEL, computation, interval
+
+from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, K_DIM
+from ndsl.dsl.typing import FloatField
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.map_single import MapSingle
+
+
+def rescale_delz_1(
+    delz: FloatField,
+    delp: FloatField,
+):
+    with computation(PARALLEL), interval(...):
+        delz = -delz / delp
+
+
+def rescale_delz_2(
+    delz: FloatField,
+    dp: FloatField,
+):
+    with computation(PARALLEL), interval(...):
+        delz = -delz * dp
+
+
+class TranslateMap1_PPM_delz(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "delz_": {
+                "kend": grid.npz - 1,
+            },
+            "pe1_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "pe2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "dp2_3d": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "gz_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+            "delp": {},
+        }
+        self.in_vars["parameters"] = [
+            "kord_wz",
+        ]
+
+        self.out_vars = {
+            "delz_": {
+                "kend": grid.npz - 1,
+            },
+        }
+
+        # Remapping mode (`iv` in the Fortran code), set to 1 as in GEOS
+        self.mode = 1
+
+        self.dims = [I_DIM, J_DIM, K_DIM]
+
+        self._rescale_delz_1 = stencil_factory.from_origin_domain(
+            rescale_delz_1,
+            origin=grid.compute_origin(),
+            domain=(grid.nic, 1, grid.npz),
+        )
+
+        self._rescale_delz_2 = stencil_factory.from_origin_domain(
+            rescale_delz_2,
+            origin=grid.compute_origin(),
+            domain=(grid.nic, 1, grid.npz),
+        )
+
+    def compute_from_storage(self, inputs):
+        self._compute_func = MapSingle(
+            self.stencil_factory,
+            self.quantity_factory,
+            inputs["kord_wz"],
+            self.mode,
+            dims=[I_DIM, J_DIM, K_DIM],
+        )
+
+        self._rescale_delz_1(
+            inputs["delz_"],
+            inputs["delp"],
+        )
+
+        self._compute_func(
+            inputs["delz_"],
+            inputs["pe1_"],
+            inputs["pe2_"],
+            qs=inputs["gz_"],
+        )
+
+        self._rescale_delz_2(
+            inputs["delz_"],
+            inputs["dp2_3d"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_map_scalar.py b/tests/savepoint/translate/translate_map_scalar.py
new file mode 100644
index 00000000..e8fdcda3
--- /dev/null
+++ b/tests/savepoint/translate/translate_map_scalar.py
@@ -0,0 +1,70 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, K_DIM
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.map_single import MapSingle
+
+
+class TranslateMap_Scalar(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "q1": {
+                "kend": grid.npz - 1,
+            },
+            "pe1_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+            "pe2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+        }
+        self.in_vars["parameters"] = [
+            "q_min",
+        ]
+
+        self.out_vars = {
+            "q1": {
+                "kend": grid.npz - 1,
+            },
+        }
+
+        # kord_tm value taken from GEOS
+        self._kord_tm = 9
+
+        # Remapping mode (`iv` in the Fortran code), set to 1 as in GEOS
+        self.mode = 1
+
+        self.dims = [I_DIM, J_DIM, K_DIM]
+
+        self._compute_func = MapSingle(
+            self.stencil_factory,
+            self.quantity_factory,
+            self._kord_tm,
+            self.mode,
+            dims=[I_DIM, J_DIM, K_DIM],
+            interpolate_contribution=True,
+        )
+
+    def compute_from_storage(self, inputs):
+        self._compute_func(
+            inputs["q1"],
+            inputs["pe1_"],
+            inputs["pe2_"],
+            qmin=inputs["q_min"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_moistcvpluspkz_2d.py b/tests/savepoint/translate/translate_moistcvpluspkz_2d.py
index 7f06af27..3cd6a6eb 100644
--- a/tests/savepoint/translate/translate_moistcvpluspkz_2d.py
+++ b/tests/savepoint/translate/translate_moistcvpluspkz_2d.py
@@ -31,9 +31,6 @@ def __call__(
         qsnow: FloatField,
         qice: FloatField,
         qgraupel: FloatField,
-        q_con: FloatField,
-        gz: FloatField,
-        cvm: FloatField,
         pkz: FloatField,
         pt: FloatField,
         cappa: FloatField,
@@ -48,9 +45,6 @@ def __call__(
             qsnow,
             qice,
             qgraupel,
-            q_con,
-            gz,
-            cvm,
             pkz,
             pt,
             cappa,
@@ -78,39 +72,18 @@ def __init__(
             "qrain": {"serialname": "qrain_js"},
             "qsnow": {"serialname": "qsnow_js"},
             "qgraupel": {"serialname": "qgraupel_js"},
-            "gz": {"serialname": "gz1d", "kstart": grid.is_, "axis": 0},
-            "cvm": {"kstart": grid.is_, "axis": 0},
             "delp": {},
             "delz": {},
-            "q_con": {},
             "pkz": {"istart": grid.is_, "jstart": grid.js},
             "pt": {},
             "cappa": {},
         }
-        self.write_vars = ["gz", "cvm"]
         for k, v in self.in_vars["data_vars"].items():
             if k not in self.write_vars:
                 v["axis"] = 1
 
         self.in_vars["parameters"] = ["r_vir"]
         self.out_vars = {
-            "gz": {
-                "serialname": "gz1d",
-                "istart": grid.is_,
-                "iend": grid.ie,
-                "jstart": grid.js,
-                "jend": grid.js,
-                "kstart": grid.npz - 1,
-                "kend": grid.npz - 1,
-            },
-            "cvm": {
-                "istart": grid.is_,
-                "iend": grid.ie,
-                "jstart": grid.js,
-                "jend": grid.js,
-                "kstart": grid.npz - 1,
-                "kend": grid.npz - 1,
-            },
             "pkz": {
                 "istart": grid.is_,
                 "iend": grid.ie,
@@ -118,7 +91,6 @@ def __init__(
                 "jend": grid.je,
             },
             "cappa": {},
-            "q_con": {},
         }
 
     def compute_from_storage(self, inputs):
diff --git a/tests/savepoint/translate/translate_moistcvpluspt_2d.py b/tests/savepoint/translate/translate_moistcvpluspt_2d.py
index 7c2bcec0..5acd6561 100644
--- a/tests/savepoint/translate/translate_moistcvpluspt_2d.py
+++ b/tests/savepoint/translate/translate_moistcvpluspt_2d.py
@@ -1,7 +1,8 @@
 from gt4py.cartesian.gtscript import PARALLEL, computation, interval
 
 from ndsl import StencilFactory
-from ndsl.dsl.typing import FloatField
+from ndsl.constants import I_DIM, J_DIM, K_DIM
+from ndsl.dsl.typing import Float, FloatField
 from ndsl.stencils.testing import TranslateFortranData2Py, pad_field_in_j
 from pyfv3.stencils import moist_cv
 
@@ -18,10 +19,10 @@ def moist_pt(
     cappa: FloatField,
     delp: FloatField,
     delz: FloatField,
-    r_vir: float,
+    r_vir: Float,
 ):
     with computation(PARALLEL), interval(...):
-        cvm, gz, q_con, cappa, pt = moist_cv.moist_pt_func(
+        cvm, gz, q_con, cappa, pt = moist_cv.moist_pt_func_nwat6(
             qvapor,
             qliquid,
             qrain,
@@ -53,6 +54,12 @@ def __init__(
             domain=(grid.nic, 1, grid.npz),
         )
 
+        self._q_con = grid.quantity_factory.zeros(
+            [I_DIM, J_DIM, K_DIM],
+            units="unknown",
+            dtype=Float,
+        )
+
     def __call__(
         self,
         qvapor: FloatField,
@@ -61,7 +68,7 @@ def __call__(
         qsnow: FloatField,
         qice: FloatField,
         qgraupel: FloatField,
-        q_con: FloatField,
+        # q_con: FloatField,
         pt: FloatField,
         cappa: FloatField,
         delp: FloatField,
@@ -75,7 +82,8 @@ def __call__(
             qsnow,
             qice,
             qgraupel,
-            q_con,
+            # q_con,
+            self._q_con,
             pt,
             cappa,
             delp,
@@ -86,7 +94,7 @@ def __call__(
 
 class TranslateMoistCVPlusPt_2d(TranslateFortranData2Py):
     def __init__(self, grid, namelist, stencil_factory):
-        super().__init__(grid, namelist, stencil_factory)
+        super().__init__(grid, stencil_factory)
         self.stencil_factory = stencil_factory
         self.compute_func = MoistPT(stencil_factory, self.grid)  # type: ignore
         self.in_vars["data_vars"] = {
@@ -98,7 +106,7 @@ def __init__(self, grid, namelist, stencil_factory):
             "qgraupel": {"serialname": "qgraupel_js"},
             "delp": {},
             "delz": {},
-            "q_con": {},
+            # "q_con": {},
             "pt": {},
             "cappa": {},
         }
@@ -111,7 +119,7 @@ def __init__(self, grid, namelist, stencil_factory):
         self.out_vars = {
             "pt": {},
             "cappa": {},
-            "q_con": {},
+            # "q_con": {},
         }
 
     def compute_from_storage(self, inputs):
diff --git a/tests/savepoint/translate/translate_moistcvpluspt_2d_last_step.py b/tests/savepoint/translate/translate_moistcvpluspt_2d_last_step.py
new file mode 100644
index 00000000..7c9b2aa5
--- /dev/null
+++ b/tests/savepoint/translate/translate_moistcvpluspt_2d_last_step.py
@@ -0,0 +1,70 @@
+from ndsl.dsl.typing import Float
+from ndsl.stencils.testing import TranslateFortranData2Py
+from pyfv3.stencils import moist_cv
+
+
+class TranslateMoistCVPlusPt_2d_last_step(TranslateFortranData2Py):
+    def __init__(self, grid, namelist, stencil_factory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.in_vars["data_vars"] = {
+            "qvapor": {
+                "kend": grid.npz - 1,
+            },
+            "qliquid": {
+                "kend": grid.npz - 1,
+            },
+            "qice": {
+                "kend": grid.npz - 1,
+            },
+            "qrain": {
+                "kend": grid.npz - 1,
+            },
+            "qsnow": {
+                "kend": grid.npz - 1,
+            },
+            "qgraupel": {
+                "kend": grid.npz - 1,
+            },
+            "pt": {},
+            "pkz": {"istart": grid.is_, "jstart": grid.js},
+        }
+
+        self.in_vars["parameters"] = ["r_vir", "dtmp"]
+        self.out_vars = {
+            "pt": {},
+        }
+
+        self.compute_func = stencil_factory.from_origin_domain(
+            moist_cv.moist_pt_last_step,
+            origin=grid.compute_origin(),
+            domain=(grid.nic, grid.njc, grid.npz),
+        )
+
+        self.quantity_factory = grid.quantity_factory
+
+        self._gz = self.quantity_factory._numpy.zeros(
+            (
+                grid.nid,
+                grid.njd,
+                grid.npz,
+            ),
+            dtype=Float,
+        )
+
+    def compute_from_storage(self, inputs):
+
+        self.compute_func(
+            inputs["qvapor"],
+            inputs["qliquid"],
+            inputs["qrain"],
+            inputs["qsnow"],
+            inputs["qice"],
+            inputs["qgraupel"],
+            # self._gz,
+            inputs["pt"],
+            inputs["pkz"],
+            Float(inputs["dtmp"]),
+            inputs["r_vir"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_moistcvpluste_2d.py b/tests/savepoint/translate/translate_moistcvpluste_2d.py
new file mode 100644
index 00000000..23880cc9
--- /dev/null
+++ b/tests/savepoint/translate/translate_moistcvpluste_2d.py
@@ -0,0 +1,119 @@
+from ndsl.stencils.testing import TranslateFortranData2Py, pad_field_in_j
+from pyfv3.stencils import moist_cv
+
+
+class TranslateMoistCVPlusTe_2d(TranslateFortranData2Py):
+    def __init__(self, grid, namelist, stencil_factory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.in_vars["data_vars"] = {
+            "qvapor": {"serialname": "qvapor_js"},
+            "qliquid": {"serialname": "qliquid_js"},
+            "qice": {"serialname": "qice_js"},
+            "qrain": {"serialname": "qrain_js"},
+            "qsnow": {"serialname": "qsnow_js"},
+            "qgraupel": {"serialname": "qgraupel_js"},
+            "delp": {},
+            "pt": {},
+            "phis_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "te_2d_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+            "u": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.jsd,
+                "jend": grid.jed + 1,
+                "kend": grid.npz,
+            },
+            "v": {
+                "istart": grid.isd,
+                "iend": grid.ied + 1,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+                "kend": grid.npz,
+            },
+            "w": {
+                "kend": grid.npz,
+            },
+            "cosa_s": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+            },
+            "rsin2": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+            },
+            "hs": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+            },
+            "delz": {},
+        }
+        self.write_vars = ["qvapor", "qliquid", "qice", "qrain", "qsnow", "qgraupel"]
+        for k, v in self.in_vars["data_vars"].items():
+            # if k not in self.write_vars:
+            if k in self.write_vars:
+                v["axis"] = 1
+        self.in_vars["parameters"] = ["grav"]
+        self.out_vars = {
+            "te_2d_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+        }
+
+        self.compute_func = stencil_factory.from_origin_domain(
+            moist_cv.moist_te,
+            origin=grid.compute_origin(),
+            domain=(grid.nic, 1, grid.npz + 1),
+        )
+
+    def compute_from_storage(self, inputs):
+        for name, value in inputs.items():
+            if hasattr(value, "shape") and len(value.shape) > 1 and value.shape[1] == 1:
+                inputs[name] = self.make_storage_data(
+                    pad_field_in_j(
+                        value, self.grid.njd, backend=self.stencil_factory.backend
+                    )
+                )
+
+        self.compute_func(
+            inputs["qvapor"],
+            inputs["qliquid"],
+            inputs["qrain"],
+            inputs["qsnow"],
+            inputs["qice"],
+            inputs["qgraupel"],
+            inputs["u"],
+            inputs["v"],
+            inputs["w"],
+            inputs["te_2d_"],
+            inputs["pt"],
+            inputs["phis_"],
+            inputs["delp"],
+            inputs["rsin2"],
+            inputs["cosa_s"],
+            inputs["hs"],
+            inputs["delz"],
+            inputs["grav"],
+        )
+
+        return inputs
diff --git a/tests/savepoint/translate/translate_mpp_global_sum.py b/tests/savepoint/translate/translate_mpp_global_sum.py
new file mode 100644
index 00000000..53b1e4ca
--- /dev/null
+++ b/tests/savepoint/translate/translate_mpp_global_sum.py
@@ -0,0 +1,37 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.stencils.testing import ParallelTranslate
+from ndsl.stencils.testing.grid import Grid
+from ndsl.typing import Communicator
+from pyfv3.mpi.mpp_sum import MPPGlobalSum
+
+
+class TranslateMpp_global_sum(ParallelTranslate):
+    def __init__(
+        self,
+        grid: Grid,
+        namelist: Namelist,
+        stencil_factory: StencilFactory,
+    ):
+        super().__init__(grid, namelist, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+
+        self._base.in_vars["data_vars"] = {
+            "inputArray": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+            "tesum": {},
+        }
+
+        self._base.out_vars = {"tesum": {}}
+
+    def compute_parallel(self, inputs, communicator: Communicator):
+        mpp_sum = MPPGlobalSum(self.stencil_factory, communicator)
+        inputs["tesum"] = mpp_sum(inputs["inputArray"])
+
+        return inputs
diff --git a/tests/savepoint/translate/translate_neg_adj3.py b/tests/savepoint/translate/translate_neg_adj3.py
index 310aeaf0..2015bfc7 100644
--- a/tests/savepoint/translate/translate_neg_adj3.py
+++ b/tests/savepoint/translate/translate_neg_adj3.py
@@ -2,7 +2,6 @@
 
 from f90nml import Namelist
 
-import ndsl.dsl.gt4py_utils as utils
 from ndsl import StencilFactory
 from pyfv3.stencils import AdjustNegativeTracerMixingRatio
 from pyfv3.testing import TranslateDycoreFortranData2Py
@@ -40,8 +39,6 @@ def __init__(
             "qcld": {},
             # "pt": {},
         }
-        for qvar in utils.tracer_variables:
-            self.ignore_near_zero_errors[qvar] = True
         self.stencil_factory = stencil_factory
 
     def compute(self, inputs):
diff --git a/tests/savepoint/translate/translate_pe_pk_delp_peln.py b/tests/savepoint/translate/translate_pe_pk_delp_peln.py
new file mode 100644
index 00000000..e9dc935f
--- /dev/null
+++ b/tests/savepoint/translate/translate_pe_pk_delp_peln.py
@@ -0,0 +1,153 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.remapping import pe_pk_delp_peln
+
+
+class TranslatePE_pk_delp_peln(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "pe2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pe_": {
+                "istart": grid.is_ - 1,
+                "iend": grid.ie + 1,
+                "jstart": grid.js - 1,
+                "jend": grid.je + 1,
+                "kend": grid.npz + 1,
+            },
+            "peln_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pn2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pk2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "delp": {
+                # "istart": grid.isd,
+                # "iend": grid.ied,
+                # "jstart": grid.jsd,
+                # "jend": grid.jed,
+                # "kend": grid.npz,
+            },
+            "pk": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "ak": {},
+            "bk": {},
+        }
+        self.in_vars["parameters"] = [
+            "akap",
+            "ptop",
+        ]
+
+        self.out_vars = {
+            "pe2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pe_": {
+                "istart": grid.is_ - 1,
+                "iend": grid.ie + 1,
+                "jstart": grid.js - 1,
+                "jend": grid.je + 1,
+                "kend": grid.npz + 1,
+            },
+            "peln_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pn2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pk2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "delp": {
+                # "istart": grid.isd,
+                # "iend": grid.ied,
+                # "jstart": grid.jsd,
+                # "jend": grid.jed,
+                # "kend": grid.npz,
+            },
+            "pk": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+        }
+
+        grid_indexing = stencil_factory.grid_indexing
+        self._domain_kextra = (
+            grid_indexing.domain[0],
+            1,
+            grid_indexing.domain[2] + 1,
+        )
+
+        self._pe_pk_delp_peln = stencil_factory.from_origin_domain(
+            pe_pk_delp_peln,
+            origin=grid_indexing.origin_compute(),
+            domain=self._domain_kextra,
+        )
+
+    def compute_from_storage(self, inputs):
+        self._pe_pk_delp_peln(
+            inputs["pe_"],
+            inputs["pk"],
+            inputs["delp"],
+            inputs["peln_"],
+            inputs["pe2_"],
+            inputs["pk2_"],
+            inputs["pn2_"],
+            inputs["ak"],
+            inputs["bk"],
+            inputs["akap"],
+            inputs["ptop"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_pn2_pk_delp.py b/tests/savepoint/translate/translate_pn2_pk_delp.py
new file mode 100644
index 00000000..fe8eb041
--- /dev/null
+++ b/tests/savepoint/translate/translate_pn2_pk_delp.py
@@ -0,0 +1,131 @@
+from ndsl import StencilFactory
+from ndsl.dsl.typing import Float, FloatField
+from ndsl.stencils.testing import TranslateFortranData2Py
+from pyfv3.stencils.remapping import pn2_pk_delp
+
+
+class testClass:
+    """
+    Class to test with DaCe orchestration. Wraps the pn2_pk_delp stencil.
+    """
+
+    def __init__(
+        self,
+        stencil_factory: StencilFactory,
+        grid,
+    ):
+        self._pn2_pk_delp = stencil_factory.from_origin_domain(
+            func=pn2_pk_delp,
+            origin=(3, 3, 1),
+            domain=(24, 24, 71),
+        )
+
+    def __call__(
+        self,
+        dp2: FloatField,
+        delp: FloatField,
+        pe2: FloatField,
+        pn2: FloatField,
+        pk: FloatField,
+        akap: Float,
+    ):
+        self._pn2_pk_delp(dp2, delp, pe2, pn2, pk, akap)
+
+
+class TranslatePN2_PK_DelP(TranslateFortranData2Py):
+    def __init__(self, grid, namelist, stencil_factory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.compute_func = testClass(self.stencil_factory, self.grid)  # type: ignore
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "pe2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pn2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pk_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+        }
+        self.in_vars["parameters"] = [
+            "akap",
+        ]
+
+        self.out_vars = {
+            "pe2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pn2_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "pk_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+        }
+        self._dp2 = self.quantity_factory._numpy.zeros(
+            (
+                31,
+                31,
+                73,
+            ),
+            dtype=Float,
+        )
+
+        self._delp = self.quantity_factory._numpy.zeros(
+            (
+                31,
+                31,
+                73,
+            ),
+            dtype=Float,
+        )
+
+    def compute_from_storage(self, inputs):
+
+        # print("delp shape = ", self._delp.shape)
+        # print("inputs[pe2] shape = ", inputs["pe2_"].shape)
+        # print("inputs[pe2_][:,3,0] = ", inputs["pe2_"][:,3,0])
+        # print('self.grid.is_ = ', self.grid.is_)
+        # print('self.grid.ie = ', self.grid.ie)
+        # print('self.grid.js = ', self.grid.js)
+        # print('self.grid.je = ', self.grid.je)
+        # print('self.storage_vars() = ', self.storage_vars())
+        # self.make_storage_data_input_vars(inputs)
+        # exit(1)
+        self.compute_func(
+            self._dp2,
+            self._delp,
+            inputs["pe2_"],
+            inputs["pn2_"],
+            inputs["pk_"],
+            inputs["akap"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_pressureadjustedtemperature_nonhydrostatic.py b/tests/savepoint/translate/translate_pressureadjustedtemperature_nonhydrostatic.py
index 139924b5..816c10ea 100644
--- a/tests/savepoint/translate/translate_pressureadjustedtemperature_nonhydrostatic.py
+++ b/tests/savepoint/translate/translate_pressureadjustedtemperature_nonhydrostatic.py
@@ -1,8 +1,10 @@
 from typing import Any, Dict
 
+import numpy as np
 from f90nml import Namelist
 
 from ndsl import StencilFactory
+from ndsl.dsl.typing import Float
 from pyfv3.stencils import temperature_adjust
 from pyfv3.stencils.dyn_core import get_nk_heat_dissipation
 from pyfv3.testing import TranslateDycoreFortranData2Py
@@ -41,7 +43,9 @@ def __init__(
         self.stencil_factory = stencil_factory
 
     def compute_from_storage(self, inputs):
-        inputs["delt_time_factor"] = abs(inputs["bdt"] * self.config.delt_max)
+        inputs["delt_time_factor"] = np.abs(
+            inputs["bdt"] * self.config.delt_max, dtype=Float
+        )
         del inputs["bdt"]
         self.compute_func(**inputs)
         return inputs
diff --git a/tests/savepoint/translate/translate_remapping.py b/tests/savepoint/translate/translate_remapping.py
index 3604fd6b..f4a2e354 100644
--- a/tests/savepoint/translate/translate_remapping.py
+++ b/tests/savepoint/translate/translate_remapping.py
@@ -2,10 +2,11 @@
 
 import ndsl.dsl.gt4py_utils as utils
 from ndsl import StencilFactory
-from ndsl.constants import K_DIM
+from ndsl.constants import I_DIM, J_DIM, K_DIM
 from ndsl.stencils.testing import Grid
 from pyfv3.stencils import LagrangianToEulerian
 from pyfv3.testing import TranslateDycoreFortranData2Py
+from pyfv3.tracers import FVTracersAxisName, default_ai2_tracers
 
 
 class TranslateRemapping(TranslateDycoreFortranData2Py):
@@ -98,25 +99,38 @@ def __init__(
         self.near_zero = 3e-18
         self.ignore_near_zero_errors = {"q_con": True, "tracers": True}
         self.stencil_factory = stencil_factory
+        self.quantity_factory = grid.quantity_factory
 
     def compute_from_storage(self, inputs):
+        default_ai2_tracers(self.quantity_factory)
         wsd_2d = utils.make_storage_from_shape(
             inputs["wsd"].shape[0:2], backend=self.stencil_factory.backend
         )
         wsd_2d[:, :] = inputs["wsd"][:, :, 0]
         inputs["wsd"] = wsd_2d
-        inputs["q_cld"] = inputs["tracers"]["qcld"]
         inputs["last_step"] = bool(inputs["last_step"])
-        pfull = self.grid.quantity_factory.zeros([K_DIM], units="Pa")
-        pfull.data[:] = pfull.np.asarray(inputs.pop("pfull"))
+        pfull = self.quantity_factory.zeros([K_DIM], units="Pa")
+        pfull[:] = pfull.np.asarray(inputs.pop("pfull"))
+
+        # Tracers
+        quantity_tracers = self.quantity_factory.from_array(
+            inputs["tracers"], [I_DIM, J_DIM, K_DIM, FVTracersAxisName], "n/a"
+        )
+        inputs["tracers"] = quantity_tracers
+
         lagrangian_to_eulerian = LagrangianToEulerian(
             self.stencil_factory,
-            quantity_factory=self.grid.quantity_factory,
+            quantity_factory=self.quantity_factory,
             config=self.config.remapping,
             area_64=self.grid.area_64,
-            nq=inputs.pop("nq"),
             pfull=pfull,
+            nwat=self.config.nwat,
         )
+
         lagrangian_to_eulerian(**inputs)
-        inputs.pop("q_cld")
+
+        if not self.stencil_factory.backend.is_fortran_aligned():
+            inputs["tracers"] = quantity_tracers[:-1, :-1, :-1, :]
+        else:
+            inputs["tracers"] = quantity_tracers.data
         return inputs
diff --git a/tests/savepoint/translate/translate_remapping_GEOS.py b/tests/savepoint/translate/translate_remapping_GEOS.py
new file mode 100644
index 00000000..c7d06a1d
--- /dev/null
+++ b/tests/savepoint/translate/translate_remapping_GEOS.py
@@ -0,0 +1,485 @@
+from types import SimpleNamespace
+
+from f90nml import Namelist
+
+from ndsl import Quantity, StencilFactory
+from ndsl.constants import (
+    I_DIM,
+    I_INTERFACE_DIM,
+    J_DIM,
+    J_INTERFACE_DIM,
+    K_DIM,
+    K_INTERFACE_DIM,
+)
+from ndsl.dsl.typing import Float
+from ndsl.stencils.testing import Grid, ParallelTranslateBaseSlicing
+from pyfv3 import DynamicalCoreConfig
+from pyfv3.stencils.remapping_GEOS import LagrangianToEulerian_GEOS
+from pyfv3.tracers import FVTracers, FVTracersAxisName, setup_fvtracers
+
+
+class TranslateRemapping_GEOS(ParallelTranslateBaseSlicing):
+    inputs = {
+        "pe": {
+            "name": "pe",
+            "dims": [I_DIM, J_DIM, K_INTERFACE_DIM],
+            "units": "No Units",
+        },
+        "delp": {
+            "name": "delp",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "delz": {
+            "name": "delz",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "q_con": {
+            "name": "q_con",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "pt": {
+            "name": "pt",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "cappa": {
+            "name": "cappa",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "ps": {
+            "name": "ps",
+            "dims": [I_DIM, J_DIM],
+            "units": "No Units",
+        },
+        "peln": {
+            "name": "peln",
+            "dims": [I_DIM, J_DIM, K_INTERFACE_DIM],
+            "units": "No Units",
+        },
+        "ak": {
+            "name": "ak",
+            "dims": [K_INTERFACE_DIM],
+            "units": "No Units",
+        },
+        "bk": {
+            "name": "bk",
+            "dims": [K_INTERFACE_DIM],
+            "units": "No Units",
+        },
+        "pk": {
+            "name": "pk",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "pkz": {
+            "name": "pkz",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "w": {
+            "name": "w",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "u": {
+            "name": "u",
+            "dims": [I_DIM, J_INTERFACE_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "v": {
+            "name": "v",
+            "dims": [I_INTERFACE_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "mfy_R4": {
+            "name": "mfy",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "cy_R4": {
+            "name": "cy",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "mfx_R4": {
+            "name": "mfx",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "cx_R4": {
+            "name": "cx",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "phis": {
+            "name": "phis",
+            "dims": [I_DIM, J_DIM],
+            "units": "No Units",
+        },
+        "te_2d": {
+            "name": "te_2d",
+            "dims": [I_DIM, J_DIM],
+            "units": "No Units",
+        },
+        "wsd": {
+            "name": "wsd",
+            "dims": [I_DIM, J_DIM],
+            "units": "No Units",
+        },
+        "dp1": {
+            "name": "dp1",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "pfull": {
+            "name": "pfull",
+            "dims": [K_DIM],
+            "units": "No Units",
+        },
+    }
+    outputs = {
+        "pt": {
+            "name": "pt",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "cappa": {
+            "name": "cappa",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "delp": {
+            "name": "delp",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "delz": {
+            "name": "delz",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "w": {
+            "name": "w",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "u": {
+            "name": "u",
+            "dims": [I_DIM, J_INTERFACE_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "v": {
+            "name": "v",
+            "dims": [I_INTERFACE_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "mfy_R4": {
+            "name": "mfy",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "cy_R4": {
+            "name": "cy",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "mfx_R4": {
+            "name": "mfx",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "cx_R4": {
+            "name": "cx",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "peln": {
+            "name": "peln",
+            "dims": [I_DIM, J_DIM, K_INTERFACE_DIM],
+            "units": "No Units",
+        },
+        "pe": {
+            "name": "pe",
+            "dims": [I_DIM, J_DIM, K_INTERFACE_DIM],
+            "units": "No Units",
+        },
+        "pk": {
+            "name": "pk",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "pkz": {
+            "name": "pkz",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "q_con": {
+            "name": "q_con",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "dp1": {
+            "name": "dp1",
+            "dims": [I_DIM, J_DIM, K_DIM],
+            "units": "No Units",
+        },
+        "ps": {
+            "name": "ps",
+            "dims": [I_DIM, J_DIM],
+            "units": "No Units",
+        },
+    }
+
+    def __init__(
+        self,
+        grid: Grid,
+        namelist: Namelist,
+        stencil_factory: StencilFactory,
+    ):
+        super().__init__(grid, namelist, stencil_factory)
+
+        self._base.in_vars["data_vars"] = {
+            "tracers": {},
+            "w": {
+                "kend": grid.npz - 1,
+            },
+            "u": grid.y3d_domain_dict(),
+            "v": grid.x3d_domain_dict(),
+            "delz": {},
+            "pt": {},
+            "dp1": {},
+            "delp": {},
+            "cappa": {},
+            "q_con": {},
+            "pkz": grid.compute_dict(),
+            "pk": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz + 1,
+            },
+            "peln": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kaxis": 1,
+                "kend": grid.npz,
+            },
+            "pe": {
+                "istart": grid.is_ - 1,
+                "iend": grid.ie + 1,
+                "jstart": grid.js - 1,
+                "jend": grid.je + 1,
+                "kend": grid.npz + 1,
+                "kaxis": 1,
+            },
+            "ps": {},
+            "wsd": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+            "mfy_R4": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz - 1,
+            },
+            "cy_R4": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.js,
+                "jend": grid.je + 1,
+                "kend": grid.npz - 1,
+            },
+            "mfx_R4": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "cx_R4": {
+                "istart": grid.is_,
+                "iend": grid.ie + 1,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+                "kend": grid.npz - 1,
+            },
+            "phis": {
+                "istart": grid.isd,
+                "iend": grid.ied,
+                "jstart": grid.jsd,
+                "jend": grid.jed,
+            },
+            "te_2d": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+            # column variables...
+            "ak": {},
+            "bk": {},
+            "pfull": grid.compute_buffer_k_dict(),
+        }
+        self._base.in_vars["parameters"] = [
+            "ptop",
+            "akap",
+            "zvir",
+            "last_step",
+            "consv_te",
+            "mdt",
+            "nq",
+        ]
+        self._base.out_vars = {}
+        for k in [
+            "tracers",
+            "pe",
+            "pkz",
+            "pk",
+            "peln",
+            "pt",
+            "cappa",
+            "delp",
+            "delz",
+            "q_con",
+            "u",
+            "v",
+            "w",
+            "ps",
+            "dp1",
+            "mfy_R4",
+            "cy_R4",
+            "mfx_R4",
+            "cx_R4",
+        ]:
+            self._base.out_vars[k] = self._base.in_vars["data_vars"][k]
+
+        self.stencil_factory = stencil_factory
+        self.quantity_factory = grid.quantity_factory
+
+        # Dynamical core configuration parsed from the namelist
+        self.config = DynamicalCoreConfig.from_f90nml(namelist)
+        self.grid = grid
+
+        self._are_tracers_setup = False
+
+        self._tracers = None
+
+    def compute_sequential(self, inputs_list, communicator_list):
+        print("No serial test available")
+
+    def state_from_inputs_and_tracers(
+        self, inputs: dict, tracers: FVTracers
+    ) -> SimpleNamespace:
+        input_storages = super().state_from_inputs(inputs)
+        # Rename fluxes and courant numbers
+        input_storages["mfx"] = input_storages.pop("mfx_R4")
+        input_storages["mfy"] = input_storages.pop("mfy_R4")
+        input_storages["cx"] = input_storages.pop("cx_R4")
+        input_storages["cy"] = input_storages.pop("cy_R4")
+        # Make tracers
+        input_storages["tracers"] = tracers
+        return SimpleNamespace(**input_storages)
+
+    def outputs_from_state(self, state: dict):
+        if len(self.outputs) == 0:
+            return {}
+        outputs = {}
+        storages = {}
+        for name, _properties in self.outputs.items():
+            if name in ["mfx_R4", "mfy_R4", "cx_R4", "cy_R4"]:
+                storages[name] = state[name[:-3]]
+            elif isinstance(state[name], Quantity):
+                storages[name] = state[name].data
+            elif len(self.outputs[name]["dims"]) > 0:
+                storages[name] = state[name]  # assume it's a storage
+            else:
+                outputs[name] = state[name]  # scalar
+        # Add tracers, dropping the last point along I, J and K
+        storages["tracers"] = state["tracers"][:-1, :-1, :-1, :]
+        outputs.update(self._base.slice_output(storages))
+        return outputs
+
+    def compute_parallel(self, inputs, communicator):
+        if not self._are_tracers_setup:
+            self._are_tracers_setup = True
+            setup_fvtracers(
+                self.quantity_factory,
+                inputs["tracers"].shape[3],
+                {
+                    "vapor": 0,
+                    "liquid": 1,
+                    "rain": 3,
+                    "snow": 4,
+                    "ice": 2,
+                    "graupel": 5,
+                    "cloud": 6,
+                },
+            )
+
+        self._tracers = self.quantity_factory.empty(
+            [I_DIM, J_DIM, K_DIM, FVTracersAxisName], ""
+        )
+        self._tracers[:-1, :-1, :-1, :] = inputs["tracers"][:]
+        inputs.pop("tracers")
+        self._base.in_vars["data_vars"].pop("tracers")
+
+        inputs["te_2d"] = inputs["te_2d"].astype(Float)
+        state = self.state_from_inputs_and_tracers(inputs, self._tracers)
+
+        l_to_e = LagrangianToEulerian_GEOS(
+            self.stencil_factory,
+            self.quantity_factory,
+            self.config.remapping,
+            communicator,
+            self.grid.grid_data,
+            state.pfull,
+            self.config.adiabatic,
+            self.config.nwat,
+        )
+
+        l_to_e(
+            state.tracers,
+            state.pt,
+            state.delp,
+            state.delz,
+            state.peln,
+            state.u,
+            state.v,
+            state.w,
+            state.mfx,
+            state.mfy,
+            state.cx,
+            state.cy,
+            state.cappa,
+            state.q_con,
+            state.pkz,
+            state.pk,
+            state.pe,
+            state.phis,
+            state.te_2d,
+            state.ps,
+            state.wsd,
+            state.ak,
+            state.bk,
+            state.dp1,
+            state.ptop,
+            state.akap,
+            state.zvir,
+            state.last_step,
+            state.consv_te,
+            state.mdt,
+        )
+
+        outputs = self.outputs_from_state(vars(state))
+        return outputs
diff --git a/tests/savepoint/translate/translate_riem_solver_c.py b/tests/savepoint/translate/translate_riem_solver_c.py
index a8faf4f5..80ca5b01 100644
--- a/tests/savepoint/translate/translate_riem_solver_c.py
+++ b/tests/savepoint/translate/translate_riem_solver_c.py
@@ -1,8 +1,11 @@
+import numpy as np
 from f90nml import Namelist
 
 from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, K_DIM
 from pyfv3.stencils import NonhydrostaticVerticalSolverCGrid
 from pyfv3.testing import TranslateDycoreFortranData2Py
+from pyfv3.utils.functional_validation import get_subset_func
 
 
 class TranslateRiem_Solver_C(TranslateDycoreFortranData2Py):
@@ -33,3 +36,23 @@ def __init__(
         self.out_vars = {"pef": {"kend": grid.npz}, "gz": {"kend": grid.npz}}
         self.max_error = 5e-14
         self.stencil_factory = stencil_factory
+        self._subset = get_subset_func(
+            self.grid.grid_indexing,
+            dims=[I_DIM, J_DIM, K_DIM],
+            n_halo=((3, 3), (3, 3)),
+        )
+
+    def compute(self, inputs):
+        outputs = super().compute(inputs)
+        outputs["gz"] = self.subset_output("gz", outputs["gz"])
+        return outputs
+
+    def subset_output(self, varname: str, output: np.ndarray) -> np.ndarray:
+        """
+        Given an output array, return the slice of the array which we'd
+        like to validate against reference data
+        """
+        if varname in ["gz", "pef"]:
+            return self._subset(output)
+        else:
+            return output
diff --git a/tests/savepoint/translate/translate_satadjust3d.py b/tests/savepoint/translate/translate_satadjust3d.py
index 787e0bcb..16823da2 100644
--- a/tests/savepoint/translate/translate_satadjust3d.py
+++ b/tests/savepoint/translate/translate_satadjust3d.py
@@ -71,6 +71,7 @@ def compute_from_storage(self, inputs):
             self.config.sat_adjust,
             self.grid.area_64,
             int(inputs["kmp"]),
+            nwat=self.config.nwat,
         )
         satadjust3d_obj(**inputs)
         return inputs
diff --git a/tests/savepoint/translate/translate_scalar_profile.py b/tests/savepoint/translate/translate_scalar_profile.py
new file mode 100644
index 00000000..269aadf4
--- /dev/null
+++ b/tests/savepoint/translate/translate_scalar_profile.py
@@ -0,0 +1,120 @@
+from f90nml import Namelist
+
+from ndsl import StencilFactory
+from ndsl.constants import I_DIM, J_DIM, K_DIM
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.remap_profile import RemapProfile
+
+
+class TranslateScalar_Profile(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.in_vars["data_vars"] = {
+            "qs_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_1": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_2": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_3": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_4": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "dp1_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+        }
+        self.in_vars["parameters"] = [
+            "q_min",
+        ]
+
+        self.out_vars = {
+            "q4_1": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_2": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_3": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+            "q4_4": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz - 1,
+            },
+        }
+
+        # Value from GEOS
+        self.kord = 9
+
+        # mode / iv set to 1 from GEOS
+        self.mode = 1
+
+        self._compute_func = RemapProfile(
+            self.stencil_factory,
+            self.quantity_factory,
+            self.kord,
+            self.mode,
+            dims=[I_DIM, J_DIM, K_DIM],
+        )
+
+    def compute_from_storage(self, inputs):
+        self._compute_func(
+            inputs["qs_"],
+            inputs["q4_1"],
+            inputs["q4_2"],
+            inputs["q4_3"],
+            inputs["q4_4"],
+            inputs["dp1_"],
+            inputs["q_min"],
+        )
+        return inputs
diff --git a/tests/savepoint/translate/translate_te_zsum.py b/tests/savepoint/translate/translate_te_zsum.py
new file mode 100644
index 00000000..9b7a3191
--- /dev/null
+++ b/tests/savepoint/translate/translate_te_zsum.py
@@ -0,0 +1,70 @@
+from ndsl.stencils.testing import TranslateFortranData2Py
+from pyfv3.stencils import moist_cv
+
+
+class TranslateTe_Zsum(TranslateFortranData2Py):
+    def __init__(self, grid, namelist, stencil_factory):
+        super().__init__(grid, namelist, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.in_vars["data_vars"] = {
+            "delp": {
+                "kend": grid.npz,
+            },
+            "te_2d_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+            "te0_2d_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+            "zsum1": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+            "pkz": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+                "kend": grid.npz,
+            },
+        }
+        self.out_vars = {
+            "te_2d_": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+            "zsum1": {
+                "istart": grid.is_,
+                "iend": grid.ie,
+                "jstart": grid.js,
+                "jend": grid.je,
+            },
+        }
+
+        self.compute_func = stencil_factory.from_origin_domain(
+            moist_cv.te_zsum,
+            origin=grid.compute_origin(),
+            domain=(grid.nic, 1, grid.npz),
+        )
+
+    def compute_from_storage(self, inputs):
+
+        self.compute_func(
+            inputs["te_2d_"],
+            inputs["te0_2d_"],
+            inputs["delp"],
+            inputs["pkz"],
+            inputs["zsum1"],
+        )
+
+        return inputs
diff --git a/tests/savepoint/translate/translate_tracer2d1l.py b/tests/savepoint/translate/translate_tracer2d1l.py
index 6bb420ce..a5093f6d 100644
--- a/tests/savepoint/translate/translate_tracer2d1l.py
+++ b/tests/savepoint/translate/translate_tracer2d1l.py
@@ -1,12 +1,12 @@
 import pytest
 from f90nml import Namelist
 
-import ndsl.dsl.gt4py_utils as utils
-from ndsl import StencilFactory
+from ndsl import QuantityFactory, StencilFactory
 from ndsl.constants import I_DIM, J_DIM, K_DIM
-from ndsl.stencils.testing import ParallelTranslate
+from ndsl.stencils.testing import Grid, ParallelTranslate
 from pyfv3 import DynamicalCoreConfig
 from pyfv3.stencils import FiniteVolumeTransport, TracerAdvection
+from pyfv3.tracers import FVTracersAxisName, GEOS_tracers_mapping, setup_fvtracers
 from pyfv3.utils.functional_validation import get_subset_func
 
 
@@ -20,7 +20,7 @@ class TranslateTracer2D1L(ParallelTranslate):
 
     def __init__(
         self,
-        grid,
+        grid: Grid,
         namelist: Namelist,
         stencil_factory: StencilFactory,
     ):
@@ -28,31 +28,38 @@ def __init__(
         self._base.in_vars["data_vars"] = {
             "tracers": {},
             "dp1": {},
-            "mfxd": grid.x3d_compute_dict(),
-            "mfyd": grid.y3d_compute_dict(),
-            "cxd": grid.x3d_compute_domain_y_dict(),
-            "cyd": grid.y3d_compute_domain_x_dict(),
+            "mfxd_R4": grid.x3d_compute_dict(),
+            "mfyd_R4": grid.y3d_compute_dict(),
+            "cxd_R4": grid.x3d_compute_domain_y_dict(),
+            "cyd_R4": grid.y3d_compute_domain_x_dict(),
         }
         self._base.in_vars["parameters"] = ["nq"]
         self._base.out_vars = self._base.in_vars["data_vars"]
         self.stencil_factory = stencil_factory
+        self._quantity_factory = QuantityFactory(
+            sizer=grid.sizer,
+            backend=stencil_factory.backend,
+        )
         self._subset = get_subset_func(
             self.grid.grid_indexing,
             dims=[I_DIM, J_DIM, K_DIM],
             n_halo=((0, 0), (0, 0)),
         )
         self.config = DynamicalCoreConfig.from_f90nml(namelist)
-
-    def collect_input_data(self, serializer, savepoint):
-        input_data = self._base.collect_input_data(serializer, savepoint)
-        return input_data
+        self.quantity_factory = grid.quantity_factory
 
     def compute_parallel(self, inputs, communicator):
         self._base.make_storage_data_input_vars(inputs)
-        all_tracers = inputs["tracers"]
-        inputs["tracers"] = self.get_advected_tracer_dict(
-            inputs["tracers"], int(inputs.pop("nq"))
+        setup_fvtracers(
+            self.quantity_factory, inputs["tracers"].shape[3], GEOS_tracers_mapping
+        )
+
+        quantity_tracers = self.grid.quantity_factory.from_array(
+            inputs["tracers"], [I_DIM, J_DIM, K_DIM, FVTracersAxisName], "n/a"
         )
+        inputs["tracers"] = quantity_tracers
+        nq = int(inputs.pop("nq"))
+
         transport = FiniteVolumeTransport(
             stencil_factory=self.stencil_factory,
             quantity_factory=self.grid.quantity_factory,
@@ -69,38 +76,23 @@ def compute_parallel(self, inputs, communicator):
             self.grid.grid_data,
             communicator,
             inputs["tracers"],
+            nq,
         )
-        inputs["x_mass_flux"] = inputs.pop("mfxd")
-        inputs["y_mass_flux"] = inputs.pop("mfyd")
-        inputs["x_courant"] = inputs.pop("cxd")
-        inputs["y_courant"] = inputs.pop("cyd")
+        inputs["x_mass_flux"] = inputs.pop("mfxd_R4")
+        inputs["y_mass_flux"] = inputs.pop("mfyd_R4")
+        inputs["x_courant"] = inputs.pop("cxd_R4")
+        inputs["y_courant"] = inputs.pop("cyd_R4")
         self.tracer_advection(**inputs)
-        inputs["mfxd"] = inputs.pop("x_mass_flux")
-        inputs["mfyd"] = inputs.pop("y_mass_flux")
-        inputs["cxd"] = inputs.pop("x_courant")
-        inputs["cyd"] = inputs.pop("y_courant")
-        inputs["tracers"] = (
-            all_tracers  # some aren't advected, still need to be validated
-        )
-        # need to convert tracers dict to [x, y, z, n_tracer] array before subsetting
+        inputs["mfxd_R4"] = inputs.pop("x_mass_flux")
+        inputs["mfyd_R4"] = inputs.pop("y_mass_flux")
+        inputs["cxd_R4"] = inputs.pop("x_courant")
+        inputs["cyd_R4"] = inputs.pop("y_courant")
+        inputs["tracers"] = quantity_tracers.field[:]
+
         outputs = self._base.slice_output(inputs)
-        outputs["tracers"] = self.subset_output("tracers", outputs["tracers"])
         return outputs
 
-    def get_advected_tracer_dict(self, all_tracers, nq):
-        all_tracers = {**all_tracers}  # make a new dict so we don't modify the input
-        properties = self.inputs["tracers"]
-        for name in utils.tracer_variables:
-            self.grid.quantity_dict_update(
-                all_tracers,
-                name,
-                dims=properties["dims"],
-                units=properties["units"],
-            )
-        tracer_names = utils.tracer_variables[:nq]
-        return {name: all_tracers[name + "_quantity"] for name in tracer_names}
-
-    def compute_sequential(self, a, b):
+    def compute_sequential(self, inputs_list, communicator_list):
         pytest.skip(
             f"{self.__class__} only has a mpirun implementation, not running in mock-parallel"
         )
diff --git a/tests/savepoint/translate/translate_tracer2d1l_cmax.py b/tests/savepoint/translate/translate_tracer2d1l_cmax.py
new file mode 100644
index 00000000..efc9479d
--- /dev/null
+++ b/tests/savepoint/translate/translate_tracer2d1l_cmax.py
@@ -0,0 +1,77 @@
+from f90nml import Namelist
+
+from ndsl import QuantityFactory, StencilFactory
+from ndsl.constants import I_DIM, I_INTERFACE_DIM, J_DIM, J_INTERFACE_DIM, K_DIM
+from ndsl.stencils.testing import ParallelTranslate2Py
+from pyfv3.stencils.tracer_2d_1l import TracerCMax
+
+
+class TranslateTracerCMax(ParallelTranslate2Py):
+    inputs = {
+        "cx_R4": {
+            "name": "cx_R4",
+            "dims": [I_INTERFACE_DIM, J_DIM, K_DIM],
+            "units": "unitless",
+        },
+        "cy_R4": {
+            "name": "cy_R4",
+            "dims": [I_DIM, J_INTERFACE_DIM, K_DIM],
+            "units": "unitless",
+        },
+        "cmax": {
+            "name": "cmaxgrid",
+            "dims": [K_DIM],
+            "units": "unitless",
+        },
+    }
+
+    def __init__(
+        self,
+        grid,
+        namelist: Namelist,
+        stencil_factory: StencilFactory,
+    ):
+        super().__init__(grid, namelist, stencil_factory)
+        self._base.in_vars["data_vars"] = {
+            "cx_R4": grid.x3d_compute_domain_y_dict(),
+            "cy_R4": grid.y3d_compute_domain_x_dict(),
+            "cmax": {},
+        }
+        self._base.out_vars = {
+            "cmax": {},
+        }
+        self._stencil_factory = stencil_factory
+        self._grid_data = grid
+        self._quantity_factory = QuantityFactory(
+            grid.sizer,
+            backend=stencil_factory.backend,
+        )
+
+    def compute_parallel(self, inputs, communicator):
+        self._base.make_storage_data_input_vars(inputs)
+        tracer_cmax = TracerCMax(
+            stencil_factory=self._stencil_factory,
+            quantity_factory=self._quantity_factory,
+            grid_data=self._grid_data,
+            comm=communicator,
+        )
+        cx_quantity = self._quantity_factory.from_array(
+            inputs["cx_R4"], self.inputs["cx_R4"]["dims"], ""
+        )
+        cy_quantity = self._quantity_factory.from_array(
+            inputs["cy_R4"],
+            self.inputs["cy_R4"]["dims"],
+            "",
+        )
+        cmax_quantity = self._quantity_factory.from_array(
+            inputs["cmax"],
+            self.inputs["cmax"]["dims"],
+            "",
+        )
+        tracer_cmax(
+            cx=cx_quantity,
+            cy=cy_quantity,
+            cmax=cmax_quantity,
+        )
+        inputs["cmax"] = cmax_quantity[:]
+        return self._base.slice_output(inputs)
diff --git a/tests/savepoint/translate/translate_updatedzc.py b/tests/savepoint/translate/translate_updatedzc.py
index f8d61a1a..b81516df 100644
--- a/tests/savepoint/translate/translate_updatedzc.py
+++ b/tests/savepoint/translate/translate_updatedzc.py
@@ -17,15 +17,17 @@ def __init__(
     ):
         super().__init__(grid, namelist, stencil_factory)
         self.stencil_factory = stencil_factory
-        update_gz_on_c_grid = UpdateGeopotentialHeightOnCGrid(
-            self.stencil_factory,
-            quantity_factory=self.grid.quantity_factory,
-            area=grid.grid_data.area,
-            dp_ref=grid.grid_data.dp_ref,
-            grid_type=self.config.grid_type,
-        )
 
         def compute(**kwargs):
+            update_gz_on_c_grid = UpdateGeopotentialHeightOnCGrid(
+                self.stencil_factory,
+                quantity_factory=self.grid.quantity_factory,
+                area=grid.grid_data.area,
+                dp_ref=grid.grid_data.dp_ref,
+                grid_type=self.config.grid_type,
+                dz_min=kwargs.pop("dz_min"),
+            )
+
             kwargs["dt"] = kwargs.pop("dt2")
             update_gz_on_c_grid(**kwargs)
 
@@ -37,7 +39,7 @@ def compute(**kwargs):
             "gz": {},
             "ws": {},
         }
-        self.in_vars["parameters"] = ["dt2"]
+        self.in_vars["parameters"] = ["dt2", "dz_min"]
         self.out_vars = {
             "gz": grid.default_buffer_k_dict(),
             "ws": {"kstart": -1, "kend": None},
@@ -45,12 +47,12 @@ def compute(**kwargs):
         self._subset = get_subset_func(
             self.grid.grid_indexing,
             dims=[I_DIM, J_DIM, K_DIM],
-            n_halo=((0, 0), (0, 0)),
+            n_halo=((3, 3), (3, 3)),
         )
         self._subset_2d = get_subset_func(
             self.grid.grid_indexing,
             dims=[I_DIM, J_DIM],
-            n_halo=((0, 0), (0, 0)),
+            n_halo=((3, 3), (3, 3)),
         )
 
     def compute(self, inputs):
diff --git a/tests/savepoint/translate/translate_updatedzd.py b/tests/savepoint/translate/translate_updatedzd.py
index 1c803149..92d74766 100644
--- a/tests/savepoint/translate/translate_updatedzd.py
+++ b/tests/savepoint/translate/translate_updatedzd.py
@@ -51,7 +51,7 @@ def __init__(
         self._subset = get_subset_func(
             self.grid.grid_indexing,
             dims=[I_DIM, J_DIM, K_DIM],
-            n_halo=((0, 0), (0, 0)),
+            n_halo=((3, 3), (3, 3)),
         )
         self.ignore_near_zero_errors = {"zh": True, "wsd": True}
         self.near_zero = 1e-30
@@ -65,8 +65,10 @@ def compute(self, inputs):
             self.grid.grid_data,
             self.grid.grid_type,
             self.config.hord_tm,
+            dz_min=self.config.acoustic_dynamics.dz_min,
             column_namelist=d_sw.get_column_namelist(
-                self.config, quantity_factory=self.grid.quantity_factory
+                self.config.d_grid_shallow_water,
+                quantity_factory=self.grid.quantity_factory,
             ),
         )
         self.updatedzd(**inputs)
diff --git a/tests/savepoint/translate/translate_w_fix_consrv_moment.py b/tests/savepoint/translate/translate_w_fix_consrv_moment.py
new file mode 100644
index 00000000..2784d366
--- /dev/null
+++ b/tests/savepoint/translate/translate_w_fix_consrv_moment.py
@@ -0,0 +1,71 @@
+from ndsl.dsl.typing import Float
+from ndsl.stencils.testing import TranslateFortranData2Py
+from ndsl.stencils.testing.grid import Grid
+from pyfv3.stencils.w_fix_consrv_moment import W_fix_consrv_moment
+
+
+class TranslateW_fix_consrv_moment(TranslateFortranData2Py):
+    def __init__(self, grid: Grid, namelist, stencil_factory):
+        super().__init__(grid, stencil_factory)
+        self.stencil_factory = stencil_factory
+        self.grid = grid
+        self.quantity_factory = grid.quantity_factory
+
+        self.compute_func = stencil_factory.from_origin_domain(
+            func=W_fix_consrv_moment,
+            origin=grid.compute_origin(),
+            domain=(grid.nic, 1, grid.npz),
+        )
+
+        self.in_vars["data_vars"] = {
+            "w": {
+                "kend": grid.npz - 1,
+            },
+            "dp2_W": grid.compute_dict(),
+        }
+
+        self.in_vars["parameters"] = ["w_max", "w_min"]
+
+        self.out_vars = {
+            "w": {
+                "kend": grid.npz - 1,
+            },
+        }
+        self._gz = self.quantity_factory._numpy.zeros(
+            (
+                grid.nid,
+                grid.njd,
+            ),
+            dtype=Float,
+        )
+
+        self._w2 = self.quantity_factory._numpy.zeros(
+            (
+                grid.nid,
+                grid.njd,
+                grid.npz,
+            ),
+            dtype=Float,
+        )
+
+        self._compute_performed = self.quantity_factory._numpy.zeros(
+            (
+                grid.nid,
+                grid.njd,
+            ),
+            dtype=bool,
+        )
+
+    def compute_from_storage(self, inputs):
+
+        self.compute_func(
+            inputs["w"],
+            self._w2,
+            inputs["dp2_W"],
+            self._gz,
+            inputs["w_max"],
+            inputs["w_min"],
+            self._compute_performed,
+        )
+
+        return inputs
diff --git a/tests/savepoint/translate/translate_xppm.py b/tests/savepoint/translate/translate_xppm.py
index 059b6120..b85d4992 100644
--- a/tests/savepoint/translate/translate_xppm.py
+++ b/tests/savepoint/translate/translate_xppm.py
@@ -16,12 +16,12 @@ def __init__(
     ):
         super().__init__(grid, namelist, stencil_factory)
         self.in_vars["data_vars"] = {
-            "q": {"serialname": "qx", "jstart": "jfirst"},
-            "c": {"serialname": "cx", "istart": grid.is_},
+            "q": {"serialname": "xppm_q", "jstart": "jfirst"},
+            "c": {"serialname": "xppm_c", "istart": grid.is_},
         }
         self.in_vars["parameters"] = ["iord", "jfirst", "jlast"]
         self.out_vars = {
-            "xflux": {
+            "xppm_flux": {
                 "istart": grid.is_,
                 "iend": grid.ie + 1,
                 "jstart": "jfirst",
@@ -42,7 +42,7 @@ def process_inputs(self, inputs):
 
     def compute(self, inputs):
         self.process_inputs(inputs)
-        inputs["xflux"] = utils.make_storage_from_shape(
+        inputs["xppm_flux"] = utils.make_storage_from_shape(
             inputs["q"].shape, backend=self.stencil_factory.backend
         )
         origin = self.grid.grid_indexing.origin_compute()
@@ -55,7 +55,7 @@ def compute(self, inputs):
             origin=(origin[0], int(inputs["jfirst"]), origin[2]),
             domain=(domain[0], int(inputs["jlast"] - inputs["jfirst"] + 1), domain[2]),
         )
-        self.compute_func(inputs["q"], inputs["c"], inputs["xflux"])
+        self.compute_func(inputs["q"], inputs["c"], inputs["xppm_flux"])
         return self.slice_output(inputs)
 
 
@@ -67,5 +67,5 @@ def __init__(
         stencil_factory: StencilFactory,
     ):
         super().__init__(grid, namelist, stencil_factory)
-        self.in_vars["data_vars"]["q"]["serialname"] = "q"
-        self.out_vars["xflux"]["serialname"] = "xflux_2"
+        self.in_vars["data_vars"]["q"]["serialname"] = "xppm_q2"
+        self.out_vars["xppm_flux"]["serialname"] = "xppm_flux_2"
diff --git a/tests/savepoint/translate/translate_yppm.py b/tests/savepoint/translate/translate_yppm.py
index 394ef9c1..86ecd387 100644
--- a/tests/savepoint/translate/translate_yppm.py
+++ b/tests/savepoint/translate/translate_yppm.py
@@ -35,8 +35,8 @@ def __init__(
     def ivars(self, inputs):
         inputs["ifirst"] += TranslateGrid.fpy_model_index_offset
         inputs["ilast"] += TranslateGrid.fpy_model_index_offset
-        inputs["ifirst"] = self.grid.global_to_local_x(inputs["ifirst"])
-        inputs["ilast"] = self.grid.global_to_local_x(inputs["ilast"])
+        inputs["ifirst"] = self.grid.global_to_local_x(int(inputs["ifirst"]))
+        inputs["ilast"] = self.grid.global_to_local_x(int(inputs["ilast"]))
 
     def process_inputs(self, inputs):
         self.ivars(inputs)
diff --git a/tests/script/geos_fp/TEMP/run_yppm_xppm.sh b/tests/script/geos_fp/TEMP/run_yppm_xppm.sh
new file mode 100755
index 00000000..9795418c
--- /dev/null
+++ b/tests/script/geos_fp/TEMP/run_yppm_xppm.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+
+THIS_DIR=$PWD
+TEST_DATA_PATH="../../../../test_data/geos/TEMP_XPPM_YPPM"
+mkdir -p "$TEST_DATA_PATH"
+cd "$TEST_DATA_PATH" || exit 1
+
+wget https://portal.nccs.nasa.gov/datashare/astg/smt/geos-fp/translate/11.5.2/x86_GNU/Dycore/TBC_C24_L72_Debug/YPPM-In.nc
+wget https://portal.nccs.nasa.gov/datashare/astg/smt/geos-fp/translate/11.5.2/x86_GNU/Dycore/TBC_C24_L72_Debug/YPPM-Out.nc
+wget https://portal.nccs.nasa.gov/datashare/astg/smt/geos-fp/translate/11.5.2/x86_GNU/Dycore/TBC_C24_L72_Debug/XPPM-In.nc
+wget https://portal.nccs.nasa.gov/datashare/astg/smt/geos-fp/translate/11.5.2/x86_GNU/Dycore/TBC_C24_L72_Debug/XPPM-Out.nc
+wget https://portal.nccs.nasa.gov/datashare/astg/smt/geos-fp/translate/11.5.2/x86_GNU/Dycore/TBC_C24_L72_Debug/input.nml
+wget https://portal.nccs.nasa.gov/datashare/astg/smt/geos-fp/translate/11.5.2/x86_GNU/Dycore/TBC_C24_L72_Debug/Grid-Info.nc
+
+
+cd "$THIS_DIR" || exit 1
+rm -rf ./.gt_cache_*
+
+export PACE_FLOAT_PRECISION=32
+export PACE_CONSTANTS=GEOS
+export FV3_DACEMODE=Python
+
+python -m pytest -v -s -x \
+    --data_path="$TEST_DATA_PATH" \
+    --backend=numpy \
+    --which_modules=XPPM,YPPM \
+    --multimodal_metric \
+    ../../../savepoint